Change Log
All notable changes to this project will be documented in this file. This project adheres to Semantic Versioning starting with version 0.7.0.
[0.14.2] - 2019-01-29
Added
- rasa_nlu.evaluate now exports reports into a folder and also includes the entity extractor reports

Changed
- updated requirements to match Core and SDK
- pinned keras dependencies
 
[0.14.0] - 2019-01-23
Added
- Ability to save successful predictions and classification results to a JSON file from rasa_nlu.evaluate
- environment variables specified with ${env_variable} in a yaml configuration file are now replaced with the value of the environment variable
- more documentation on how to run NLU with Docker
- analyzer parameter to intent_featurizer_count_vectors featurizer to configure whether to use word or character n-grams
- Travis script now clones and tests the Rasa NLU starter pack
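
The ${env_variable} replacement listed above can be pictured with a short sketch. It is not Rasa NLU's actual implementation: the helper name expand_env_vars and the sample keys and values are hypothetical, and the sketch operates on raw configuration text using only the standard library instead of hooking into a YAML loader.

```python
import os
import re

# Matches ${NAME} placeholders; NAME follows the usual shell variable naming rules.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_vars(text, environ=None):
    """Replace every ${NAME} in `text` with the value of the variable NAME.

    Hypothetical helper for illustration only. Raises KeyError if a
    referenced variable is not set.
    """
    env = os.environ if environ is None else environ
    return _ENV_PATTERN.sub(lambda match: env[match.group(1)], text)


if __name__ == "__main__":
    # Sample values are passed explicitly so the demo does not depend on the real environment.
    sample = "language: ${NLU_LANGUAGE}\nmax_training_processes: ${NLU_MAX_TRAINING}\n"
    values = {"NLU_LANGUAGE": "en", "NLU_MAX_TRAINING": "2"}
    print(expand_env_vars(sample, values))
```

Raising on a missing variable is a design choice of this sketch; a loader could just as well leave the placeholder untouched.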
 
Changed
- EmbeddingIntentClassifier has been refactored, including changes to the config parameters as well as comments and types for all class functions.
- the http server’s POST /evaluate endpoint returns evaluation results for both entities and intents
- replaced yaml with ruamel.yaml
- updated spacy version to 2.0.18
- updated TensorFlow version to 1.12.0
- updated scikit-learn version to 0.20.2
- updated cloudpickle version to 0.6.1
 
Removed
- /config endpoint
- removed pinning of msgpack and unused package python-msgpack
- removed support for ner_duckling. Now supports only ner_duckling_http

Fixed
- jieba custom dictionaries are now loaded only once.
- Set attributes of custom components correctly if they differ from the default
- NLU Server can now handle training data with emojis in it
- If the token_name is not given in the endpoint configuration, the default value is token instead of None
- Throws error only if ner_crf picks up overlapping entities. If the entity extractor supports overlapping entities no error is thrown.
- Updated CORS support for the server. Added the Access-Control-Allow-Headers and Content-Type headers for nlu server
- parsing of emojis which are sent within jsons
- Bad input shape error from sklearn_intent_classifier when using scikit-learn==0.20.2

[0.13.8] - 2018-11-21
Fixed
- pinned spacy version to spacy<=2.0.12,>2.0 to avoid dependency conflicts with tensorflow

[0.13.7] - 2018-10-11
Fixed
- rasa_nlu.server allowed more than max_training_processes to be trained if they belong to different projects. max_training_processes is now a global parameter, regardless of what project the training process belongs to.

[0.13.6] - 2018-10-04
Changed
- boto3 is now loaded lazily in AWSPersistor and is not included in requirements_bare.txt anymore

Fixed
- Allow training of pipelines containing EmbeddingIntentClassifier in a separate thread on python 3. This makes http server calls to /train non-blocking
- require scikit-learn<0.20 in setup.py to avoid corrupted installations with the most recent scikit-learn

[0.13.5] - 2018-09-28
Changed
- Training data is now validated after loading from files in loading.py instead of on initialisation of the TrainingData object

Fixed
- a Project set up to pull models from a remote server now only uses the pulled model instead of searching for models locally

[0.13.4] - 2018-09-19
Fixed
- pinned matplotlib to 2.x (not ready for 3.0 yet)
- removed pytest-services since it wasn’t used and caused issues on Windows
 
[0.13.3] - 2018-08-28
Added
- EndpointConfig class that handles authenticated requests (ported from Rasa Core)
- DataRouter() class supports a model_server EndpointConfig, which it regularly queries to fetch NLU models
- this can be used with rasa_nlu.server with the --endpoint option (the key for the model server config is model)
- docs on model fetching from a URL
- ability to specify lookup tables in training data
 
Changed
- loading training data from a URL requires an instance of EndpointConfig
- Changed evaluate behaviour to plot two histogram bars per bin. Plotting confidence of right predictions in a wine-ish colour and wrong ones in a blue-ish colour.
 
Removed

Fixed
- re-added support for entity names with special characters in markdown format
 
[0.13.2] - 2018-08-28
Changed
- added information about migrating the CRF component from 0.12 to 0.13

Fixed
- pipelines containing the EmbeddingIntentClassifier are not trained in a separate thread, as this may lead to freezing during training

[0.13.1] - 2018-08-07
Added
- documentation example for creating a custom component

Fixed
- correctly pass reference time in milliseconds to duckling_http
 
[0.13.0] - 2018-08-02
Warning
This is a release breaking backwards compatibility. Unfortunately, it is not possible to load previously trained models as the parameters for the tensorflow and CRF models changed.
Added
- support for tokenizer_jieba to load a custom dictionary from config
- allow pure json including pipeline configuration on train endpoint
- doc link to a community contribution for Rasa NLU in Chinese
- support for the count_vectors_featurizer component to use the tokens feature provided by the tokenizer
- 2-character and a 5-character prefix features to ner_crf
- ner_crf with whitespaced tokens to tensorflow_embedding pipeline
- predict empty string instead of None for intent name
- update default parameters for tensorflow embedding classifier
- do not predict anything if feature vector contains only zeros in tensorflow embedding classifier
- change persistence keywords in tensorflow embedding classifier (make previously trained models impossible to load)
- intent_featurizer_count_vectors adds features to text_features instead of overwriting them
- add basic OOV support to intent_featurizer_count_vectors (make previously trained models impossible to load)
- add a feature for each regex in the training set for crf_entity_extractor
- Current training processes count for server and projects.
- the /version endpoint returns a new field minimum_compatible_version
- added logging of intent prediction errors to evaluation script
- added histogram of confidence scores to evaluation script
- documentation for the ner_duckling_http component

Changed
- renamed CRF features wordX to suffixX and preX to suffixX
- L1 and L2 regularisation defaults in ner_crf both set to 0.1
- whitespace_tokenizer ignores punctuation .,!? before whitespace or end of string
- Allow multiple training processes per project
- Changed AlreadyTrainingError to MaxTrainingError. The first one was used to indicate that the project was already training. The latter is raised when the server isn’t able to train more models.
- Interpreter.ensure_model_compatibility takes a new parameter for the version to compare the model version against
- confusion matrix plot gets saved to file automatically during evaluation
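
The whitespace_tokenizer change listed above can be illustrated with a small sketch. This is an assumption about the described behaviour, not the component's source code; the function name whitespace_tokenize is hypothetical.

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace after dropping . , ! ? that sit directly
    before whitespace or the end of the string (illustrative sketch)."""
    # The lookahead leaves punctuation inside tokens such as "0.13.0" untouched.
    cleaned = re.sub(r"[.,!?]+(?=\s|$)", "", text)
    return cleaned.split()


if __name__ == "__main__":
    print(whitespace_tokenize("Hello, world! Version 0.13.0 is out."))
    # ['Hello', 'world', 'Version', '0.13.0', 'is', 'out']
```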
 
Removed
- dependence on spaCy when training ner_crf without POS features
- documentation for the ner_duckling component - facebook doesn’t maintain the underlying clojure version of duckling anymore. component will be removed in the next release.

Fixed
- Fixed Luis emulation output to add start, end position and confidence for each entity.
- Fixed byte encoding issue where training data could not be loaded by URL in python 3.
 
[0.12.3] - 2018-05-02
Added
- Returning used model name and project name in the response of GET /parse and POST /parse as model and project respectively.

Fixed
- readded possibility to set fixed model name from http train endpoint
 
[0.12.1] - 2018-04-18
Added
- support for retrieving training data from a URL

Fixed
- properly set duckling http url through environment setting
- improvements and fixes to the configuration and pipeline documentation
 
[0.12.0] - 2018-04-17
Added
- support for inline entity synonyms in markdown training format
- support for regex features in markdown training format
- support for splitting training data into multiple files and mixing formats
- support for markdown files containing regex-features or synonyms only
- added ability to list projects in cloud storage services for model loading
- server evaluation endpoint at POST /evaluate
- server endpoint at DELETE /models to unload models from server memory
- CRF entity recognizer now returns a confidence score when extracting entities
- added count vector featurizer to create bag of words representation
- added embedding intent classifier implemented in tensorflow
- added tensorflow requirements
- added docs blurb on handling contextual dialogue
- distribute package as wheel file in addition to source distribution (faster install)
- allow a component to specify which languages it supports
- support for persisting models to Azure Storage
- added tokenizer for Chinese (zh) as well as instructions on how to load MITIE model

Changed
- model configuration is separated from server / train configuration. This is a breaking change and models need to be retrained. See migrations guide.
- Regex features are now sorted internally. Retrain your model if you use regex features
- The keyword intent classifier now returns null instead of "None" as intent name in the json result if there’s no match
- in the evaluation results, replaced O with the string no_entity for better understanding
- The CRFEntityExtractor now only trains entity examples that have "extractor": "ner_crf" or no extractor at all
- Ignore hidden files when listing projects or models
- Docker Images now run on python 3.6 for better non-latin character set support
- changed key name for a file in ngram featurizer
- changed jsonObserver to generate logs without a record separator
- Improve jsonschema validation: text attribute of training data samples can not be empty
- made the NLU server’s /evaluate endpoint asynchronous

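
The CRFEntityExtractor selection rule listed above ("extractor": "ner_crf" or no extractor at all) amounts to a simple filter. The following is a minimal sketch under that reading, not the component's actual code; the function name entities_for_crf and the sample annotations are hypothetical.

```python
def entities_for_crf(entities, extractor_name="ner_crf"):
    """Keep entity annotations that are tagged for `extractor_name`
    or carry no extractor tag at all (illustrative sketch)."""
    return [e for e in entities if e.get("extractor") in (None, extractor_name)]


if __name__ == "__main__":
    annotations = [
        {"start": 0, "end": 6, "value": "Berlin", "entity": "city"},
        {"start": 10, "end": 18, "value": "tomorrow", "entity": "time",
         "extractor": "ner_duckling_http"},
        {"start": 22, "end": 27, "value": "pizza", "entity": "food",
         "extractor": "ner_crf"},
    ]
    # Only the untagged and the ner_crf-tagged annotations remain.
    print([e["value"] for e in entities_for_crf(annotations)])
```
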
Fixed
- fixed certain command line arguments not getting passed into the data_router

[0.11.1] - 2018-02-02
Fixed
- Changelog doc formatting
- fixed project loading for newly added projects to a running server
- fixed certain command line arguments not getting passed into the data_router
 
[0.11.0] - 2018-01-30
Added
- non ascii character support for anything that gets json dumped (e.g. training data received over HTTP endpoint)
- evaluation of entity extraction performance in evaluation.py
- support for spacy 2.0
- evaluation of intent classification with crossvalidation in evaluation.py
- support for splitting training data into multiple files (markdown and JSON only)
 
Changed
- removed -e . from requirements files - if you want to install the app use pip install -e .
- fixed http duckling parsing for non en languages
- fixed parsing of entities from markdown training data files
 
[0.10.6] - 2018-01-02
Added
- support asterisk style annotation of examples in markdown format

Fixed
- Preventing capitalized entities from becoming synonyms of the form lower-cased -> capitalized
 
[0.10.5] - 2017-12-01
Fixed
- read token in server from config instead of data router
- fixed reading of models with none date name prefix in server
 
[0.10.3] - 2017-10-26
Added
- support for new dialogflow data format (previously api.ai)
- improved support for custom components (components are stored by class name in stored metadata to allow for components that are not mentioned in the Rasa NLU registry)
- language option to convert script

Fixed
- Fixed loading of default model from S3. Fixes #633
- fixed permanent training status when training fails #652
- quick fix for None “_formatter_parser” bug
 
[0.10.0] - 2017-09-27
Added
- Support for training data in Markdown format
- CORS support. You can now specify allowed CORS origins within your configuration file.
- The HTTP server is now backed by Klein (Twisted) instead of Flask. The server is now asynchronous but is no longer WSGI compatible
- Improved Docker automated builds
- Rasa NLU now works with projects instead of models. A project can be the basis for a restaurant search bot in German or a customer service bot in English. A model can be seen as a snapshot of a project.
 
Changed
- Root project directories have been slightly rearranged to clean up new docker support
- use Interpreter.create(metadata, ...) to create interpreter from dict and Interpreter.load(file_name, ...) to create interpreter with metadata from a file
- Renamed name parameter to project
- Docs hosted on GitHub pages now: Documentation
- Adapted remote cloud storages to support projects (backwards incompatible!)
 
Fixed
- Fixed training data persistence. Fixes #510
- Fixed UTF-8 character handling when training through HTTP interface
- Invalid handling of numbers extracted from duckling during synonym handling. Fixes #517
- Only log a warning (instead of throwing an exception) on misaligned entities during mitie NER
 
[0.9.1] - 2017-07-11
Fixed
- removed obsolete --output parameter of train.py. use --path instead. fixes #473

[0.9.0] - 2017-07-07
Added
- increased test coverage to avoid regressions (ongoing)
- added regex featurization to support intent classification and entity extraction (intent_entity_featurizer_regex)

Changed
- replaced existing CRF library (python-crfsuite) with sklearn-crfsuite (due to better windows support)
- updated to spacy 1.8.2
- logging format of logged request now includes model name and timestamp
- use module specific loggers instead of default python root logger
- output format of the duckling extractor changed. the value field now includes the complete value from duckling instead of just text (so this property is now an object instead of just text). includes granularity information now.
- deprecated intent_examples and entity_examples sections in training data. all examples should go into the common_examples section
- weight training samples based on class distribution during ner_crf cross validation and sklearn intent classification training
- large refactoring of the internal training data structure and pipeline architecture
- numpy is now a required dependency
 
Removed
- luis data tokenizer configuration value (not used anymore, luis exports char offsets now)
 
Fixed
- properly update coveralls coverage report from travis
- persistence of duckling dimensions
- changed default response of untrained intent_classifier_sklearn from "intent": None to "intent": {"name": None, "confidence": 0.0}
- /status endpoint showing all available models instead of only those whose name starts with model
- properly return training process ids #391
 
[0.8.8] - 2017-05-26
Fixed
- /status endpoint showing all available models instead of only those whose name starts with model

[0.8.0] - 2017-05-08
Added
- ngram character featurizer (allows better handling of out-of-vocab words)
- replaced pre-wired backends with more flexible pipeline definitions
- return top 10 intents with sklearn classifier #199
- python type annotations for nearly all public functions
- added alternative method of defining entity synonyms
- support for arbitrary spacy language model names
- duckling components to provide normalized output for structured entities
- Conditional random field entity extraction (Markov model for entity tagging, better named entity recognition with low and medium data and similarly well at big data level)
- allow naming of trained models instead of generated model names
- dynamic check of requirements for the different components & error messages on missing dependencies
- support for using multiple entity extractors and combining results downstream
 
Changed
- unified tokenizers, classifiers and feature extractors to implement common component interface
- src directory renamed to rasa_nlu
- when loading data in a foreign format (api.ai, luis, wit) the data gets properly split into intent & entity examples
- Configuration:
  - added max_number_of_ngrams
  - removed backend and added pipeline as a replacement
  - added luis_data_tokenizer
  - added duckling_dimensions
- parser output format changed from
  {"intent": "greeting", "confidence": 0.9, "entities": []}
  to
  {"intent": {"name": "greeting", "confidence": 0.9}, "entities": []}
- entities output format changed from
  {"start": 15, "end": 28, "value": "New York City", "entity": "GPE"}
  to
  {"extractor": "ner_mitie", "processors": ["ner_synonyms"], "start": 15, "end": 28, "value": "New York City", "entity": "GPE"}
  where extractor denotes the entity extractor that originally found an entity, and processors denotes components that alter entities, such as the synonym component.
- camel cased MITIE classes (e.g. MITIETokenizer → MitieTokenizer)
- model metadata changed, see migration guide
- updated to spacy 1.7 and dropped training and loading capabilities for the spacy component (breaks existing spacy models!)
- introduced compatibility with both Python 2 and 3

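Client code written against the flat parser output may need a small adapter for the nested intent format introduced in 0.8.0. The sketch below converts the old shape into the new one using the two example payloads from this changelog; the function name upgrade_parse_result is hypothetical and not part of Rasa NLU.

```python
def upgrade_parse_result(old):
    """Convert a flat pre-0.8.0 parse result into the nested intent format.

    Hypothetical helper: expects top-level "intent" and "confidence" keys
    and copies every other key (for example "entities") unchanged.
    """
    new = {key: value for key, value in old.items()
           if key not in ("intent", "confidence")}
    new["intent"] = {"name": old["intent"], "confidence": old["confidence"]}
    return new


if __name__ == "__main__":
    old_result = {"intent": "greeting", "confidence": 0.9, "entities": []}
    print(upgrade_parse_result(old_result))
```
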
[0.7.4] - 2017-03-27
Fixed
- fixed failed loading of example data after renaming attributes, i.e. “KeyError: ‘entities’”
 
[0.7.3] - 2017-03-15
Fixed
- fixed regression in mitie entity extraction on special characters
- fixed spacy fine tuning and entity recognition on passed language instance
 
[0.7.1] - 2017-03-10
[0.7.0] - 2017-03-10
This is a major version update. Please also have a look at the Migration Guide.
Added
- Changelog ;)
- option to use multi-threading during classifier training
- entity synonym support
- proper temporary file creation during tests
- mitie_sklearn backend using mitie tokenization and sklearn classification
- option to fine-tune spacy NER models
- multithreading support of built-in REST server (e.g. using gunicorn)
- multitenancy implementation to allow loading multiple models which share the same backend
 
Fixed
- error propagation on failed vector model loading (spacy)
- escaping of special characters during mitie tokenization