Training and Policies¶
Training¶
Rasa Core works by creating training data from your stories and training a model on that data.
You can run training from the command line like in the Quickstart:
python -m rasa_core.train -d domain.yml -s data/stories.md \
-o models/current/dialogue -c config.yml
Or by creating an agent and running the train method yourself:
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.policies.memoization import MemoizationPolicy
from rasa_core.agent import Agent

agent = Agent("domain.yml",
              policies=[MemoizationPolicy(), KerasPolicy()])
data = agent.load_data("data/stories.md")
agent.train(data)
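If you want to use the trained model later, you can also persist it; the path below mirrors the command line example above:
agent.persist("models/current/dialogue")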
Default configuration¶
By default, we try to provide you with a good set of configuration values and policies that suit most people. But you are encouraged to modify these to your needs:
policies:
  - name: KerasPolicy
    epochs: 100
    max_history: 5
  - name: FallbackPolicy
    fallback_action_name: 'action_default_fallback'
  - name: MemoizationPolicy
    max_history: 5
  - name: FormPolicy
  - name: MappingPolicy
Data Augmentation¶
By default, Rasa Core will create longer stories by randomly gluing together the ones in your stories file. This is because if you have stories like:
# thanks
* thankyou
- utter_youarewelcome
# bye
* goodbye
- utter_goodbye
You actually want to teach your policy to ignore the dialogue history when it isn’t relevant and just respond with the same action no matter what happened before.
You can alter this behaviour with the --augmentation flag; --augmentation 0 disables it. In Python, you can pass the augmentation_factor argument to the Agent.load_data method.
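For example, to turn augmentation off when training in Python, reusing the agent from the example above:
# equivalent to passing `--augmentation 0` on the command line
data = agent.load_data("data/stories.md", augmentation_factor=0)
agent.train(data)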
Max History¶
One important hyperparameter for Rasa Core policies is the max_history. This controls how much dialogue history the model looks at to decide which action to take next.
You can set the max_history by passing it to your policy's Featurizer in the policy configuration yaml file.
Note
Only the MaxHistoryTrackerFeaturizer uses a max history, whereas the FullDialogueTrackerFeaturizer always looks at the full conversation history.
As an example, let's say you have an out_of_scope intent which describes off-topic user messages. If your bot sees this intent multiple times in a row, you might want to tell the user what you can help them with. So your story might look like this:
* out_of_scope
- utter_default
* out_of_scope
- utter_default
* out_of_scope
- utter_help_message
For Rasa Core to learn this pattern, the max_history has to be at least 3.
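For instance, using the configuration format described under Policies below, this could be set like the following sketch (the policy and featurizer names mirror the later example; only max_history changes):
policies:
  - name: KerasPolicy
    featurizer:
    - name: MaxHistoryTrackerFeaturizer
      max_history: 3
      state_featurizer:
      - name: BinarySingleStateFeaturizer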
If you increase your max_history, your model will become bigger and training will take longer. If you have some information that should affect the dialogue very far into the future, you should store it as a slot. Slot information is always available for every featurizer.
Training Script Options¶
usage: train.py default [-h] [--augmentation AUGMENTATION] [--dump_stories]
[--debug_plots] [-v] [-vv] [--quiet] [-c CONFIG] -o
OUT (-s STORIES | --url URL) -d DOMAIN
optional arguments:
-h, --help show this help message and exit
--augmentation AUGMENTATION
how much data augmentation to use during training
--dump_stories If enabled, save flattened stories to a file
--debug_plots If enabled, will create plots showing checkpoints and
their connections between story blocks in a file
called `story_blocks_connections.html`.
-c CONFIG, --config CONFIG
Policy specification yaml file.
-o OUT, --out OUT directory to persist the trained model in
-s STORIES, --stories STORIES
File or folder containing stories
--url URL If supplied, downloads a story file from a URL and
trains on it. Fetches the data by sending a GET
request to the supplied URL.
-d DOMAIN, --domain DOMAIN
Domain specification (yml file)
Python Logging Options:
-v, --verbose Be verbose. Sets logging level to INFO
-vv, --debug Print lots of debugging statements. Sets logging level
to DEBUG
--quiet Be quiet! Sets logging level to WARNING
Policies¶
The rasa_core.policies.Policy class decides which action to take at every step in the conversation.
There are different policies to choose from, and you can include multiple policies in a single rasa_core.agent.Agent. At every turn, the policy which predicts the next action with the highest confidence will be used.
Configuring policies using a configuration file¶
If you are using the training script, you must set the policies you would like the Core model to use in a YAML file.
For example:
policies:
  - name: "KerasPolicy"
    featurizer:
    - name: MaxHistoryTrackerFeaturizer
      max_history: 5
      state_featurizer:
      - name: BinarySingleStateFeaturizer
  - name: "MemoizationPolicy"
    max_history: 5
  - name: "FallbackPolicy"
    nlu_threshold: 0.4
    core_threshold: 0.3
    fallback_action_name: "my_fallback_action"
  - name: "path.to.your.policy.class"
    arg1: "..."
Pass the YAML file's name to the train script using the --config argument (or just -c). There is a default config file you can use to get started: Default configuration.
Note
Policies specified higher in the config.yaml will take precedence over a policy specified lower if the confidences are equal.
Configuring policies in code¶
You can pass a list of policies when you create an agent:
from rasa_core.policies.memoization import MemoizationPolicy
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.agent import Agent
agent = Agent("domain.yml",
policies=[MemoizationPolicy(), KerasPolicy()])
Memoization Policy¶
The MemoizationPolicy just memorizes the conversations in your training data. It predicts the next action with confidence 1.0 if this exact conversation exists in the training data, otherwise it predicts None with confidence 0.0.
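In code, the number of turns it memorizes can be limited with max_history, mirroring the configuration file entry above (a minimal sketch):
from rasa_core.policies.memoization import MemoizationPolicy
from rasa_core.agent import Agent

# memorize at most the last 5 turns of each training story
agent = Agent("domain.yml", policies=[MemoizationPolicy(max_history=5)])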
Keras Policy¶
The KerasPolicy uses a neural network implemented in Keras to select the next action.
The default architecture is based on an LSTM, but you can override the KerasPolicy.model_architecture method to implement your own architecture.
def model_architecture(
        self,
        input_shape: Tuple[int, int],
        output_shape: Tuple[int, Optional[int]]
) -> tf.keras.models.Sequential:
    """Build a keras model and return a compiled model."""

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (
        Masking, LSTM, Dense, TimeDistributed, Activation)

    # Build Model
    model = Sequential()

    # the shape of the y vector of the labels,
    # determines which output from rnn will be used
    # to calculate the loss
    if len(output_shape) == 1:
        # y is (num examples, num features) so
        # only the last output from the rnn is used to
        # calculate the loss
        model.add(Masking(mask_value=-1, input_shape=input_shape))
        model.add(LSTM(self.rnn_size, dropout=0.2))
        model.add(Dense(input_dim=self.rnn_size, units=output_shape[-1]))
    elif len(output_shape) == 2:
        # y is (num examples, max_dialogue_len, num features) so
        # all the outputs from the rnn are used to
        # calculate the loss, therefore a sequence is returned and
        # time distributed layer is used

        # the first value in input_shape is max dialogue_len,
        # it is set to None, to allow dynamic_rnn creation
        # during prediction
        model.add(Masking(mask_value=-1,
                          input_shape=(None, input_shape[1])))
        model.add(LSTM(self.rnn_size, return_sequences=True, dropout=0.2))
        model.add(TimeDistributed(Dense(units=output_shape[-1])))
    else:
        raise ValueError("Cannot construct the model because"
                         "length of output_shape = {} "
                         "should be 1 or 2."
                         "".format(len(output_shape)))

    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])

    logger.debug(model.summary())

    return model
and the training is run here:
def train(self,
          training_trackers: List[DialogueStateTracker],
          domain: Domain,
          **kwargs: Any
          ) -> None:

    # set numpy random seed
    np.random.seed(self.random_seed)

    training_data = self.featurize_for_training(training_trackers,
                                                domain,
                                                **kwargs)
    # noinspection PyPep8Naming
    shuffled_X, shuffled_y = training_data.shuffled_X_y()

    self.graph = tf.Graph()
    with self.graph.as_default():
        # set random seed in tf
        tf.set_random_seed(self.random_seed)
        self.session = tf.Session(config=self._tf_config)

        with self.session.as_default():
            if self.model is None:
                self.model = self.model_architecture(shuffled_X.shape[1:],
                                                     shuffled_y.shape[1:])

            logger.info("Fitting model with {} total samples and a "
                        "validation split of {}"
                        "".format(training_data.num_examples(),
                                  self.validation_split))

            # filter out kwargs that cannot be passed to fit
            self._train_params = self._get_valid_params(
                self.model.fit, **self._train_params)

            self.model.fit(shuffled_X, shuffled_y,
                           epochs=self.epochs,
                           batch_size=self.batch_size,
                           shuffle=False,
                           **self._train_params)
            # the default parameter for epochs in keras fit is 1
            self.current_epoch = self.defaults.get("epochs", 1)
            logger.info("Done fitting keras policy model")
You can implement the model of your choice by overriding these methods, or initialize KerasPolicy with a pre-defined keras model.
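As a rough sketch of the subclassing route, the following assumes the default MaxHistoryTrackerFeaturizer (so output_shape has length 1) and simply swaps the LSTM for a GRU; it is an illustration, not the shipped implementation:
from rasa_core.policies.keras_policy import KerasPolicy


class CustomKerasPolicy(KerasPolicy):
    """KerasPolicy with a hypothetical GRU-based architecture."""

    def model_architecture(self, input_shape, output_shape):
        from tensorflow.keras.models import Sequential
        from tensorflow.keras.layers import Masking, GRU, Dense, Activation

        # same masking value as the default architecture above
        model = Sequential()
        model.add(Masking(mask_value=-1, input_shape=input_shape))
        model.add(GRU(self.rnn_size, dropout=0.2))
        model.add(Dense(units=output_shape[-1]))
        model.add(Activation('softmax'))
        model.compile(loss='categorical_crossentropy',
                      optimizer='rmsprop',
                      metrics=['accuracy'])
        return model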
In order to get reproducible training results for the same inputs you can set the random_seed attribute of the KerasPolicy to any integer.
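For example, the seed (and other parameters) can be passed when constructing the policy in code, assuming the constructor forwards keyword arguments the same way the configuration file does; the values are illustrative:
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.agent import Agent

# fixes the numpy / tensorflow seeds used in train() above
agent = Agent("domain.yml",
              policies=[KerasPolicy(random_seed=42, epochs=100)])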
Embedding Policy¶
The EmbeddingPolicy implements the Recurrent Embedding Dialogue Policy (REDP) described in our paper: https://arxiv.org/abs/1811.11707
This policy has a pre-defined architecture, which comprises the following steps:
- apply dense layers to create embeddings for user intents, entities and system actions including previous actions and slots;
- use the embeddings of previous user inputs as a user memory and embeddings of previous system actions as a system memory;
- concatenate user input, previous system action and slots embeddings for current time into an input vector to rnn;
- using user and previous system action embeddings from the input vector, calculate attention probabilities over the user and system memories (for system memory, this policy uses NTM mechanism with attention by location);
- sum the user embedding and user attention vector and feed it and the embeddings of the slots as an input to an LSTM cell;
- apply a dense layer to the output of the LSTM to get a raw recurrent embedding of a dialogue;
- sum this raw recurrent embedding of a dialogue with system attention vector to create dialogue level embedding, this step allows the algorithm to repeat previous system action by copying its embedding vector directly to the current time output;
- weight previous LSTM states with system attention probabilities to get the previous action embedding that the policy most likely paid attention to;
- if the similarity between this previous action embedding and current time dialogue embedding is high, overwrite current LSTM state with the one from the time when this action happened;
- for each LSTM time step, calculate the similarity between the dialogue embedding and embedded system actions. This step is based on the starspace idea.
Note
This policy only works with FullDialogueTrackerFeaturizer(state_featurizer). It is recommended to use state_featurizer=LabelTokenizerSingleStateFeaturizer(...) (see Featurization for details).
Configuration:
Configuration parameters can be passed as parameters to the EmbeddingPolicy within the policy configuration file.
Note
Pass an appropriate number of epochs to the EmbeddingPolicy, otherwise the policy will be trained only for 1 epoch. Since this is an embedding based policy, it requires a large number of epochs, which depends on the complexity of the training data and whether attention is used or not.
The main feature of this policy is an attention mechanism over previous user input and system actions. Attention is turned on by default; in order to turn it off, configure the following parameters:
- attn_before_rnn if true the algorithm will use an attention mechanism over previous user input, default true;
- attn_after_rnn if true the algorithm will use an attention mechanism over previous system actions and will be able to copy a previously executed action together with the LSTM's hidden state from its history, default true;
- sparse_attention if true sparsemax will be used instead of softmax for attention probabilities, default false;
- attn_shift_range the range of allowed location-based attention shifts for system memory (attn_after_rnn), see https://arxiv.org/abs/1410.5401 for details.
Note
Attention requires larger values of epochs and takes longer to train. But it can learn more complicated and nonlinear behaviour.
The algorithm also has hyper-parameters to control:
neural network's architecture:
- hidden_layers_sizes_a sets a list of hidden layer sizes before the embedding layer for user inputs, the number of hidden layers is equal to the length of the list;
- hidden_layers_sizes_b sets a list of hidden layer sizes before the embedding layer for system actions, the number of hidden layers is equal to the length of the list;
- rnn_size sets the number of units in the LSTM cell;
training:
- layer_norm if true layer normalization for the LSTM cell is turned on, default true;
- batch_size sets the number of training examples in one forward/backward pass, the higher the batch size, the more memory space you'll need;
- epochs sets the number of times the algorithm will see the training data, where one epoch equals one forward pass and one backward pass of all the training examples;
- random_seed if set to any int will get reproducible training results for the same inputs;
embedding:
- embed_dim sets the dimension of the embedding space;
- mu_pos controls how similar the algorithm should try to make embedding vectors for correct intent labels;
- mu_neg controls maximum negative similarity for incorrect intents;
- similarity_type sets the type of the similarity, it should be either cosine or inner;
- num_neg sets the number of incorrect intent labels, the algorithm will minimize their similarity to the user input during training;
- use_max_sim_neg if true the algorithm only minimizes maximum similarity over incorrect intent labels;
regularization:
- C2 sets the scale of L2 regularization;
- C_emb sets the scale of how important it is to minimize the maximum similarity between embeddings of different intent labels;
- droprate_a sets the dropout rate between hidden layers before the embedding layer for user inputs;
- droprate_b sets the dropout rate between hidden layers before the embedding layer for system actions;
- droprate_rnn sets the recurrent dropout rate on the LSTM hidden state, see https://arxiv.org/abs/1603.05118;
train accuracy calculation:
- evaluate_every_num_epochs sets how often to calculate train accuracy, small values may hurt performance;
- evaluate_on_num_examples sets how many examples to use for calculation of train accuracy, large values may hurt performance.
Note
Droprate should be between 0 and 1, e.g. droprate=0.1 would drop out 10% of input units.
Note
For cosine similarity mu_pos and mu_neg should be between -1 and 1.
Note
There is an option to use linearly increasing batch size. The idea comes from https://arxiv.org/abs/1711.00489. In order to do it pass a list to batch_size, e.g. "batch_size": [8, 32] (default behaviour). If a constant batch_size is required, pass an int, e.g. "batch_size": 8.
These parameters can be specified in the policy configuration file. The default values are defined in EmbeddingPolicy.defaults:
defaults = {
    # nn architecture
    # a list of hidden layers sizes before user embed layer
    # number of hidden layers is equal to the length of this list
    "hidden_layers_sizes_a": [],
    # a list of hidden layers sizes before bot embed layer
    # number of hidden layers is equal to the length of this list
    "hidden_layers_sizes_b": [],
    # number of units in rnn cell
    "rnn_size": 64,

    # training parameters
    # flag if to turn on layer normalization for lstm cell
    "layer_norm": True,
    # initial and final batch sizes - batch size will be
    # linearly increased for each epoch
    "batch_size": [8, 32],
    # number of epochs
    "epochs": 1,
    # set random seed to any int to get reproducible results
    "random_seed": None,

    # embedding parameters
    # dimension size of embedding vectors
    "embed_dim": 20,
    # how similar the algorithm should try
    # to make embedding vectors for correct actions
    "mu_pos": 0.8,  # should be 0.0 < ... < 1.0 for 'cosine'
    # maximum negative similarity for incorrect actions
    "mu_neg": -0.2,  # should be -1.0 < ... < 1.0 for 'cosine'
    # the type of the similarity
    "similarity_type": 'cosine',  # string 'cosine' or 'inner'
    # the number of incorrect actions, the algorithm will minimize
    # their similarity to the user input during training
    "num_neg": 20,
    # flag if minimize only maximum similarity over incorrect actions
    "use_max_sim_neg": True,  # flag which loss function to use

    # regularization
    # the scale of L2 regularization
    "C2": 0.001,
    # the scale of how important is to minimize the maximum similarity
    # between embeddings of different actions
    "C_emb": 0.8,
    # scale loss with inverse frequency of bot actions
    "scale_loss_by_action_counts": True,
    # dropout rate for user nn
    "droprate_a": 0.0,
    # dropout rate for bot nn
    "droprate_b": 0.0,
    # dropout rate for rnn
    "droprate_rnn": 0.1,

    # attention parameters
    # flag to use attention over user input
    # as an input to rnn
    "attn_before_rnn": True,
    # flag to use attention over prev bot actions
    # and copy it to output bypassing rnn
    "attn_after_rnn": True,
    # flag to use `sparsemax` instead of `softmax` for attention
    "sparse_attention": False,  # flag to use sparsemax for probs
    # the range of allowed location-based attention shifts
    "attn_shift_range": None,  # if None, set to mean dialogue length / 2

    # visualization of accuracy
    # how often calculate train accuracy
    "evaluate_every_num_epochs": 20,  # small values may hurt performance
    # how many examples to use for calculation of train accuracy
    "evaluate_on_num_examples": 100  # large values may hurt performance
}
Note
Parameter mu_neg is set to a negative value to mimic the original starspace algorithm in the case mu_neg = mu_pos and use_max_sim_neg = False. See the starspace paper for details.
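Putting this together, a policy configuration entry for this policy might look like the following sketch (assuming EmbeddingPolicy can be referenced by name like the built-in policies above; the values are illustrative):
policies:
  - name: EmbeddingPolicy
    epochs: 2000
    attn_before_rnn: true
    attn_after_rnn: true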
Form Policy¶
The FormPolicy is an extension of the MemoizationPolicy which handles the filling of forms. Once a FormAction is called, the FormPolicy will continually predict the FormAction until all slots in the form are filled. For more information, see Slot Filling.
Mapping Policy¶
The MappingPolicy can be used to directly map intents to actions such that the mapped action will always be executed. The mappings are assigned by giving an intent the property 'triggers', e.g.:
intents:
  - greet: {triggers: utter_goodbye}
An intent can only be mapped to at most one action. The bot will run the action once it receives a message of the mapped intent. Afterwards, it will listen for the next message.
Note
The mapping policy will predict the mapped action after the intent (e.g. utter_goodbye in the above example) and afterwards it will wait for the next user message (predicting action_listen). With the next user message normal prediction will resume.
You should have an example like
* greet
- utter_goodbye
in your stories. Otherwise any machine learning policy might be confused by the sudden appearance of the predicted utter_goodbye in the dialogue history.
Fallback Policy¶
The FallbackPolicy invokes a fallback action if the intent recognition has a confidence below nlu_threshold or if none of the dialogue policies predict an action with confidence higher than core_threshold.
Configuration:
The thresholds and fallback action can be adjusted in the policy configuration file as parameters of the FallbackPolicy:
policies:
  - name: "FallbackPolicy"
    nlu_threshold: 0.3
    core_threshold: 0.3
    fallback_action_name: 'action_default_fallback'
nlu_threshold | Min confidence needed to accept an NLU prediction
core_threshold | Min confidence needed to accept an action prediction from Rasa Core
fallback_action_name | Name of the fallback action to be called if the confidence of intent or action is below the respective threshold
You can also configure the FallbackPolicy in your python code:
from rasa_core.policies.fallback import FallbackPolicy
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.agent import Agent

fallback = FallbackPolicy(fallback_action_name="action_default_fallback",
                          core_threshold=0.3,
                          nlu_threshold=0.3)

agent = Agent("domain.yml", policies=[KerasPolicy(), fallback])
Note
You can include either the FallbackPolicy or the TwoStageFallbackPolicy in your configuration, but not both.
Two-Stage Fallback Policy¶
This policy handles low NLU confidence in multiple stages.
If an NLU prediction has a low confidence score, the user is asked to affirm the classification of the intent.
- If they affirm, the story continues as if the intent was classified with high confidence from the beginning.
- If they deny, the user is asked to rephrase their message.
Rephrasing
- If the classification of the rephrased intent was confident, the story continues as if the user had this intent from the beginning.
- If the rephrased intent was not classified with high confidence, the user is asked to affirm the classified intent.
Second affirmation
- If the user affirms the intent, the story continues as if the user had this intent from the beginning.
- If the user denies, an ultimate fallback action is triggered (e.g. a handoff to a human).
Configuration¶
To use this policy, include the following in your policy configuration. Note that you cannot use this together with the default fallback policy.
policies:
  - name: TwoStageFallbackPolicy
    nlu_threshold: 0.3
    core_threshold: 0.3
    fallback_core_action_name: "action_default_fallback"
    fallback_nlu_action_name: "action_default_fallback"
    deny_suggestion_intent_name: "out_of_scope"
nlu_threshold | Min confidence needed to accept an NLU prediction
core_threshold | Min confidence needed to accept an action prediction from Rasa Core
fallback_core_action_name | Name of the action to be called if the confidence of the Rasa Core action classification is below the threshold
fallback_nlu_action_name | Name of the action to be called if the confidence of the Rasa NLU intent classification is below the threshold
deny_suggestion_intent_name | The name of the intent which is used to detect that the user denies the suggested intents
Note
It is required to have the two intents affirm and deny in the domain of the bot, to determine whether the user affirms or denies a suggestion.
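A minimal sketch of the relevant part of such a domain (out_of_scope matches the deny_suggestion_intent_name in the configuration above):
intents:
  - affirm
  - deny
  - out_of_scope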
Default Actions for Affirmation and Rephrasing¶
Rasa Core provides the default implementations action_default_ask_affirmation and action_default_ask_rephrase, which are triggered when the bot asks the user to affirm or rephrase their intent.
The default implementation of the action_default_ask_rephrase action utters the response template utter_ask_rephrase.
The implementation of both actions can be overwritten with Actions.
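For example, a custom affirmation action could look roughly like the following sketch, assuming the rasa_core_sdk Action interface; the message text is illustrative:
from rasa_core_sdk import Action


class ActionDefaultAskAffirmation(Action):
    """Asks the user to affirm the intent that NLU guessed."""

    def name(self):
        # using the exact default action name overrides the built-in behaviour
        return "action_default_ask_affirmation"

    def run(self, dispatcher, tracker, domain):
        # the intent NLU guessed for the last user message
        intent_name = tracker.latest_message["intent"].get("name")
        dispatcher.utter_message("Did you mean '{}'?".format(intent_name))
        return []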
Have questions or feedback?¶
We have a very active support community on the Rasa Community Forum that is happy to help you with your questions. If you have any feedback for us or a specific suggestion for improving the docs, feel free to share it by creating an issue on the Rasa Core GitHub repository.