In online systems with large
numbers of users, the demand for automated chatbots to serve users is
increasing. Chatbot systems can be used to support or replace customer
care officers in several tasks that can be automated.
For example, a question-answering chatbot can automatically answer questions about the services a company provides; a hospital can use a chatbot on its website to collect patients’ information, to give patients initial information about their symptoms, or to guide them through registration procedures for medical examination and treatment.
Chatbot systems communicate with humans by voice (like Siri) or by text (like chatbots built on the Facebook Messenger platform). Whatever the means of communication, a chatbot needs to understand the input text so that it can provide the right answers for customers. The component responsible for this work in a chatbot system is called NLU (Natural Language Understanding), which incorporates a number of natural language processing (NLP) techniques.
In this article, we introduce three basic NLP problems that arise when developing a chatbot system, along with some typical approaches. We focus on chatbot systems that are used in a closed domain and apply the retrieval-based model. In the retrieval-based model, the chatbot provides responses that are prepared in advance or that follow certain patterns. This differs from the generative model, in which the chatbot’s responses are generated automatically by a model learned from a dialogue dataset (read more in reference [1]). Most chatbot systems deployed in practice follow the retrieval-based model and are applied in specific application domains.
The three NLP problems covered in this article are: 1) Intent classification or intent detection; 2) Information extraction; and 3) Dialogue management. Finally, we also point out some challenges of developing chatbot systems and the limitations of current technology.
User intent detection
In general, users come to a chatbot system wanting it to take some action to help them with a particular issue. For example, users of a chatbot that supports booking air tickets may state their booking requests at the beginning of the conversation. To provide accurate support, the chatbot needs to determine the intent of the user. The detected intent determines how the rest of the conversation between the user and the chatbot will proceed. Therefore, if the user intent is misinterpreted, the chatbot will give incorrect responses. The user may then become frustrated and have no intention of using the system again. Detecting user intent is therefore very important in a chatbot system.
For closed domains, we can limit the user intents to a finite set of predefined intents related to the business operations that the chatbot supports. With this limitation, the problem of detecting user intents can be formalized as a text categorization problem: given a user utterance as input, the classification system determines which of the predefined intents corresponds to that utterance.
To build an intent classification model, we need a training dataset that includes different expressions for each intent. For example, to ask the same question about the weather in Hanoi today, users can use the following expressions:
– What is the weather today in Hanoi?
– Is it raining in Hanoi today?
– What is the temperature in Hanoi today?
– Excuse me, should we bring a raincoat when going out today?
It can be said that preparing training data for the intent classification problem is one of the most important tasks in developing a chatbot system, and it has a huge impact on the quality of the system. This work also requires considerable time and effort from chatbot developers.
Machine learning model for the problem of categorizing user intent
Once the training data is available, we model intent classification as a text categorization problem. Text categorization is a classic problem in NLP and text mining. The text categorization model for the intent classification problem is expressed in the following form:
We are given a training set consisting of pairs (message, intent), D = {(x^(1), y^(1)), …, (x^(n), y^(n))}, where x^(i) is a message and y^(i) is the corresponding intent for x^(i). Each intent y^(i) belongs to a finite set K of predefined intents. From this training data we need to learn a classification model Θ that is capable of classifying a new message into one of the intents in the set K. The architecture of the intent categorization system is illustrated in Figure 1.
Figure 1. Architecture of the intent categorization system
The intent categorization system has some basic components:
- Data pre-processing
- Feature extraction
- Model training
- Categorizing
In the data pre-processing stage, we “clean” the data: removing redundant information and standardizing the text, for example by correcting misspelled words and expanding abbreviations. Pre-processing plays an important role in a chatbot system because of the characteristics of chat and conversational language: abbreviations, misspellings, and “teencode”.
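The cleaning step can be sketched as follows. This is only a minimal illustration under stated assumptions: the `NORMALIZATION_MAP` table and the `preprocess` function are hypothetical names, and the entries in the table are made-up examples rather than a real abbreviation dictionary.

```python
import re

# Hypothetical lookup table of chat abbreviations and common misspellings.
# A real system would build this from its own chat logs.
NORMALIZATION_MAP = {
    "u": "you",
    "pls": "please",
    "plz": "please",
    "thx": "thanks",
    "tmr": "tomorrow",
    "wether": "weather",
}


def preprocess(message: str) -> str:
    """Lowercase, strip noise, and expand known abbreviations."""
    text = message.lower()
    # Remove URLs, which rarely help intent detection.
    text = re.sub(r"https?://\S+", " ", text)
    # Collapse a letter repeated three or more times ("sooooo" -> "so").
    text = re.sub(r"([^\W\d_])\1{2,}", r"\1", text)
    # Replace punctuation with spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    # Map each token through the normalization table, keeping unknown tokens.
    tokens = [NORMALIZATION_MAP.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


print(preprocess("Pls tell me the wether tmr!!!"))
```

A dictionary lookup like this only covers variants that someone has listed in advance; spelling correction for unseen errors would need a separate component.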
After pre-processing, we extract features from the cleaned data. In machine learning, this step is called feature extraction or feature engineering. In traditional machine learning models (those used before deep learning models were widely applied), the feature extraction step significantly affects the accuracy of the classification model. To extract good features, we need to analyze the data carefully and also use expert knowledge of the specific application domain.
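As one simple illustration of feature extraction, the sketch below counts word n-grams in a message. The choice of unigram and bigram counts is an assumption made for this example; the article does not prescribe a particular feature set, and `extract_features` is a hypothetical name.

```python
from collections import Counter


def extract_features(text: str, max_n: int = 2) -> Counter:
    """Return counts of word n-grams (n = 1..max_n) for one message."""
    tokens = text.lower().split()
    features = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens over the message.
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


features = extract_features("what is the weather today")
print(features["weather"], features["the weather"])
```

Domain knowledge would typically enter here as extra features, for example a flag for the presence of a city name or a product name.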
The training step takes the extracted features as input and applies a machine learning algorithm to learn a classification model. The learned model may be a set of classification rules (if using decision trees) or a weight vector over the extracted features (as in logistic regression, SVMs, or neural networks).
Once we have an intent classification model, we can use it to classify a new message. The input message goes through the same pre-processing and feature extraction steps; the classification model then computes a “score” for each intent in the set and outputs the intent with the highest score.
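The training and classification steps can be sketched with a small multinomial Naive Bayes classifier. Naive Bayes is chosen here only because it fits in a few lines; it is not one of the models named above, and the intent names and training sentences are made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes over bag-of-words features."""

    def fit(self, examples):
        # examples: list of (message, intent) pairs, like the training set D.
        self.intent_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for message, intent in examples:
            self.intent_counts[intent] += 1
            for word in message.lower().split():
                self.word_counts[intent][word] += 1
                self.vocabulary.add(word)
        return self

    def scores(self, message):
        """Return a log-probability score for every known intent."""
        total = sum(self.intent_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for intent, count in self.intent_counts.items():
            score = math.log(count / total)
            denom = sum(self.word_counts[intent].values()) + vocab_size
            for word in message.lower().split():
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                # Add-one smoothing so a word unseen for this intent
                # does not drive the score to minus infinity.
                score += math.log((self.word_counts[intent][word] + 1) / denom)
            result[intent] = score
        return result

    def predict(self, message):
        scores = self.scores(message)
        # Output the intent with the highest score.
        return max(scores, key=scores.get)


# Hypothetical training data for two intents.
training_data = [
    ("what is the weather today in hanoi", "ask_weather"),
    ("is it raining in hanoi today", "ask_weather"),
    ("what is the temperature in hanoi today", "ask_weather"),
    ("book me a flight to london", "book_flight"),
    ("i want to buy a plane ticket", "book_flight"),
    ("reserve a seat on the next flight", "book_flight"),
]
classifier = NaiveBayesIntentClassifier().fit(training_data)
print(classifier.predict("will it rain in hanoi tomorrow"))  # ask_weather
```

With so few examples the classifier only works on messages that share words with the training set, which is why the number and variety of expressions per intent matters so much.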
Model based on content matching
The
intent classification model based on statistical machine learning
requires training data including different expressions for each intent.
This training data is usually prepared manually. The data preparation
step takes quite some time and effort, especially in applications where
the number of intents is relatively large.
An approach that can reduce the effort required to prepare training data is the content matching approach. In this approach we still need to prepare data, but each intent needs only at least one corresponding question. Given a message, we apply a content matching algorithm to compare the message with each question in the dataset, and return the answer to the question whose content is closest to the input. In practical applications, we can return a list of the most appropriate answers (e.g., the top 3) for the user to choose from.
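A minimal sketch of content matching is shown below. It assumes Jaccard overlap between word sets as the similarity measure, which is just one simple option; the FAQ entries and the names `tokenize`, `jaccard`, and `top_matches` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Token-set overlap: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_matches(message, faq, k=3):
    """Return the k (score, question, answer) triples closest to message."""
    query = tokenize(message)
    scored = [(jaccard(query, tokenize(q)), q, a) for q, a in faq]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical FAQ entries for a telecom provider.
FAQ = [
    ("Why is my network slow?", "Please restart your modem and run a speed test."),
    ("How do I pay my bill online?", "You can pay in the customer portal."),
    ("How do I change my Internet package?", "Contact support to switch packages."),
    ("What are your opening hours?", "We are open from 8:00 to 17:00."),
]

for score, question, answer in top_matches("why is my home network so slow recently", FAQ):
    print(round(score, 3), question)
```

Pure word overlap cannot tell that two different words mean the same thing, which is the limitation discussed next.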
The content matching method is well suited to chatbot systems that answer frequently asked questions (FAQ). We can take advantage of existing FAQ data to create an FAQ chatbot by content matching, without creating training data as in the statistical machine learning model.
One of the challenges of the content-matching model is that handling different expressions for the same question requires hand-crafted rules. Since the number of samples for each intent is small, the matching model has to use rules or semantic resources to handle the different ways of expressing a word, a phrase, or a sentence. The two sentences in the example below use different expressions for the same question from a telecom company’s customer about a slow network connection.
- Ad, why is my home network so slow recently?
- My network lags many times, so frustrated.
In the example above, if we use the content-matching model, the system needs to recognize that the words “slow” and “lag” (Internet slang) have the same meaning.
Currently, semantic resources for Vietnamese language processing are insufficient, so an approach based on statistical machine learning, or a hybrid model that combines statistical machine learning and content matching, may be more appropriate for Vietnamese chatbots.
Information Extraction
Besides detecting the intent in a user’s message, we need to extract the necessary information from it. The information to be extracted from a message usually consists of entities of certain types. For example, when a customer wants to book an airplane ticket, the system needs to know the departure and destination locations, the date and time the customer wants to travel, etc. NLU components of chatbot systems usually support the following entity types (read more in reference [2]):
- Location
- Datetime
- Number
- Contact
- Distance
- Duration
Could | you | please | book | me | a | flight | to | London | on | 25th | this | month | ? |
O | O | O | O | O | O | O | O | B-LOCATION | O | B-TIME | I-TIME | I-TIME | O |
Figure 2: Assigning word labels according to the B-I-O model in information extraction
The input of an information extraction module is a message. The module needs to locate the entities in the message, that is, the start and end position of each entity. The following example illustrates a message and the entities extracted from it.
- Input message: Could you please book me a flight to London on 25th this month ?
- The message with identified entities: Could you please book me a flight to [London]LOCATION on [25th this month]TIME ?
In the sentence above there are two entities (enclosed in [ ], each followed by its entity type).
The common approach to information extraction is to formalize it as a sequence labeling problem. The input of a sequence labeling problem is a sequence of words, and the output is a sequence of labels corresponding to the words in the input. We use machine learning models to learn a labeling model from a training dataset of pairs (x1…xn, y1…yn), where x1…xn is a sequence of words and y1…yn is the corresponding sequence of labels. The length of the sequences in the dataset may vary.
In the information extraction problem, the label set for the words in the input sentence usually follows the BIO model, in which B stands for “Beginning”, I for “Inside”, and O for “Outside”. Once we know the position of the first word of an entity and of the words inside that entity, we can determine the position of the entity in the sentence. For the example above, the sequence of labels corresponding to the sequence of words in the input message is illustrated in Figure 2.
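To make the BIO scheme concrete, the sketch below converts a label sequence back into entities. The function name `decode_bio` is hypothetical, and the labels are written by hand for the flight-booking example (with the final question mark labeled O) rather than produced by a trained model.

```python
def decode_bio(tokens, labels):
    """Turn parallel token/label lists into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A B- tag always opens a new entity, closing any open one.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


tokens = "Could you please book me a flight to London on 25th this month ?".split()
labels = ["O"] * 8 + ["B-LOCATION", "O", "B-TIME", "I-TIME", "I-TIME", "O"]
print(decode_bio(tokens, labels))
# [('LOCATION', 'London'), ('TIME', '25th this month')]
```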
Popular sequence labeling algorithms include Hidden Markov Models (HMM) [3] and Conditional Random Fields (CRF) [4]. On textual data, CRF models usually outperform HMM models. Several open-source CRF toolkits are available for sequence labeling, such as CRF++ [5], CRFsuite [6], and Mallet [7].
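As a rough illustration of how an HMM labels a sequence, the sketch below implements Viterbi decoding, the standard way to find the most probable label sequence under an HMM. All probabilities here are hand-set toy values chosen for the example; in practice they would be estimated from labeled data, and a CRF toolkit would be used through its own interface rather than code like this.

```python
import math


def viterbi(words, states, start_p, trans_p, emit_p, unknown_p=1e-6):
    """Return the most probable state sequence for words under an HMM."""
    if not words:
        return []
    # best[t][s] = (log-probability of the best path ending in s at t, previous state)
    best = [{}]
    for s in states:
        emit = emit_p[s].get(words[0], unknown_p)
        best[0][s] = (math.log(start_p[s]) + math.log(emit), None)
    for t in range(1, len(words)):
        best.append({})
        for s in states:
            emit = math.log(emit_p[s].get(words[t], unknown_p))
            # Pick the predecessor state that maximizes the path score.
            prev, score = max(
                ((p, best[t - 1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = (score + emit, prev)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(words) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


# Toy parameters (assumed values, not learned from data).
states = ["O", "B-LOCATION"]
start_p = {"O": 0.9, "B-LOCATION": 0.1}
trans_p = {
    "O": {"O": 0.8, "B-LOCATION": 0.2},
    "B-LOCATION": {"O": 0.9, "B-LOCATION": 0.1},
}
emit_p = {
    "O": {"a": 0.3, "flight": 0.3, "to": 0.4},
    "B-LOCATION": {"london": 0.5, "hanoi": 0.5},
}
print(viterbi(["a", "flight", "to", "london"], states, start_p, trans_p, emit_p))
```

Words missing from an emission table fall back to the small `unknown_p` value, which is a crude stand-in for proper smoothing.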
Recently, Recurrent Neural Networks (RNNs) have been widely used for sequence labeling. RNN models have proven effective on textual data because they model the dependencies between words in a sentence. For example, RNNs have been applied to part-of-speech (POS) tagging and named entity recognition [8].
Dialogue Management
In a long conversation between a person and a chatbot, the chatbot needs to remember the context, that is, to manage dialogue states. Dialogue management is important for ensuring that the communication between people and machines is smooth.
The function of the dialogue management component is to receive input from the NLU component, manage dialogue states and dialogue contexts, and pass output to the Natural Language Generation (NLG) component. For example, the dialogue management module in an air ticket booking chatbot needs to know when the user has provided enough information to create a booking in the system, and when it needs to reconfirm the information entered by the user. Currently, chatbot products typically use the Finite State Automata (FSA) model, the Frame-based model (Slot Filling), or a combination of the two.
Figure 3: Illustration of Dialogue Management using Finite State Automata (FSA) model
FSA is the simplest dialogue management model. For example, imagine a customer care system of a telecom company, serving customers who complain about a slow network. The task of the chatbot is to ask for the customer’s name, phone number, the name of the Internet package he/she is using, and the actual Internet speed. Figure 3 illustrates a dialogue management model for this customer care chatbot. The FSA states correspond to the questions that the dialogue manager asks the user. The links between the states correspond to the actions that the chatbot takes, which depend on the user’s responses to the questions. In the FSA model, the chatbot is the side that leads the conversation.
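A finite-state dialogue manager for this example can be sketched as a table of states, each with a question and a transition function that picks the next state from the user's answer. The state names and the phone-number check are assumptions made for illustration; the automaton in Figure 3 may differ.

```python
# Each state holds the question to ask and a function that chooses the
# next state from the user's answer. State names are hypothetical.
FSA = {
    "ask_name": ("What is your name?", lambda answer: "ask_phone"),
    "ask_phone": (
        "What is your phone number?",
        # Stay in the same state (re-ask) unless the answer is all digits.
        lambda answer: "ask_package" if answer.replace(" ", "").isdigit() else "ask_phone",
    ),
    "ask_package": ("What is the name of your Internet package?", lambda answer: "ask_speed"),
    "ask_speed": ("What is your current Internet speed?", lambda answer: "done"),
}


def run_dialogue(user_answers):
    """Walk the automaton, consuming one user answer per question asked."""
    state = "ask_name"
    asked, collected = [], {}
    for answer in user_answers:
        if state == "done":
            break
        question, transition = FSA[state]
        asked.append(question)
        next_state = transition(answer)
        if next_state != state:
            # The answer was accepted, so store it under the current state.
            collected[state] = answer
        state = next_state
    return state, asked, collected


final_state, asked, collected = run_dialogue(
    ["Minh", "not sure", "0912 345 678", "Fiber 50", "5 Mbps"]
)
print(final_state, collected)
```

The chatbot drives every turn here: the user can only answer the question attached to the current state.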
The advantage of the FSA model is its simplicity: the chatbot knows in advance what kind of response it expects from the user at each step. However, the FSA model is not really suitable for complex chatbot systems, or when users provide several pieces of information in the same message. In the example above, if a user provides both name and phone number at once and the chatbot still goes on to ask for the phone number, the user may feel annoyed.
A Frame-based model (also called Form-based) can solve this problem of the FSA model. The Frame-based model relies on predefined frames to guide the conversation. Each frame contains the required slots and the corresponding questions that the dialogue manager asks the user. This model allows the user to fill in several slots of the frame in one message. Figure 4 is an example of a frame for the chatbot above.
Slot | Question |
Full name | What is your name? |
Phone number | What is your phone number? |
Internet package name | What is the name of your Internet package? |
Actual Internet speed | What is your current Internet speed? |
Figure 4: Frame for chatbot to ask for information (in slow Internet connection example)
A dialogue manager using the Frame-based model asks the customer questions and fills in the slots based on the information the customer provides, until it has collected all the necessary information. When the user answers multiple questions at the same time, the system has to fill in all the corresponding slots and remember not to ask questions that have already been answered.
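The slot-filling behavior can be sketched as follows, using the frame from Figure 4. The regular expressions in `EXTRACTORS` are hypothetical stand-ins for the NLU component and would be far too brittle for real use; they are only there so the example runs on its own.

```python
import re

# The frame from Figure 4: slot names paired with their questions.
FRAME = [
    ("full_name", "What is your name?"),
    ("phone_number", "What is your phone number?"),
    ("package_name", "What is the name of your Internet package?"),
    ("actual_speed", "What is your current Internet speed?"),
]

# Hypothetical regex extractors standing in for the NLU component.
EXTRACTORS = {
    "full_name": re.compile(r"my name is (\w+)", re.IGNORECASE),
    "phone_number": re.compile(r"\b(\d{9,11})\b"),
    "package_name": re.compile(r"package (?:is )?(\w+)", re.IGNORECASE),
    "actual_speed": re.compile(r"(\d+(?:\.\d+)?\s*mbps)", re.IGNORECASE),
}


def update_slots(slots, message):
    """Fill every still-empty slot that the message provides a value for."""
    for name, pattern in EXTRACTORS.items():
        match = pattern.search(message)
        if match and name not in slots:
            slots[name] = match.group(1)
    return slots


def next_question(slots):
    """Return the question for the first unfilled slot, or None when done."""
    for name, question in FRAME:
        if name not in slots:
            return question
    return None


slots = update_slots({}, "My name is Minh and my number is 0912345678")
print(next_question(slots))  # What is the name of your Internet package?
```

Because the name and phone number arrived in one message, the manager skips straight to the package question instead of asking for the phone number again.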
In complex application domains, a dialogue can involve many different frames. The problem for chatbot developers is how to know when to switch between frames. A commonly used approach to managing the transfer of control between frames is to define production rules. These rules are based on a number of elements, such as the user’s most recent messages or questions.
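Production rules for switching frames can be sketched as an ordered list of condition and frame pairs, where the first matching rule wins. The frame names, intent names, and context fields below are all hypothetical; real systems define their own.

```python
# Each rule pairs a condition over the dialogue context with the frame to
# activate. Rules are checked in order; names are hypothetical.
RULES = [
    (lambda ctx: ctx["last_intent"] == "report_slow_network", "slow_network_frame"),
    (lambda ctx: ctx["last_intent"] == "change_package", "change_package_frame"),
    (lambda ctx: "bill" in ctx["last_message"].lower(), "billing_frame"),
]


def select_frame(context, current_frame):
    """Return the frame of the first rule that fires, else keep the current one."""
    for condition, frame in RULES:
        if condition(context):
            return frame
    return current_frame


context = {"last_intent": "change_package", "last_message": "I want a faster package"}
print(select_frame(context, "slow_network_frame"))  # change_package_frame
```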
Challenges
Although the NLP and machine learning fields have advanced considerably, there are still many challenges in chatbot development that researchers need to overcome. We list two of them below.
The first problem is coreference. In speaking and writing, we often use short forms to refer to objects mentioned earlier. For example, in written or spoken English, people use pronouns like “it”, “they”, “he”, … Without contextual information and a coreference resolver, it is very difficult for a chatbot to know what or whom these words refer to. Failing to identify the correct referent of these words may cause the chatbot to misunderstand the user’s utterance. This challenge is especially apparent in long conversations.
The second problem is how to reduce the effort of annotating data when developing a chatbot. With the approaches above, the developer needs to label training data for both the intent classifier and the named entity recognizer. In complex application domains (such as medical and health care), creating such datasets is quite expensive. It is therefore necessary to develop methods that exploit data sources already available in the enterprise to reduce the amount of data that must be labeled while preserving the accuracy of the natural language processing models.
Conclusion
In this article, we have introduced three basic NLP problems in developing chatbot systems that are used in a closed domain and follow the retrieval-based model. Within the scope of a single article, we cannot provide more detail about the models mentioned or cover newer approaches in chatbot development (e.g., the Generative Hierarchical Neural Network approach, a sequence-to-sequence model [9]). Interested readers can read more in the references.
References
[1] Stefan Kojouharov. “Ultimate Guide to Leveraging NLP & Machine Learning for your Chatbot”. On Chatbotlife. https://chatbotslife.com/ultimate-guide-to-leveraging-nlp-machine-learning-for-you-chatbot-531ff2dd870c#.rabx346bq
[2] Pavlo Bashmakov. Advanced Natural Language Processing Tools for Bot Makers – LUIS, Wit.ai, Api.ai and others (UPDATED). https://stanfy.com/blog/advanced-natural-language-processing-tools-for-bot-makers/
[3] Michael Collins. Hidden Markov models and tagging (sequence labeling) problems. http://www.cs.columbia.edu/~mcollins/hmms-spring2013.pdf
[4] Michael Collins. Log-Linear Models, MEMMs, and CRFs. http://www.cs.columbia.edu/~mcollins/crf.pdf
[5] Taku Kudo. CRF++: Yet Another CRF toolkit. https://taku910.github.io/crfpp/
[6] Naoaki Okazaki. CRFsuite: A fast implementation of Conditional Random Fields (CRFs). http://www.chokkan.org/software/crfsuite/
[7] Mallet toolkit: http://mallet.cs.umass.edu/
[8] Zhiheng Huang, Wei Xu, Kai Yu. 2015. Bidirectional LSTM-CRF Models for Sequence Tagging. On arxiv, https://arxiv.org/abs/1508.01991
[9] Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models. On arxiv, https://arxiv.org/abs/1507.04808
[10] Jurafsky, D., & Martin, J. H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Chapter 24. “Dialogue and Conversational Agents”.
Pham Quang Nhat Minh – FPT Technology Research Institute (FTRI)