
Bootstrap Intent Classification/Utterance Generation

Open benhoff opened this issue 7 years ago • 1 comments

Intent Classification

Rasa NLU uses a linear SVM to classify the intent by leveraging spaCy's n-gram model to vectorize utterances.
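To make the vectorize-then-classify idea concrete, here is a rough, dependency-free sketch. It is not Rasa NLU's or spaCy's code: word n-gram counts stand in for the spaCy features, a multiclass perceptron (another linear classifier) stands in for the linear SVM, and the intents, training utterances, and every name below are made up for illustration.

```python
from collections import Counter


def ngrams(text, n_max=2):
    """Count word n-grams (n = 1..n_max) in a lowercased utterance."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


class PerceptronIntentClassifier:
    """Linear multiclass classifier; a stand-in for the linear SVM."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        # One sparse weight vector per intent label.
        self.weights = {label: {} for label in self.labels}

    def score(self, label, feats):
        w = self.weights[label]
        return sum(w.get(f, 0.0) * v for f, v in feats.items())

    def predict(self, text):
        feats = ngrams(text)
        # Ties resolve to the first label in sorted order.
        return max(self.labels, key=lambda label: self.score(label, feats))

    def fit(self, examples, epochs=10):
        for _ in range(epochs):
            for text, gold in examples:
                guess = self.predict(text)
                if guess == gold:
                    continue
                # Standard perceptron update: reward gold, penalize the guess.
                for f, v in ngrams(text).items():
                    self.weights[gold][f] = self.weights[gold].get(f, 0.0) + v
                    self.weights[guess][f] = self.weights[guess].get(f, 0.0) - v


# Hypothetical training data, invented for this sketch.
TRAIN = [
    ("hello there", "greet"),
    ("hi bot", "greet"),
    ("good morning bot", "greet"),
    ("bye for now", "goodbye"),
    ("see you later", "goodbye"),
    ("what is the weather today", "weather"),
    ("will it rain tomorrow", "weather"),
]

clf = PerceptronIntentClassifier({label for _, label in TRAIN})
clf.fit(TRAIN)
print(clf.predict("will it rain today"))
```

Swapping the perceptron for a real SVM and the n-gram counts for spaCy features should only change `ngrams` and `fit`; the predict-by-highest-linear-score step stays the same.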

There's also been research into using seq2seq models for joint intent classification and slot filling, as seen in this paper from Microsoft. Also, here's a Python implementation.

Entity Extraction

Rasa NLU has several methods of entity extraction, as documented here. These include a conditional random field (CRF) for custom entity extraction (not pretrained). spaCy also provides entity extraction, in the form of an averaged perceptron. The third option is a Duckling server, which uses a context-free grammar. Facebook has an open-source implementation of context-free grammar.

As mentioned above, a seq2seq approach can also be used as documented here.

Bootstrapping Utterances

Writing utterances is a pain in the rear. There might be a way to bootstrap utterance generation so they don't have to be written by hand.

Here's a list of data corpora that should prove useful in that regard. That paper also has an overview of useful methods for building dialogue systems.

The paper also has an interesting reference: "Luke, I am your father: dealing with out-of-domain requests by using movies subtitles". This should be useful for one-off responses.
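A minimal sketch of how that kind of one-off response lookup could work: treat consecutive subtitle lines as (trigger, answer) pairs and return the answer whose trigger looks most like the user's utterance. The pairs, the bag-of-words cosine measure, and the `threshold` value are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter

# Hypothetical (trigger, answer) pairs standing in for consecutive subtitle lines.
PAIRS = [
    ("do you like movies", "only the ones with robots in them"),
    ("who is your father", "i was compiled, not born"),
    ("tell me a joke", "i would, but my humor module is still loading"),
]


def bag(text):
    """Bag-of-words counts for a lowercased utterance."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count bags."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def respond(utterance, pairs=PAIRS, threshold=0.3):
    """Return the answer of the best-matching trigger, or None if nothing is close."""
    query = bag(utterance)
    score, answer = max(
        (cosine(query, bag(trigger)), answer) for trigger, answer in pairs
    )
    return answer if score >= threshold else None


print(respond("who is your real father"))
```

Returning `None` below the threshold leaves room for a fallback, since a weak match usually reads worse than no response at all.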

This Google research blog post has an example of how to rank the uniqueness of a response, which will be necessary for generating unique responses.

This repo uses the Cornell Movie-Dialogs Corpus and a seq2seq neural net to implement the approach from the Google blog post.

It should also be possible to leverage Reddit data using the movie corpus code I've already written.

Context

The real challenge is going to be handling context.

There's a way to handle context, as proposed in the Ubuntu Dialogue Corpus paper: an affinity model that scores a candidate response against a context c (five consecutive utterances, for example). The paper is here.
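For reference, the affinity score in that line of work has the form sigmoid(c^T M r + b), where c and r are vector encodings of the context and the candidate response and M is a learned matrix. A rough sketch of the scoring step and of a five-utterance context window follows; the encoder that produces c and r is left out, and the vectors, matrix, and example turns are placeholders invented here.

```python
import math
from collections import deque


def affinity(c, M, r, b=0.0):
    """Return sigmoid(c^T M r + b) for plain-list vectors c, r and matrix M."""
    Mr = [sum(m_ij * r_j for m_ij, r_j in zip(row, r)) for row in M]
    z = sum(c_i * x for c_i, x in zip(c, Mr)) + b
    return 1.0 / (1.0 + math.exp(-z))


# Keep only the five most recent utterances as context.
context = deque(maxlen=5)
for turn in [
    "hi",
    "my wifi is down",
    "which ubuntu version",
    "16.04",
    "try restarting network-manager",
    "that worked",
]:
    context.append(turn)

# Placeholder encodings: in a real model these come from a trained encoder.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
aligned = affinity([1.0, 0.0], IDENTITY, [1.0, 0.0])
unrelated = affinity([1.0, 0.0], IDENTITY, [0.0, 1.0])
print(list(context), aligned, unrelated)
```

At response time, each candidate would be scored against the encoded window and the highest-scoring one returned.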

Final Thoughts

The easiest approach would be to follow the paper and build a one-off responder for out-of-domain requests. A sort of pithy response bot, as it were.

benhoff, Dec 27 '17 00:12

Sentence-level similarity should also be usable here.

https://towardsdatascience.com/sentence-embedding-3053db22ea77
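One simple form of sentence-level similarity is to average word vectors into a sentence vector and compare sentences by cosine. The sketch below assumes that approach; the three-dimensional vectors are toy values invented for illustration, where a real setup would load pretrained embeddings.

```python
import math

# Toy word vectors, invented for this sketch.
VECS = {
    "play": [1.0, 0.0, 0.0],
    "music": [0.9, 0.1, 0.0],
    "song": [0.8, 0.2, 0.0],
    "weather": [0.0, 1.0, 0.0],
    "rain": [0.0, 0.9, 0.1],
}


def embed(sentence):
    """Average the vectors of known words; unknown words are skipped."""
    known = [VECS[w] for w in sentence.lower().split() if w in VECS]
    if not known:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(known) for dim in zip(*known)]


def similarity(a, b):
    """Cosine similarity between the averaged vectors of two sentences."""
    u, v = embed(a), embed(b)
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


print(similarity("play music", "play a song"))
print(similarity("play music", "rain weather"))
```

Unlike exact word overlap, this scores "music" and "song" as close even though they share no tokens.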

https://www.microsoft.com/en-us/research/project/deep-reinforcement-learning-goal-oriented-dialogue/

benhoff, Dec 29 '17 16:12