nlp.js
How to train entities/slots using corpus?
Summary
I'd prefer to use a corpus file instead of hand-coding my training declarations, but it's unclear how to annotate entities/slots in the training file.
Context
If I use a corpus consisting of objects per the existing examples (see below example), everything works fine and the intent/answer are correctly recognized.
{
"intent": "tell_me_a_joke",
"utterances": [
"Tell me a joke",
"Can you tell me a joke?",
"I want to hear a joke",
"make me laugh",
"tell me something funny"
],
"answers": ["Knock knock"]
}
However, I want to extract information from some of my intents (see the mock example below) to avoid ping-ponging the user for information they've already shared. I've attempted to use % notation as seen below, but it seems to break that intent altogether.
{
"intent": "tell_me_a_joke",
"utterances": [
"Tell me a joke about %type%",
"Can you tell me a %type% joke?",
"I want to hear a %type% joke",
"make me laugh",
"tell me something funny"
],
"answers": ["What type of joke do you want to hear?"]
}
Example above is trivial, but the general gist is I'd expect the framework to take this and extract out the entity into a slot. However, I have no idea how to approach this if I'm using a corpus. Any examples/guidance would be fabulous.
Your Environment
Software | Version |
---|---|
nlp.js | 4.10.5 |
node | 13.13.0 |
npm | 6.14.4 |
Operating System | macOS |
I am also interested in this question. I also wonder if it is possible to describe action in the corpus.
I am also interested in the possibility of slot-filling through a corpus, so I can't give you a full answer. However, the Huge-NER example was quite useful to me. For mentioning entities in your intents, use one @-symbol instead of the percentages. So, in your example this would be "I want to hear a @type joke". You should store your 'type' entity in a json file. Again, refer to the airport.json in the NER example. In your index file, use nlp.addNerRuleOptionTexts to add your entity. This is the interesting section in the example:
const airportKeys = Object.keys(airports);
for (let i = 0; i < airportKeys.length; i += 1) {
  const airport = airports[airportKeys[i]];
  nlp.addNerRuleOptionTexts('en', 'airport', airport.icao, airport.city);
}
Hope it helps!
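If you already have a corpus written with the %name% notation, the switch to @name described above can be scripted. This is only a minimal sketch in plain JavaScript: `toAtNotation` and `convertIntent` are hypothetical helper names, not part of nlp.js, and it assumes placeholders are made of word characters only.

```javascript
// Hypothetical helper (not an nlp.js API): rewrites %name% placeholders as @name.
function toAtNotation(utterance) {
  return utterance.replace(/%(\w+)%/g, '@$1');
}

// Returns a copy of a corpus intent object with converted utterances.
function convertIntent(intent) {
  return { ...intent, utterances: intent.utterances.map(toAtNotation) };
}

const jokeIntent = {
  intent: 'tell_me_a_joke',
  utterances: [
    'Tell me a joke about %type%',
    'Can you tell me a %type% joke?',
    'I want to hear a %type% joke',
    'make me laugh',
    'tell me something funny',
  ],
  answers: ['What type of joke do you want to hear?'],
};

const converted = convertIntent(jokeIntent);
console.log(converted.utterances);
```

Utterances without a placeholder pass through unchanged, so the whole corpus can be run through the same function.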
Thanks for the pointer on using the @ character here. I've updated my corpus to do that, but the trouble with my actual use-case (which the trivial joke example fails to demonstrate) is that the entity value isn't and can't be known in advance, as the interaction is more or less a search term helper. A real example would be something like "Help me find content on machine learning", and I'm aiming for the system to extract "machine learning" without needing to re-prompt the user. The NER example you refer to relies on the values being known in advance, which isn't really possible in my case unfortunately.
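To make the open-ended case concrete, here is a rough sketch of the behaviour being asked for, written in plain JavaScript rather than with nlp.js. It simply takes whatever follows a trigger phrase as the entity value. The function name `extractAfter` and the trigger list are assumptions made up for illustration, and a real solution inside the framework would look different.

```javascript
// Escapes characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper (not an nlp.js API): returns the text that follows
// the first trigger phrase found, or null when no trigger is present.
function extractAfter(utterance, triggers) {
  const alternatives = triggers.map(escapeRegExp).join('|');
  const pattern = new RegExp(`\\b(?:${alternatives})\\s+(.+)$`, 'i');
  const match = pattern.exec(utterance.trim());
  if (!match) return null;
  // Drop trailing whitespace and sentence punctuation from the captured value.
  return match[1].replace(/[\s?.!]+$/, '') || null;
}

// Illustrative trigger phrases only.
const triggers = ['content on', 'content about'];
console.log(extractAfter('Help me find content on machine learning', triggers));
console.log(extractAfter('make me laugh', triggers));
```

The value is not validated against any list, which is the point: the search term can be anything the user types after the trigger.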
I have the same use-case: I want to be able to train on entities such as car features and car names without having to program every single car name in advance. If a user says What
Hello,
Here is an example: https://github.com/axa-group/nlp.js/tree/master/examples/14-ner-corpus
Here is a quickstart: https://github.com/axa-group/nlp.js/blob/master/docs/v4/ner-quickstart.md
Is there a way to use Enum entities but have the list of items be open ended? So, without adding "captain america" to the list of heroes, the hero entity would still be found with "captain america" as a value.
I am also looking for what Mark Tucker above needs, a way to identify entities that are not in the training set but have traits that can be used to train and identify entities with some likelihood metric attached.
Yeah, I'm looking for the same.
The slot filling example on the home page points to a v3 version that no longer works:
https://github.com/axa-group/nlp.js/blob/master/docs/v3/slot-filling.md
This would do exactly what I want if it worked though 😊
More specifically it says this:
const manager = new NlpManager({ languages: ['en'] });
const fromEntity = manager.addTrimEntity('fromCity');
fromEntity.addBetweenCondition('en', 'from', 'to');
fromEntity.addAfterLastCondition('en', 'from');
const toEntity = manager.addTrimEntity('toCity');
toEntity.addBetweenCondition('en', 'to', 'from', { skip: ['travel'] });
toEntity.addAfterLastCondition('en', 'to');
manager.slotManager.addSlot('travel', 'fromCity', true, { en: 'From where you are traveling?' });
manager.slotManager.addSlot('travel', 'toCity', true, { en: 'Where do you want to go?' });
manager.slotManager.addSlot('travel', 'date', true, { en: 'When do you want to travel?' });
manager.addDocument('en', 'I want to travel from %fromCity% to %toCity% %date%', 'travel');
await manager.train();
const result = await manager.process('en', 'I want to travel to Madrid tomorrow', {});
console.log(JSON.stringify(result, null, 2));
The problem is that the slot manager doesn't seem to be part of NlpManager anymore, so I'm guessing the code has to change?
I saw that by adding %name% in the greetings.hello intent, such as "hello my name is %name%", it does get added to the slot manager when I train it, but it doesn't appear to get parsed when I feed it "hello my name is matt".
I'll keep digging, but others may want to follow the path through @nlpjs/slot/src/slot-manager with debug statements.
I've made some progress by making the entity mandatory, but at the moment I'm getting a response of:
Hi there hello my name is matt
instead of
Hi there matt
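For anyone comparing the two outputs, this is roughly the substitution I'd expect to happen once the entity is extracted as just "matt". It's a plain JavaScript sketch, not the framework's code: `renderAnswer` is a hypothetical helper, and both the `{{ name }}` placeholder syntax and the `{ entity, sourceText }` shape of the entity objects are assumptions here.

```javascript
// Hypothetical helper (not an nlp.js API): fills {{ name }} placeholders in an
// answer template from a list of extracted entities.
function renderAnswer(template, entities) {
  // Assumed entity shape: { entity: 'name', sourceText: 'matt' }.
  const values = Object.fromEntries(
    entities.map((item) => [item.entity, item.sourceText])
  );
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (whole, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : whole
  );
}

const entities = [{ entity: 'name', sourceText: 'matt' }];
console.log(renderAnswer('Hi there {{ name }}', entities)); // prints "Hi there matt"
```

If the extracted value were the whole utterance instead of "matt", the same substitution would produce the first, unwanted response, which suggests the problem is in the extraction step and not in the answer rendering.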
It automatically works out of the box for most entities if you do something like this: https://github.com/axa-group/nlp.js/blob/master/packages/builtin-compromise/README.md#L1
Does anyone have a "slot filling" example based on v4? My use case would be something like:
- "Please start the irrigation" -> "How long?" -> "@duration minutes" -> "ok I start the irrigation for @duration minutes"
- "Please irrigate for @duration minutes" -> "ok I start the irrigation for @duration minutes"
...
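To spell out the two flows above, here is a small plain JavaScript sketch of the decision a slot filler has to make on each turn: ask for the first missing mandatory slot, or answer once everything is filled. It does not use nlp.js at all, and `nextTurn`, the slot list, and the returned object shape are made up for illustration.

```javascript
// Hypothetical slot definition for the irrigation intent.
const irrigationSlots = [{ name: 'duration', question: 'How long?' }];

// Hypothetical helper (not an nlp.js API): decides what to say next.
// `filled` maps slot names to values collected so far.
function nextTurn(slots, filled, answerTemplate) {
  const missing = slots.find((slot) => filled[slot.name] === undefined);
  if (missing) {
    // A mandatory slot is still empty, so ask its question.
    return { done: false, ask: missing.name, say: missing.question };
  }
  // All slots are filled: replace @name markers with the collected values.
  const say = answerTemplate.replace(/@(\w+)/g, (whole, name) =>
    filled[name] !== undefined ? String(filled[name]) : whole
  );
  return { done: true, say };
}

const template = 'ok I start the irrigation for @duration minutes';

// "Please start the irrigation": no duration yet, so the bot asks for it.
const first = nextTurn(irrigationSlots, {}, template);
console.log(first.say);

// "Please irrigate for 15 minutes": duration known, so the bot answers directly.
const second = nextTurn(irrigationSlots, { duration: 15 }, template);
console.log(second.say);
```

In the first flow the caller would store the user's reply under the slot named in `ask` and call `nextTurn` again, which then produces the same final answer as the second flow.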
I made several fixes in my PRs and also improved the documentation a lot.
Closing due to inactivity. Please, re-open if you think the topic is still alive.