nlp.js

How to train entities/slots using corpus?

Open radicand opened this issue 3 years ago • 13 comments

Summary

I'd prefer to use a corpus file instead of hand-coding my training declarations, but it's unclear how to annotate entities/slots in the training file.

Context

If I use a corpus made up of objects like the existing examples (see the example below), everything works fine and the intent and answer are correctly recognized.

{
  "intent": "tell_me_a_joke",
  "utterances": [
    "Tell me a joke",
    "Can you tell me a joke?",
    "I want to hear a joke",
    "make me laugh",
    "tell me something funny"
  ],
  "answers": ["Knock knock"]
}

However, I want to extract information from some of my intents (see the mock example below) to avoid ping-ponging the user for information they've already shared. I've attempted to use % notation as seen below, but that seems to break the intent altogether.

{
  "intent": "tell_me_a_joke",
  "utterances": [
    "Tell me a joke about %type%",
    "Can you tell me a %type% joke?",
    "I want to hear a %type% joke",
    "make me laugh",
    "tell me something funny"
  ],
  "answers": ["What type of joke do you want to hear?"]
}

The example above is trivial, but the gist is that I'd expect the framework to take this and extract the entity into a slot. However, I have no idea how to approach this when using a corpus. Any examples or guidance would be fabulous.

Your Environment

Software          Version
nlp.js            4.10.5
node              13.13.0
npm               6.14.4
Operating System  macOS

radicand avatar Aug 24 '20 21:08 radicand

I am also interested in this question. I also wonder if it is possible to describe action in the corpus.

smegloy avatar Aug 26 '20 12:08 smegloy

I am also interested in the possibility of slot-filling through a corpus, so I can't give you a full answer. However, the Huge-NER example was quite useful to me. To mention entities in your intents, use a single @ prefix instead of the surrounding percent signs. So, in your example this would be "I want to hear a @type joke". You should store your 'type' entity in a JSON file; see airport.json in the NER example. In your index file, use nlp.addNerRuleOptionTexts to add your entity. This is the interesting section in the example:

const airportKeys = Object.keys(airports);
for (let i = 0; i < airportKeys.length; i += 1) {
  const airport = airports[airportKeys[i]];
  nlp.addNerRuleOptionTexts('en', 'airport', airport.icao, airport.city);
}
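For the joke example, a corpus using the @ notation might look roughly like the sketch below. This is written as a JavaScript object so it can be checked in code. The `entities` / `options` layout is an assumption based on the corpus layout used in the nlp.js NER examples, the option values are made up, and `referencedEntities` and `undefinedEntities` are hypothetical helpers (not part of nlp.js) that only verify every @name used in an utterance has a definition.

```javascript
// Hypothetical corpus. The "entities" -> "options" layout is an assumption;
// verify it against the nlp.js version you are using.
const corpus = {
  name: 'Jokes',
  locale: 'en-US',
  entities: {
    type: {
      options: {
        // canonical value -> texts that should map to it (made-up values)
        'knock-knock': ['knock knock', 'knock-knock'],
        dad: ['dad'],
      },
    },
  },
  data: [
    {
      intent: 'tell_me_a_joke',
      utterances: [
        'Tell me a joke about @type',
        'Can you tell me a @type joke?',
        'I want to hear a @type joke',
        'make me laugh',
        'tell me something funny',
      ],
      answers: ['Knock knock'],
    },
  ],
};

// Collect every @entity name referenced in the utterances of a corpus.
function referencedEntities(c) {
  const names = new Set();
  for (const item of c.data) {
    for (const utterance of item.utterances) {
      for (const match of utterance.matchAll(/@(\w+)/g)) {
        names.add(match[1]);
      }
    }
  }
  return [...names];
}

// List referenced entities that have no entry in the "entities" section.
function undefinedEntities(c) {
  const defined = c.entities || {};
  return referencedEntities(c).filter((name) => !(name in defined));
}

console.log(undefinedEntities(corpus));
```

An empty list from `undefinedEntities` means every @name in the utterances is backed by an entity definition, which is a cheap sanity check to run before training.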

Hope it helps!

vanessavj avatar Aug 26 '20 17:08 vanessavj

Thanks for the pointer on using the @ character here. I've updated my corpus to do that. The trouble with my actual use case (which the trivial joke example failed to demonstrate) is that the entity value can't be known in advance, as the interaction is more or less a search-term helper. A real example would be something like "Help me find content on machine learning", where I'm aiming for the system to extract "machine learning" without needing to re-prompt the user. The NER example you refer to relies on the values being known in advance, which unfortunately isn't possible in my case.
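To make the open-ended case concrete: one rule-based way to get at a value that isn't known in advance is to take whatever follows a trigger word. The snippet below is a standalone sketch of that idea in plain JavaScript. It does not use nlp.js, and `extractAfterLast` is a hypothetical helper name.

```javascript
// Return the text that follows the last whole-word occurrence of `word`,
// or null if the word is absent or is the final token of the utterance.
function extractAfterLast(utterance, word) {
  const tokens = utterance.trim().split(/\s+/);
  const lowered = tokens.map((token) => token.toLowerCase());
  const index = lowered.lastIndexOf(word.toLowerCase());
  if (index === -1 || index === tokens.length - 1) {
    return null;
  }
  return tokens.slice(index + 1).join(' ');
}

const topic = extractAfterLast('Help me find content on machine learning', 'on');
console.log(topic); // machine learning
```

Matching on whole tokens rather than substrings keeps the "on" inside "content" from being treated as the trigger. The obvious limitation is that the rule only works for utterances that follow the expected phrasing.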

radicand avatar Aug 26 '20 17:08 radicand

I have the same use case: I want to be able to train on entities such as car features and car names without having to program every single car name in advance. If a user says "What <colors> is <Tesla Model S> available in?", then I want it to be able to extract colors as the feature and Tesla Model S as the car model.

keyvez avatar Sep 04 '20 17:09 keyvez

Hello,

Here is an example: https://github.com/axa-group/nlp.js/tree/master/examples/14-ner-corpus
Here is a quickstart: https://github.com/axa-group/nlp.js/blob/master/docs/v4/ner-quickstart.md

jesus-seijas-sp avatar Oct 14 '20 11:10 jesus-seijas-sp

Is there a way to use Enum entities but have the list of items be open ended? So, without adding "captain america" to the list of heroes, the hero entity would still be found with "captain america" as a value.

rmtuckerphx avatar Oct 15 '20 22:10 rmtuckerphx

I am also looking for what Mark Tucker above needs, a way to identify entities that are not in the training set but have traits that can be used to train and identify entities with some likelihood metric attached.

keyvez avatar Oct 26 '20 18:10 keyvez

Yeah, I'm looking for the same.

The slot filling example on the home page points to a v3 version that no longer works:

https://github.com/axa-group/nlp.js/blob/master/docs/v3/slot-filling.md

This would do exactly what I want if it worked though 😊

MattRiddell avatar Oct 30 '20 14:10 MattRiddell

More specifically it says this:

  const manager = new NlpManager({ languages: ['en'] });
  const fromEntity = manager.addTrimEntity('fromCity');
  fromEntity.addBetweenCondition('en', 'from', 'to');
  fromEntity.addAfterLastCondition('en', 'from');
  const toEntity = manager.addTrimEntity('toCity');
  toEntity.addBetweenCondition('en', 'to', 'from', { skip: ['travel'] });
  toEntity.addAfterLastCondition('en', 'to');
 
  manager.slotManager.addSlot('travel', 'fromCity', true, { en: 'From where you are traveling?' });
  manager.slotManager.addSlot('travel', 'toCity', true, { en: 'Where do you want to go?' });
  manager.slotManager.addSlot('travel', 'date', true, { en: 'When do you want to travel?' });


  manager.addDocument('en', 'I want to travel from %fromCity% to %toCity% %date%', 'travel');
  await manager.train();
  const result = await manager.process('en', 'I want to travel to Madrid tomorrow', {});
  console.log(JSON.stringify(result, null, 2));

The problem is that the slot manager doesn't seem to be part of NlpManager anymore, so I'm guessing the code has to change?

I saw that if I add %name% to an utterance of the greetings.hello intent, such as "hello my name is %name%", it does get added to the slot manager when I train it, but it doesn't appear to get parsed when I feed it "hello my name is matt".

I'll keep digging, but others may want to follow the path through:

@nlpjs/slot/src/slot-manager

with debug statements

MattRiddell avatar Oct 30 '20 14:10 MattRiddell

I've made some progress by making the entity mandatory, but at the moment I'm getting a response of:

Hi there hello my name is matt

instead of

Hi there matt
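To spell out the behavior I'm after, here is a standalone sketch in plain JavaScript: capture only the text after the lead-in phrase, then substitute it into the answer. It does not use nlp.js. `extractAfterPhrase` and `renderAnswer` are hypothetical helpers, and the `{{ name }}` placeholder syntax is assumed for illustration only.

```javascript
// Return the text after the last occurrence of `phrase` (case-insensitive),
// or null if the phrase is absent or nothing follows it.
function extractAfterPhrase(utterance, phrase) {
  const index = utterance.toLowerCase().lastIndexOf(phrase.toLowerCase());
  if (index === -1) {
    return null;
  }
  const rest = utterance.slice(index + phrase.length).trim();
  return rest.length > 0 ? rest : null;
}

// Replace {{ key }} placeholders with values from `context`.
// Placeholders without a value are left untouched so gaps stay visible.
function renderAnswer(template, context) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (whole, key) =>
    context[key] != null ? String(context[key]) : whole
  );
}

const name = extractAfterPhrase('hello my name is matt', 'my name is');
console.log(renderAnswer('Hi there {{ name }}', { name })); // Hi there matt
```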

MattRiddell avatar Oct 30 '20 16:10 MattRiddell

It automatically works out of the box for most entities if you do something like this: https://github.com/axa-group/nlp.js/blob/master/packages/builtin-compromise/README.md#L1

wparad avatar Oct 11 '21 11:10 wparad

Does anyone have a "slot filling" example based on v4? My use case would be something like:

  • "Please start the irrigation" -> "How long?" -> "@duration minutes" -> "ok I start the irrigation for @duration minutes"
  • "Please irrigate for @duration minutes" -> "ok I start the irrigation for @duration minutes"
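
The two flows above can be sketched as a small slot-filling loop. This is a standalone illustration in plain JavaScript, not the nlp.js slot manager: `requiredSlots`, `extractDuration` and `handleTurn` are hypothetical names, and the duration is read with a simple regular expression rather than a trained entity.

```javascript
// Hypothetical slot definition for the irrigation intent.
const requiredSlots = [{ name: 'duration', question: 'How long?' }];

// Try to read "<number> minutes" from an utterance.
function extractDuration(utterance) {
  const match = /(\d+)\s*minutes?/i.exec(utterance);
  return match ? Number(match[1]) : undefined;
}

// One dialogue turn: merge newly extracted values into the stored slots,
// then either ask for the first missing slot or produce the final answer.
function handleTurn(state, utterance) {
  const slots = { ...state.slots };
  const duration = extractDuration(utterance);
  if (duration !== undefined) {
    slots.duration = duration;
  }
  const missing = requiredSlots.find((slot) => slots[slot.name] === undefined);
  if (missing) {
    return { slots, reply: missing.question };
  }
  return {
    slots,
    reply: `ok I start the irrigation for ${slots.duration} minutes`,
  };
}

// Flow 1: the duration arrives in a follow-up turn.
let state = handleTurn({ slots: {} }, 'Please start the irrigation');
console.log(state.reply); // How long?
state = handleTurn(state, '15 minutes');
console.log(state.reply); // ok I start the irrigation for 15 minutes

// Flow 2: the duration is in the first utterance, so no question is asked.
console.log(handleTurn({ slots: {} }, 'Please irrigate for 20 minutes').reply);
```

The slots are carried between turns in `state`, which is what lets the second utterance in flow 1 be just "15 minutes".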

...

Apollon77 avatar Jul 21 '22 08:07 Apollon77

I made several fixes in my PRs and also enhanced the documentation a lot.

Apollon77 avatar Aug 08 '22 19:08 Apollon77

Closing due to inactivity. Please, re-open if you think the topic is still alive.

aigloss avatar Nov 25 '22 09:11 aigloss