nlp-js-tools-french

a couple of problematic results from lemmatizer

Open: mcthulhu opened this issue 8 years ago • 1 comment

  1. I'm not sure what's happening here. I was trying to lemmatize the word "écœurante" with config set to { tagTypes: ['adj', 'ver', 'nom'], strictness: false, minimumLength: 3, debug: true }. I tried with strictness set to true first, then false, but it doesn't seem to matter. The result I get from

     var nlpToolsFr = new NlpjsTFr(s, config);
     var lemmatizedWords = nlpToolsFr.lemmatizer();

is [{"id":0,"word":"urante","lemma":"urante"}], with the écœ at the beginning removed. I can't tell why. Other words beginning with é seem OK.

  2. Lemmatizing "épaules" returns [{"id":0,"word":"épaules","lemma":"épaules"}]. Shouldn't the lemma be "épaule"? This was with the same config object as above.

mcthulhu, Sep 03 '17 16:09

Hello mcthulhu,

  1. It kind of makes sense, since I didn't anticipate this specific case. I'll patch it soon :)
  2. Weird, I'll have a look

I'll keep you informed.

Bastien

bastienbot, Sep 03 '17 17:09
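
A possible explanation for the first report above ("écœurante" coming back as "urante"), offered only as a guess and not taken from the library's source: if the input is split into words with a character class that lists accented French letters but omits the ligature œ, the word breaks into "éc" and "urante", and a minimumLength of 3 then discards "éc". The sketch below reproduces that behaviour. The names tokenize, NARROW and WIDE are hypothetical and do not come from nlp-js-tools-french.

```javascript
// Hypothetical tokenizer, NOT the library's actual code.
// NARROW lists common accented French letters but forgets the ligatures œ and æ.
const NARROW = /[a-zàâäçéèêëîïôöùûü]+/gi;
// WIDE accepts any Unicode letter, so œ stays inside the word.
const WIDE = /\p{L}+/gu;

function tokenize(text, pattern, minimumLength) {
  // Normalize to NFC so that é is a single code point rather than e + combining accent.
  const matches = text.normalize("NFC").match(pattern) || [];
  // Drop fragments shorter than minimumLength, as the config option's name suggests.
  return matches.filter((word) => word.length >= minimumLength);
}

console.log(tokenize("écœurante", NARROW, 3)); // [ 'urante' ]
console.log(tokenize("écœurante", WIDE, 3));   // [ 'écœurante' ]
```

If the real cause is along these lines, adding œ (and æ) to the character class, or switching to a Unicode letter class, would keep the word intact before it reaches the lemma lookup.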