
POS tagging for imperative sentences is inconsistent

Open moskaliukua opened this issue 1 year ago • 2 comments

Hi, I ran into a corner case with POS tagging for imperative sentences such as: Suppose I tell you that it is true. If I run this sentence on its own, it works as expected:

import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';
const nlp = winkNLP(model);
nlp.readDoc('Suppose I tell you that it is true.').printTokens();

token    p-spaces  prefix  suffix  shape  case  nerHint  type      normal/pos
———————————————————————————————————————————————————————————————————————————————
Suppose  0         Su      ose     Xxxxx  3     0        word      suppose / VERB
I        1         I       I       X      2     0        word      i / PRON
tell     1         te      ell     xxxx   1     0        word      tell / VERB
you      1         yo      you     xxx    1     0        word      you / PRON
that     1         th      hat     xxxx   1     0        word      that / SCONJ
it       1         it      it      xx     1     0        word      it / PRON
is       1         is      is      xx     1     0        word      is / AUX
true     1         tr      rue     xxxx   1     0        word      true / ADJ
.        0         .       .       .      0     0        punctuat  . / PUNCT

If I run it after another sentence has been processed, the POS of Suppose changes to PROPN:

nlp.readDoc('I watch TV every day.').printTokens();
nlp.readDoc('Suppose I tell you that it is true.').printTokens();

token    p-spaces  prefix  suffix  shape  case  nerHint  type      normal/pos
———————————————————————————————————————————————————————————————————————————————
I        0         I       I       X      2     0        word      i / PRON
watch    1         wa      tch     xxxx   1     0        word      watch / VERB
TV       1         TV      TV      XX     2     0        word      tv / NOUN
every    1         ev      ery     xxxx   1     0        word      every / DET
day      1         da      day     xxx    1     0        word      day / NOUN
.        0         .       .       .      0     0        punctuat  . / PUNCT

total number of tokens: 6

token    p-spaces  prefix  suffix  shape  case  nerHint  type      normal/pos
———————————————————————————————————————————————————————————————————————————————
Suppose  0         Su      ose     Xxxxx  3     0        word      suppose / PROPN
I        1         I       I       X      2     0        word      i / PRON
tell     1         te      ell     xxxx   1     0        word      tell / VERB
you      1         yo      you     xxx    1     0        word      you / PRON
that     1         th      hat     xxxx   1     0        word      that / SCONJ
it       1         it      it      xx     1     0        word      it / PRON
is       1         is      is      xx     1     0        word      is / AUX
true     1         tr      rue     xxxx   1     0        word      true / ADJ
.        0         .       .       .      0     0        punctuat  . / PUNCT

The problem occurs only with some specific sentences or words; I haven't figured out which yet. For example:

 nlp.readDoc('I like playing football').printTokens();
 nlp.readDoc('Suppose I tell you that it is true.').printTokens();

produces the correct result: Suppose 0 Su ose Xxxxx 3 0 word suppose / VERB
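To narrow down which preceding sentences trigger the change, the comparison above could be scripted rather than read off the printed tables. The following is only a sketch: `tagText`, `tagLastAfter` and `diffTags` are hypothetical helpers, not part of wink-nlp, and they rely only on `readDoc()`, `tokens().out()` and the `its.value` / `its.pos` helpers exposed on the instance.

```javascript
// Hypothetical helpers for illustration; they are not part of wink-nlp.

// Returns [{ value, pos }, ...] for one text, using the instance passed in.
function tagText(nlp, text) {
  const tokens = nlp.readDoc(text).tokens();
  const values = tokens.out(nlp.its.value);
  const tags = tokens.out(nlp.its.pos);
  return values.map((value, i) => ({ value, pos: tags[i] }));
}

// Processes the texts in order on the same instance and returns the
// tagging of the last one, so earlier texts act as "primers".
function tagLastAfter(nlp, texts) {
  let last = [];
  for (const text of texts) {
    last = tagText(nlp, text);
  }
  return last;
}

// Lists the token positions where two taggings of the same sentence disagree.
function diffTags(baseline, observed) {
  const diffs = [];
  baseline.forEach((tok, i) => {
    const other = observed[i];
    if (!other || other.pos !== tok.pos) {
      diffs.push({
        index: i,
        value: tok.value,
        expected: tok.pos,
        actual: other ? other.pos : undefined,
      });
    }
  });
  return diffs;
}
```

With the real library, the baseline would come from `tagLastAfter(nlp, ['Suppose I tell you that it is true.'])` on a fresh instance and the observed tagging from `tagLastAfter(nlp, ['I watch TV every day.', 'Suppose I tell you that it is true.'])` on another. I have not verified whether two instances built from the same model object share state, so running the two cases in separate processes is the safer assumption.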

Could this be related to caching? Also, is there an easy way to disable the cache, or to make the library parse a sentence in isolation without loading the model again?

versions of packages: "wink-eng-lite-web-model": "^1.8.0", "wink-nlp": "^2.3.0",

moskaliukua, Aug 07 '24 21:08