wink-nlp
POS tagging for imperative sentences is inconsistent
Hi, I ran into a corner case with POS tagging for imperative sentences like: Suppose I tell you that it is true. If I run this sentence on its own, it works as expected:
import winkNLP from 'wink-nlp';
import model from 'wink-eng-lite-web-model';
const nlp = winkNLP(model);
nlp.readDoc('Suppose I tell you that it is true.').printTokens();
token p-spaces prefix suffix shape case nerHint type normal/pos
———————————————————————————————————————————————————————————————————————————————————————
Suppose 0 Su ose Xxxxx 3 0 word suppose / VERB
I 1 I I X 2 0 word i / PRON
tell 1 te ell xxxx 1 0 word tell / VERB
you 1 yo you xxx 1 0 word you / PRON
that 1 th hat xxxx 1 0 word that / SCONJ
it 1 it it xx 1 0 word it / PRON
is 1 is is xx 1 0 word is / AUX
true 1 tr rue xxxx 1 0 word true / ADJ
. 0 . . . 0 0 punctuat . / PUNCT
If I run it after processing another sentence first, the POS tag of "Suppose" changes to PROPN:
nlp.readDoc('I watch TV every day.').printTokens();
nlp.readDoc('Suppose I tell you that it is true.').printTokens();
token p-spaces prefix suffix shape case nerHint type normal/pos
———————————————————————————————————————————————————————————————————————————————————————
I 0 I I X 2 0 word i / PRON
watch 1 wa tch xxxx 1 0 word watch / VERB
TV 1 TV TV XX 2 0 word tv / NOUN
every 1 ev ery xxxx 1 0 word every / DET
day 1 da day xxx 1 0 word day / NOUN
. 0 . . . 0 0 punctuat . / PUNCT
total number of tokens: 6
token p-spaces prefix suffix shape case nerHint type normal/pos
———————————————————————————————————————————————————————————————————————————————————————
Suppose 0 Su ose Xxxxx 3 0 word suppose / PROPN
I 1 I I X 2 0 word i / PRON
tell 1 te ell xxxx 1 0 word tell / VERB
you 1 yo you xxx 1 0 word you / PRON
that 1 th hat xxxx 1 0 word that / SCONJ
it 1 it it xx 1 0 word it / PRON
is 1 is is xx 1 0 word is / AUX
true 1 tr rue xxxx 1 0 word true / ADJ
. 0 . . . 0 0 punctuat . / PUNCT
The problem occurs only with certain preceding sentences or words; I haven't figured out which ones yet. For example:
nlp.readDoc('I like playing football').printTokens();
nlp.readDoc('Suppose I tell you that it is true.').printTokens();
produces the correct tag:
Suppose 0 Su ose Xxxxx 3 0 word suppose / VERB
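To narrow down which preceding sentences trigger the change, something like the sketch below might help. It is only an illustration: `findTagDrift` is a hypothetical helper, not part of wink-nlp, and it takes the tagger as a parameter (`tagFn`) so it makes no assumptions about the library's internals. The baseline is whatever the first call returns, so it only represents the "isolated" result if nothing else has been processed before it.

```javascript
// Hypothetical helper (not part of wink-nlp): reports how the POS tags of
// `target` differ from a baseline run after each preceding text is processed.
// `tagFn` is any function that maps a text to an array of POS tag strings.
function findTagDrift(tagFn, target, precedingTexts) {
  // Baseline: tags from the first run, before any of the preceding texts.
  const baseline = tagFn(target);
  const drifts = [];
  for (const prior of precedingTexts) {
    tagFn(prior); // process the preceding sentence first
    const tags = tagFn(target); // then tag the target sentence again
    const changed = [];
    tags.forEach((tag, index) => {
      if (tag !== baseline[index]) {
        changed.push({ index, expected: baseline[index], actual: tag });
      }
    });
    if (changed.length > 0) drifts.push({ prior, changed });
  }
  return drifts;
}

// Possible wiring with wink-nlp, assuming `nlp` is created as in the first snippet:
//   const its = nlp.its;
//   const tagFn = (text) => nlp.readDoc(text).tokens().out(its.pos);
//   console.log(findTagDrift(tagFn, 'Suppose I tell you that it is true.',
//     ['I watch TV every day.', 'I like playing football']));
```

An empty result would mean none of the listed sentences changed the tags; each entry otherwise names the preceding sentence and the token positions whose tag differed from the baseline.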
Could this be related to caching? Also, is there an easy way to disable the cache, or to make the library parse a sentence in isolation without loading the model again?
Package versions: "wink-eng-lite-web-model": "^1.8.0", "wink-nlp": "^2.3.0"