stanza

Unknown words can still receive the PUNCT tag at the end of a sentence

Open • AngledLuffa opened this issue 3 years ago • 2 comments

For example, if a sentence ends with "thicc" and has no sentence-final punctuation, "thicc" is labeled PUNCT.
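One way to spot this symptom in tagger output is to flag any word tagged PUNCT that is not actually made of punctuation characters. The sketch below is a hypothetical helper, not part of stanza. It assumes the tagger output has been collected as (word, UPOS) pairs. With stanza, such pairs can be built from a processed document's `sentences`, each sentence's `words`, and each word's `text` and `upos` attributes. The example sentence is made up to mirror the report.

```python
import unicodedata


def is_punctuation(token: str) -> bool:
    """True if the token is non-empty and every character is Unicode punctuation (category P*)."""
    return bool(token) and all(unicodedata.category(ch).startswith("P") for ch in token)


def suspicious_punct(tagged):
    """Return words tagged PUNCT that are not made of punctuation characters.

    `tagged` is a sequence of (word, upos) pairs; this helper is hypothetical.
    """
    return [word for word, upos in tagged if upos == "PUNCT" and not is_punctuation(word)]


# Hypothetical tagger output reproducing the reported symptom: the sentence
# ends in "thicc" with no final punctuation, and "thicc" is tagged PUNCT.
example = [("that", "PRON"), ("is", "AUX"), ("thicc", "PUNCT")]
print(suspicious_punct(example))  # ['thicc']
```

Checking Unicode categories rather than a fixed ASCII set keeps the check usable for other languages, since marks such as "。" also fall in a P* category.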

AngledLuffa • Apr 08 '22 17:04

Although I don't know whether this is fixed universally, the updated tagger does a better job of labeling "thicc" as an adjective in a sentence such as "Jennifer's antennae are hella thicc." Sadly, though, it labels "hella" as INTJ in some contexts, such as "Dat ass hella thicc", even though it is clearly ADV. Perhaps we need to add more uses of "hella" to the training data. For that matter, "Dat" is also mistagged as INTJ instead of DET.

AngledLuffa • Sep 14 '22 19:09

Tagging of sentence-final punctuation is much better as a result of this PR:

https://github.com/stanfordnlp/stanza/pull/1303

This is much less of an issue for EN and PR now. I will retrain the other models when the new UD release comes out, unless there are specific languages that need fixes sooner.

AngledLuffa • Oct 26 '23 21:10