stanza
Unknown words can still result in a PUNCT tag at the end of a sentence
For example, if a sentence ends with "thicc" and has no sentence-final punctuation, "thicc" is labeled PUNCT.
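A minimal sketch for checking this behavior, assuming stanza is installed and the default English models are available. The helper names `tag_with_stanza` and `final_word_mistagged` are hypothetical and only for illustration; the only stanza calls used are `stanza.Pipeline`, `doc.sentences`, `sentence.words`, `word.text`, and `word.upos`.

```python
def tag_with_stanza(text, lang="en"):
    """Return [(word, upos), ...] for the given text.

    Assumes stanza and the default models for `lang` are installed.
    """
    import stanza  # imported lazily so the check below works without stanza

    nlp = stanza.Pipeline(lang)  # default pipeline, which includes the POS tagger
    doc = nlp(text)
    return [(w.text, w.upos) for s in doc.sentences for w in s.words]


def final_word_mistagged(tagged):
    """True if the last token contains a letter or digit but is tagged PUNCT."""
    if not tagged:
        return False
    text, upos = tagged[-1]
    return upos == "PUNCT" and any(ch.isalnum() for ch in text)
```

Calling `final_word_mistagged(tag_with_stanza("Jennifer's antennae are hella thicc"))` would flag the problem described here; the result depends on which model version is installed.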
Although I don't know if this is fixed universally, I can say that the updated tagger does a better job of labeling "thicc" as an adjective in a sentence such as "Jennifer's antennae are hella thicc". Sadly, it labels "hella" as INTJ in some contexts, such as "Dat ass hella thicc", even though it is clearly an ADV. Perhaps we need to add more uses of "hella". For that matter, "Dat" is also mistagged as INTJ instead of DET.
Sentence-final punctuation handling is a lot better as a result of this PR:
https://github.com/stanfordnlp/stanza/pull/1303
This is much less of an issue on EN and PR now. I will retrain the other models when the new UD release comes out, unless there are other specific languages which need fixes.