Adriane Boyd
I think a warning in the docs would be good. This hasn't come up before, so either it's rare for people to be using a GPU on Windows or something...
I think a new issue focused on Windows + GPU would be useful. I didn't mean older versions of spaCy, just older versions of cupy and maybe thinc (within the...
We can't necessarily fix it, but if we (well, you) can figure out that a particular version of cupy works better, we can provide that information in the warning. A...
@umarniz That looks like a mistake in the docs, sorry for the confusion! The NER model is trained on NER annotation done on top of UD_Dutch-LassySmall, but the tagger and...
@LiberalArtist: The v2.2 models are using some new data augmentation to try to make them less case-sensitive, which leads to less certainty about `NN` vs. `NNP` distinctions, and for...
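The kind of case augmentation described above can be sketched roughly as follows; the function name and probabilities are illustrative, not spaCy's actual augmentation API:

```python
import random

def augment_case(examples, lowercase_prob=0.5, seed=0):
    """Yield each (text, tags) example, plus a randomly lowercased copy.

    Training on both variants weakens the model's reliance on
    capitalization cues, which is what makes NN vs. NNP calls
    less certain on cased text.
    """
    rng = random.Random(seed)
    for text, tags in examples:
        yield text, tags
        if rng.random() < lowercase_prob:
            # The surface form changes but the gold tags stay the same,
            # so the model must learn cues that survive case changes.
            yield text.lower(), tags

examples = [("Apple hired Smith", ["NNP", "VBD", "NNP"])]
augmented = list(augment_case(examples, lowercase_prob=1.0))
```

With `lowercase_prob=1.0` every example is duplicated in lowercase form alongside the original.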
@MartinoMensio: The POS tags are mapped from the fine-grained PTB tag set, which doesn't distinguish auxiliary verbs from main verbs. All verbs get mapped to `VERB` except for some exceptions...
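Because the mapping operates on the tag alone, something like the following sketch (an illustrative subset, not spaCy's actual tag map) cannot distinguish an auxiliary `VBZ` from a main-verb `VBZ`:

```python
# Illustrative fine-grained-to-coarse tag map; spaCy's real tag maps
# live in the language data and are much more complete.
PTB_TO_UPOS = {
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
    "VBP": "VERB", "VBZ": "VERB",
    "NN": "NOUN", "NNS": "NOUN", "NNP": "PROPN", "NNPS": "PROPN",
}

def coarse_tag(ptb_tag):
    """Map a fine-grained PTB tag to a coarse tag; "X" for unknown tags."""
    return PTB_TO_UPOS.get(ptb_tag, "X")

# "is" (auxiliary) and "runs" (main verb) are both VBZ in PTB, so a
# purely tag-based mapping has to give both the same coarse tag.
```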
@fersarr: It's useful to have these kinds of results here, thanks! Imperatives are a case where the provided models often perform terribly because there are very few imperatives in the...
@Riccorl: This is the expected behavior for the base `xx` tokenizer used in that model, which just doesn't work for languages without whitespace between tokens. It was a mistake...
@joancf: I'm pretty sure this is an unfortunate side effect of not having a proper encoding for special tokens or special characters in the transformer tokenizer output. In the...
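A toy sketch of that side effect (illustrative of the failure mode only, not the actual tokenizer code): without an escaping scheme, a literal special-token string in the user's text collides with the markers the tokenizer inserts.

```python
SPECIAL_TOKENS = {"[CLS]", "[SEP]"}

def strip_special(pieces):
    # With no proper encoding for special tokens, a literal "[SEP]"
    # that appeared in the user's text is indistinguishable from the
    # separator the tokenizer inserted, so both get removed.
    return [p for p in pieces if p not in SPECIAL_TOKENS]

# The middle "[SEP]" here came from the original text, not the tokenizer,
# but it is silently dropped along with the real markers:
cleaned = strip_special(["[CLS]", "the", "[SEP]", "tag", "[SEP]"])
```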
@dblandan The v3.3 German models switched from a lookup lemmatizer that only used the word form (no context) to a statistical lemmatizer where the output does depend on the context.
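The difference between the two lemmatizer types can be sketched like this; the table entry and the context heuristic are made up for illustration and are not the actual German model's behavior:

```python
# Lookup lemmatization: one lemma per word form, context ignored.
LOOKUP_TABLE = {"meinen": "meinen"}  # illustrative single entry

def lookup_lemma(form):
    # Same form -> same lemma, no matter the surrounding sentence.
    return LOOKUP_TABLE.get(form, form)

def contextual_lemma(form, prev_token):
    """Toy stand-in for a statistical lemmatizer: the prediction can
    change with context (here, naively, with the previous token)."""
    if form == "meinen" and prev_token == "mit":
        return "mein"  # possessive reading: "mit meinen Freunden"
    return lookup_lemma(form)  # verb reading: "wir meinen ..."
```

The point is just that a statistical lemmatizer's output for the same word form can legitimately differ between sentences, which a form-only lookup can never do.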