Adriane Boyd
As a note, I think warnings get tricky because they'd show up in `spacy train` output.
Thanks for the report! The info provided makes this look specific to the `trf` model, in particular `curated-tokenizers`. If you have a minute, could you create a new venv without...
And what happens if you also install `sentencepiece` in the new venv?
In general this seems to be a known issue related to `sentencepiece`, which is vendored in `curated-tokenizers`. I'm not currently sure exactly which conditions are necessary for you to run...
`nlp.max_length` is not a hard internal constraint, but rather a kind of clunky way to protect users from confusing OOM errors. It was set with the "core" pipelines and a...
Thanks for the suggestion! I think that this description is slightly confusing for users, since `nlp.max_length` itself will behave the same way for all languages. What we need to highlight...
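Since `nlp.max_length` is only a guard and not a hard internal constraint, a minimal sketch of the usual workarounds looks like this (the value `2_000_000` is purely illustrative, and a blank pipeline stands in for a trained one):

```python
import spacy

nlp = spacy.blank("en")  # same attribute exists on trained pipelines
# nlp.max_length exists to protect users from confusing OOM errors;
# it can be raised if you have enough RAM for your texts.
nlp.max_length = 2_000_000  # illustrative value, not a recommendation

# Alternatively, process a long text in smaller pieces, e.g. per paragraph:
long_text = "First paragraph.\n\nSecond paragraph."
docs = list(nlp.pipe(long_text.split("\n\n")))
```

Chunking is usually the safer option for very long texts, since memory use for some components grows faster than linearly with document length.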
Thanks for the examples, they'll be helpful when looking at how to improve the lemmatizers in the future!
The French lemmatizer in the v3.7 trained pipelines is a rule-based lemmatizer that depends on the part-of-speech tags from the statistical tagger to choose which rules to apply. In these...
We wouldn't use the lexique data in our pipelines due to the non-commercial clause in the CC BY-NC license, but if the license works for your use case and you'd...
Overall it sounds like a lookup lemmatizer, which doesn't depend on context, might be a better fit for these kinds of examples. You can see how to switch from the...
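As a minimal sketch of that switch (assuming `spacy-lookups-data` is installed to provide the French lookup tables):

```python
import spacy

# A blank French pipeline for illustration; with a trained pipeline you
# would first call nlp.remove_pipe("lemmatizer") before re-adding it.
nlp = spacy.blank("fr")
# mode="lookup" selects the context-independent, table-based lemmatizer
# instead of the POS-dependent rule-based one.
lemmatizer = nlp.add_pipe("lemmatizer", config={"mode": "lookup"})
# lemmatizer.initialize() would then load the lookup tables from
# spacy-lookups-data before processing any text (not run here).
```

Because the lookup lemmatizer ignores context, it gives the same lemma for a surface form regardless of the surrounding sentence, which avoids errors caused by wrong tags from the statistical tagger.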