Adriane Boyd

347 comments by Adriane Boyd

(No need for Patreon: this is my job!) For local testing I only have one GPU, so I may not be much immediate help. The `spacy train` CLI doesn't have...

Thanks for the report! There are a number of changes coming soon for spaCy v3 that make the tokenizer more consistent, in particular for special cases that contain prefix/suffix/infix punctuation that...

It turned out that this has too much of an effect on the existing tokenizer settings, which were designed without this infix special case checking. It might be possible to...

There is the same issue for the `lemmatizer` with its lookup tables. It doesn't call `validate_get_examples`, though (it just ignores it), so you can call `nlp.get_pipe("lemmatizer").initialize()`. The warning isn't helpful...

Thanks for the report! This seems to be a hard-coded limit in msgpack: https://github.com/explosion/srsly/blob/03e8861eb08b3c33cc86e7c2e049e5b126538dff/srsly/msgpack/_packer.pyx#L44 We'll look into it, but since I'm not sure why msgpack has this limit, I'm not...

I can't think of anything major, but to be on the safe side we should test it with all the internal training corpora. Let me see...

Here is a related discussion: https://github.com/explosion/spaCy/discussions/5051

Oops, yeah that should be done differently. (But I don't understand why this ends up different in the second round than in the first?)

It isn't just for timing purposes: you're not actually running the final component (which is the NER model you're trying to train) unless you iterate over that generator. (Earlier...

Suggestions from @DomHudson in https://github.com/explosion/spaCy/issues/5050#issuecomment-590235869:

> In my opinion the combination of `{None, True, False}` is not transparent or flexible enough to provide the information that it is currently trying...