Adriane Boyd

349 comments by Adriane Boyd

Do you have a source for the stop words? I'm still a bit confused about the tokenizer settings vs. stop words. Is `ky'` ever a separate token and not just...

I'm worried that users will be confused in the future because "ky'" is a stop word but never a separate token that could be marked as a stop word. Does...

Sorry for the delay, I thought I should wait on an update because in the current version the contractions are still added to the stop words. If the contractions are...

Thanks again for the PR! We'll mention Luganda in the release notes for the next release (probably v3.4.2).

Just a note that although it looks similar, the config format isn't TOML. Underneath it's using `configparser` with modifications.
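To illustrate the point about `configparser`, here is a minimal sketch using only the Python standard library. The section names, option names, and path value are made up for illustration. The stdlib `ExtendedInterpolation` syntax is `${section:option}`, whereas spaCy configs write `${section.option}`, which is one of the modifications mentioned above, so this shows the underlying mechanism rather than spaCy's own loader.

```python
import configparser

# Hypothetical config text; section and option names are illustrative only.
CONFIG_TEXT = """
[paths]
train = corpus/train.spacy

[corpora.train]
path = ${paths:train}
"""

# ExtendedInterpolation resolves ${section:option} references across sections.
parser = configparser.ConfigParser(
    interpolation=configparser.ExtendedInterpolation()
)
parser.read_string(CONFIG_TEXT)

# The reference is resolved when the value is read.
train_path = parser["corpora.train"]["path"]
print(train_path)
```

A TOML parser would reject this input, since the values are unquoted and `${...}` has no meaning in TOML.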

Thanks for the report! I think we just didn't consider interpolation in the directory names. I don't think there's any particular reason these blocks couldn't be swapped.

@AhmedIssa11 Sure, we'd be happy to review a PR for this!

Most of the noun chunk iterators check for overlapping spans, but this seems to be missing for Dutch. A PR that adds this would be welcome. In general the check(s)...
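As a rough illustration of what an overlap check can look like, here is a sketch on plain `(start, end)` token offsets with exclusive ends. The function name `filter_overlapping` is hypothetical, and this is not the code from any of spaCy's noun chunk iterators; it only shows the general idea of remembering where the previous chunk ended and skipping candidates that start before that point.

```python
def filter_overlapping(spans):
    """Yield (start, end) spans in input order, skipping any span that
    starts before the previously yielded span has ended.

    Ends are exclusive, so (0, 3) and (3, 5) do not overlap.
    """
    prev_end = 0  # end offset of the last span that was yielded
    for start, end in spans:
        if start < prev_end:
            # Candidate overlaps the previous chunk: drop it.
            continue
        prev_end = end
        yield (start, end)


candidates = [(0, 3), (2, 5), (5, 7)]
chunks = list(filter_overlapping(candidates))
print(chunks)
```

The sketch assumes the candidates arrive sorted by start offset, as they would when produced by a left-to-right pass over the tokens.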

All the possible tags are in `spacy.parts_of_speech.IDS`. You're right that this isn't documented well on the token attributes page. Since spaCy handles any kind of input text and includes token...

I've added rules to convert `SPACE` -> `X` for non-space tokens, which we tentatively plan to use in the v3.5.0 model releases.
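The effect of such a rule can be sketched in plain Python on `(text, pos)` pairs. This is only an illustration of the mapping described above, not the actual rules added to the pipelines; the function name `fix_space_tags` and the sample tokens are hypothetical.

```python
def fix_space_tags(tagged_tokens):
    """Return a copy of (text, pos) pairs where a SPACE tag on a token
    that is not made up entirely of whitespace is replaced with X."""
    fixed = []
    for text, pos in tagged_tokens:
        if pos == "SPACE" and not text.isspace():
            # A non-space token should never be tagged SPACE.
            pos = "X"
        fixed.append((text, pos))
    return fixed


# Hypothetical tagger output with one bad SPACE prediction.
tagged = [("hello", "SPACE"), (" ", "SPACE"), ("dog", "NOUN")]
result = fix_space_tags(tagged)
print(result)
```

Genuine whitespace tokens keep their `SPACE` tag, and all other tags pass through unchanged.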