Adriane Boyd
I noticed that the file output does not match the stdout output. It looks like the final article is missing in the file output, possibly due to buffering in a...
## Description

Fix Dutch noun chunks to skip overlapping spans.

### Types of change

Bug fix.

## Checklist

- [x] I confirm that I have the right to submit this...
## Description

* Extend to v11
* Add `cupy-cuda11x` and `cupy-wheel`
* Update quickstart to use `cupy-wheel` for CUDA 10.2+

### Types of change

## Checklist

- [x] I confirm...
## How to reproduce the behaviour

When prioritizing vectors to keep, `Vocab.prune_vectors` doesn't handle existing duplicates from `key2row` well. By sorting/prioritizing by values from `key2row`, which may contain duplicate values,...
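
The effect of duplicate row values can be illustrated with a minimal sketch. This is not spaCy's implementation: `key2row` here is a plain dict, and the helper names `select_rows_naive` and `select_rows_deduplicated` are hypothetical. It only shows why picking the first N entries sorted by row value can yield fewer than N distinct rows when several keys share a row.

```python
def select_rows_naive(key2row, nr_row):
    """Sort keys by row index and keep the rows of the first nr_row keys.

    Duplicate row values are counted more than once (illustrative only).
    """
    keys = sorted(key2row, key=lambda k: key2row[k])
    return [key2row[k] for k in keys[:nr_row]]


def select_rows_deduplicated(key2row, nr_row):
    """Keep the first nr_row distinct rows, skipping rows already seen."""
    seen = set()
    rows = []
    for key in sorted(key2row, key=lambda k: key2row[k]):
        row = key2row[key]
        if row in seen:
            continue
        seen.add(row)
        rows.append(row)
        if len(rows) == nr_row:
            break
    return rows


# Two keys ("apple" and "Apple") point at the same vector row.
key2row = {"apple": 0, "Apple": 0, "pear": 1, "plum": 2}

naive = select_rows_naive(key2row, 3)
dedup = select_rows_deduplicated(key2row, 3)
print(naive)  # [0, 0, 1] -> only two distinct rows are kept
print(dedup)  # [0, 1, 2] -> three distinct rows are kept
```

In this toy case the naive selection asks for three rows but keeps only two distinct ones, which is the kind of mismatch the report describes.
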
## Feature description

Decide how to handle `is_sentenced` and sentence boundaries that may come from multiple components (Sentencizer, SentenceRecognizer, Parser). Some ideas:

* have an `is_sentenced` property more like `is_parsed`...
## Feature description

Similar to the token-based orthographic variants, it would be useful to add data augmentation options for character-based orthographic variants. Examples are the Romanian variants discussed in #4736...
## Feature description

The parser section of `spacy debug-data` should show a warning when there are no/few documents with multiple sentences in the training data. Potentially add a simple converter...
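
A check of this kind could look roughly like the sketch below. It is written in plain Python rather than against the `debug-data` code: the function names, the warning texts, and the `min_ratio` threshold are all assumptions made for illustration, and each document is represented as a list of per-token sentence-start flags.

```python
def count_multi_sentence_docs(docs_sent_starts):
    """Count documents that contain more than one sentence.

    Each document is a list of booleans, one per token, where True marks
    a token that starts a sentence (a stand-in for real annotations).
    """
    return sum(1 for starts in docs_sent_starts if sum(starts) > 1)


def multi_sentence_warning(docs_sent_starts, min_ratio=0.1):
    """Return a warning string if no or few documents have multiple sentences.

    min_ratio is a hypothetical threshold, not a value taken from spaCy.
    """
    n_docs = len(docs_sent_starts)
    if n_docs == 0:
        return None
    n_multi = count_multi_sentence_docs(docs_sent_starts)
    if n_multi == 0:
        return "No documents with multiple sentences in the training data."
    if n_multi / n_docs < min_ratio:
        return f"Only {n_multi} of {n_docs} documents contain multiple sentences."
    return None


docs = [
    [True, False, False],         # one sentence
    [True, False],                # one sentence
    [True, False, True, False],   # two sentences
]
message = multi_sentence_warning(docs, min_ratio=0.5)
print(message)
```

With real training data the flags would come from the gold-standard sentence boundaries, and the threshold would need tuning.
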
## How to reproduce the behaviour

The JSON token pattern schema/validator only supports uppercase attributes.

```
import spacy
from spacy.matcher import Matcher, PhraseMatcher

nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab, validate=True)
...
```
Switch to offset mapping-based alignment for fast tokenizers. With this change, slow vs. fast tokenizers will not give identical results with `spacy-transformers`.

Additional modifications:

* Update package setup for cython...
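
The general idea of offset-based alignment can be sketched in plain Python. This is not the `spacy-transformers` implementation: the function name `align_by_offsets` and the example spans are hypothetical, and the sketch assumes that special tokens are reported with an empty `(0, 0)` offset, as fast tokenizers commonly do. Each token is aligned to every wordpiece whose character span overlaps its own.

```python
def align_by_offsets(token_spans, piece_offsets):
    """Align tokens to wordpieces by character-offset overlap.

    token_spans: list of (start, end) character spans, one per token.
    piece_offsets: list of (start, end) character spans, one per wordpiece.
    Returns, for each token, the indices of the overlapping wordpieces.
    Wordpieces with an empty span, such as (0, 0), never overlap anything.
    """
    alignment = []
    for tok_start, tok_end in token_spans:
        pieces = [
            i
            for i, (p_start, p_end) in enumerate(piece_offsets)
            if p_start < tok_end and tok_start < p_end
        ]
        alignment.append(pieces)
    return alignment


text = "unbelievable results"
# Character spans of the two whitespace-separated tokens.
token_spans = [(0, 12), (13, 20)]
# Made-up wordpiece offsets, with (0, 0) standing in for special tokens.
piece_offsets = [(0, 0), (0, 2), (2, 8), (8, 12), (13, 20), (0, 0)]

alignment = align_by_offsets(token_spans, piece_offsets)
print(alignment)  # [[1, 2, 3], [4]]
```

Because the alignment depends only on character offsets, it does not need to compare token strings, which is one reason results can differ from a string-based alignment used with slow tokenizers.
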