Adriane Boyd issues

Results: 26 issues by Adriane Boyd

File output does not match stdout output in v3.0.6

I noticed that the file output does not match the stdout output. It looks like the final article is missing in the file output, possibly due to buffering in a...

Fix Dutch noun chunks to skip overlapping spans

## Description
Fix Dutch noun chunks to skip overlapping spans.
### Types of change
Bug fix.
## Checklist
- [x] I confirm that I have the right to submit this...

bug

lang / nl
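One way to read "skip overlapping spans" is as a left-to-right filter that drops any candidate chunk starting before the previously kept chunk has ended. The sketch below works on plain `(start, end)` token offsets. `skip_overlapping` is a hypothetical helper written for illustration, not the code from this pull request.

```python
def skip_overlapping(spans):
    """Yield (start, end) spans in order, dropping any that overlap a kept span.

    Spans are half-open token ranges and are assumed to be sorted by start.
    """
    prev_end = -1
    for start, end in spans:
        # Keep a span only if it begins at or after the end of the last kept one.
        if start >= prev_end:
            prev_end = end
            yield (start, end)


# Hypothetical candidate chunks; (1, 3) and (4, 6) overlap their predecessors.
candidates = [(0, 2), (1, 3), (3, 5), (4, 6), (6, 7)]
kept = list(skip_overlapping(candidates))
```

With these inputs, `kept` holds only the non-overlapping spans `(0, 2)`, `(3, 5)` and `(6, 7)`.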

Update cupy extras

## Description
* Extend to v11
* Add `cupy-cuda11x` and `cupy-wheel`
* Update quickstart to use `cupy-wheel` for CUDA 10.2+
### Types of change
## Checklist
- [x] I confirm...

gpu

Filter duplicate vectors when pruning vectors

## How to reproduce the behaviour
When prioritizing vectors to keep, `Vocab.prune_vectors` doesn't handle existing duplicates from `key2row` well. By sorting/prioritizing by values from `key2row`, which may contain duplicate values,...

bug

feat / vectors
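The duplicate problem can be illustrated without spaCy. In the sketch below, `key2row` is a plain dict in which several keys share a row, and `unique_rows_by_priority` is a hypothetical helper that skips rows it has already selected. It is meant only to show the idea of filtering duplicates before choosing which rows to keep, not how `Vocab.prune_vectors` is implemented.

```python
def unique_rows_by_priority(keys_by_priority, key2row, nr_row):
    """Return up to nr_row distinct row indices, in priority order."""
    seen = set()
    rows = []
    for key in keys_by_priority:
        row = key2row[key]
        if row in seen:
            # Another key already claimed this row; do not count it twice.
            continue
        seen.add(row)
        rows.append(row)
        if len(rows) == nr_row:
            break
    return rows


# Hypothetical data: "the"/"The" and "dog"/"Dog" share a vector row.
key2row = {"the": 0, "The": 0, "cat": 1, "dog": 2, "Dog": 2, "fish": 3}
keys_by_priority = ["the", "The", "dog", "Dog", "cat", "fish"]

# Taking the first three keys naively yields only two distinct rows.
naive = [key2row[key] for key in keys_by_priority[:3]]
rows = unique_rows_by_priority(keys_by_priority, key2row, 3)
```

Here `naive` contains row 0 twice, while `rows` contains three distinct rows.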

Handle sentence boundaries from multiple components

## Feature description
Decide how to handle `is_sentenced` and sentence boundaries that may come from multiple components (Sentencizer, SentenceRecognizer, Parser). Some ideas:
* have an `is_sentenced` property more like `is_parsed`...

enhancement

feat / parser

feat / doc

feat / sentencizer

Character-based orthographic variants

## Feature description
Similar to the token-based orthographic variants, it would be useful to add data augmentation options for character-based orthographic variants. Examples are the Romanian variants discussed in #4736...

enhancement

training

feat / cli
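A minimal sketch of what character-level augmentation might look like. The mapping table is an assumption made for illustration (the Romanian comma-below letters and their cedilla counterparts) and is not taken from the issue. `apply_char_variants` is a hypothetical helper, not an existing spaCy option.

```python
import random

# Assumed example table: comma-below forms mapped to cedilla forms.
CHAR_VARIANTS = {
    "\u0219": "\u015f",  # s with comma below -> s with cedilla
    "\u021b": "\u0163",  # t with comma below -> t with cedilla
    "\u0218": "\u015e",  # uppercase S
    "\u021a": "\u0162",  # uppercase T
}


def apply_char_variants(text, variants, rate, rng):
    """Replace each matching character with its variant with probability `rate`."""
    out = []
    for ch in text:
        if ch in variants and rng.random() < rate:
            out.append(variants[ch])
        else:
            out.append(ch)
    return "".join(out)


rng = random.Random(0)
# Three Romanian words written with comma-below letters.
original = "\u0219tiin\u021b\u0103 \u0219i \u021bar\u0103"
augmented_all = apply_char_variants(original, CHAR_VARIANTS, 1.0, rng)
augmented_none = apply_char_variants(original, CHAR_VARIANTS, 0.0, rng)
```

Because replacements are one character for one character, token offsets in existing annotations stay valid after augmentation.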

Add checks (and converters?) for documents with multiple sentences in debug-data

## Feature description
The parser section of `spacy debug-data` should show a warning when there are no/few documents with multiple sentences in the training data. Potentially add a simple converter...

enhancement

feat / cli
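The proposed check could be sketched roughly as follows. Documents are represented here as lists of sentence-start flags rather than spaCy `Doc` objects, and both `count_multi_sentence_docs` and the 0.5 threshold are assumptions chosen for illustration, not part of `spacy debug-data`.

```python
def count_multi_sentence_docs(docs_sent_starts):
    """Count documents that contain more than one sentence.

    Each document is a list of booleans, True where a token starts a sentence.
    """
    return sum(1 for starts in docs_sent_starts if sum(starts) > 1)


# Hypothetical training data: only the second document has two sentences.
docs = [
    [True, False, False],
    [True, False, True, False],
    [True, False],
]
n_multi = count_multi_sentence_docs(docs)

# Arbitrary threshold for the sketch: warn if fewer than half qualify.
if n_multi / len(docs) < 0.5:
    message = f"Only {n_multi} of {len(docs)} documents contain multiple sentences"
else:
    message = ""
```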

Token pattern validation doesn't support lowercase attributes

## How to reproduce the behaviour
The JSON token pattern schema/validator only supports uppercase attributes.
```
import spacy
from spacy.matcher import Matcher, PhraseMatcher
nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab, validate=True)
...
```

enhancement

feat / matcher
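Until lowercase attributes are accepted, one possible workaround is to uppercase the attribute names before handing a pattern to a validating matcher. The helper below is a hypothetical sketch that touches only top-level keys of each token dict. It does not call spaCy and is not the fix proposed in this issue.

```python
def uppercase_pattern_keys(pattern):
    """Return a copy of a token pattern with top-level attribute names uppercased."""
    return [
        {key.upper(): value for key, value in token.items()}
        for token in pattern
    ]


# A pattern written with lowercase attribute names.
pattern = [{"lower": "hello"}, {"is_punct": True, "op": "?"}]
normalized = uppercase_pattern_keys(pattern)
```

The values are left untouched, so nested dicts such as set or regex predicates would need separate handling.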

Update cupy extras, quickstart

Support offset mapping alignment for fast tokenizers

Switch to offset mapping-based alignment for fast tokenizers. With this change, slow vs. fast tokenizers will not give identical results with `spacy-transformers`. Additional modifications:
* Update package setup for cython...

feat / alignment
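The general idea behind offset-based alignment can be sketched as follows: each token is matched to the wordpieces whose character span overlaps it. `align_by_offsets` and the example offsets are hypothetical and are not the `spacy-transformers` implementation. The sketch assumes special tokens carry an empty `(0, 0)` span, as fast tokenizers typically report them.

```python
def align_by_offsets(token_offsets, piece_offsets):
    """For each token span, list the indices of wordpieces that overlap it.

    All spans are half-open (start, end) character offsets into the same text.
    """
    alignment = []
    for tok_start, tok_end in token_offsets:
        pieces = [
            i
            for i, (p_start, p_end) in enumerate(piece_offsets)
            # Two half-open spans overlap if each starts before the other ends.
            # Empty spans such as (0, 0) never satisfy this test.
            if p_start < tok_end and p_end > tok_start
        ]
        alignment.append(pieces)
    return alignment


# Text: "unbelievable stuff" -> two tokens.
token_offsets = [(0, 12), (13, 18)]
# Hypothetical wordpieces: special, "un", "believ", "able", "stuff", special.
piece_offsets = [(0, 0), (0, 2), (2, 8), (8, 12), (13, 18), (0, 0)]
alignment = align_by_offsets(token_offsets, piece_offsets)
```

Here the first token aligns to wordpieces 1 to 3 and the second to wordpiece 4, while the two special tokens align to nothing.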