spaCy issues

ValueError: [E102] Can't merge non-disjoint spans. - Dutch

3

## Error message ValueError: [E102] Can't merge non-disjoint spans. 'opvlamming' is already part of tokens to merge. If you want to find the longest non-overlapping spans, you can use the...

kilzone

bug

lang / nl

feat / doc
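
The error message in this issue mentions finding the longest non-overlapping spans, but the excerpt is cut off before it names the utility. As a rough illustration of the idea only, here is a plain-Python sketch of a greedy filter. It is not spaCy's implementation; the function name `longest_non_overlapping` and the `(start, end)` token-offset tuples (end exclusive) are assumptions made for this example.

```python
def longest_non_overlapping(spans):
    """Greedily keep the longest spans, dropping any that overlap a kept one.

    Each span is a hypothetical (start, end) token-offset pair, end exclusive.
    """
    # Longest spans first; ties go to the span that starts earlier.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    seen = set()  # token indices already covered by a kept span
    for start, end in ordered:
        if not any(i in seen for i in range(start, end)):
            kept.append((start, end))
            seen.update(range(start, end))
    # Return the surviving spans in document order.
    return sorted(kept)


spans = [(0, 2), (1, 4), (5, 6)]
result = longest_non_overlapping(spans)
print(result)
```

Here `(0, 2)` and `(1, 4)` share token 1, so only the longer of the two survives. Spans filtered this way are disjoint, which is the condition the E102 message says a merge requires.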

POS tags generated from en_core_web_sm differs from Universal POS tag set.

1

## How to reproduce the behaviour I'm trying to get the part-of-speech from some sentences to use it in a ML model. According to the [documentation](https://spacy.io/api/token#attributes) the possible values should...

David-hg

models

Problems and errors in new German lemmatizer (since 3.3.0)

6

For some context, here was the master issue for problems in lemmatization for the lookup-based lemmatizer for German: https://github.com/explosion/spaCy/issues/2486 And here was the announcement that German would be prioritized for...

lutz-100worte

lang / de

feat / lemmatizer

Enable multiprocessing groups within project config

28

## Description Enable the specification of a group of commands within a spaCy project workflow that are to be executed in parallel. ### Features 1. Each spaCy projects command is...

richardpaulhudson

enhancement

feat / cli

scaling

Match private networks as URLs

1

Currently, URLs under private networks won't be matched as URLs by the tokenizer, e.g.

```
https://10.140.12.13/foo/bar?arg1=baz&arg2=taz # not a URL
https://142.140.12.13/foo/bar?arg1=baz&arg2=taz # A URL
```

From my understanding, at some...

antonpibm

enhancement

feat / tokenizer

🔜 v4.0
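
To make the distinction between the two example URLs in this issue concrete, the following sketch uses Python's standard `ipaddress` module to check whether a URL's host is a private-network address. It does not touch spaCy's tokenizer or its URL pattern; the helper name `host_is_private` is an assumption for illustration.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_private(url):
    """Return True if the URL's host is an IP literal in a private range."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        # The host is a domain name, not an IP literal.
        return False


private_url = "https://10.140.12.13/foo/bar?arg1=baz&arg2=taz"
public_url = "https://142.140.12.13/foo/bar?arg1=baz&arg2=taz"
print(host_is_private(private_url), host_is_private(public_url))
```

The first address falls inside the `10.0.0.0/8` private block and the second does not, which matches the difference in tokenizer behaviour the issue reports.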

Add Softmax scorer for spancat

1

**WIP**: This PR adds a Softmax scorer for the spancat component that should be used in cases where classes are exclusive. ## Description Spancat currently has a `LinearLogistic` layer that...

ljvmiranda921

enhancement

⚠️ wip

feat / spancat
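
The PR above concerns mutually exclusive classes. As a generic illustration of why that matters (not the PR's code, and not thinc's layers), the sketch below compares independent logistic scores with softmax scores for the same made-up logits. The function names and the logit values are assumptions for this example.

```python
import math


def sigmoid_scores(logits):
    """Score each class independently; the scores need not sum to 1."""
    return [1 / (1 + math.exp(-x)) for x in logits]


def softmax_scores(logits):
    """Score classes jointly; the scores always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, -1.0]  # hypothetical per-class logits for one span
independent = sigmoid_scores(logits)
exclusive = softmax_scores(logits)
print(sum(independent), sum(exclusive))
```

With independent logistic scores, several classes can score above 0.5 for the same span at once. Softmax normalizes the scores into a single distribution, which suits the exclusive-class case the PR describes.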

Make it possible to merge Vocab/StringStore instances

9

## Feature description Here is the background: in order to use a spacy document it is necessary to have the correct Vocab/StringStore. But when documents are created/processed in a distributed/multiprocessing...

johann-petrak

enhancement

feat / serialize

scaling

[WIP] Split up `universe.json`

6

## Description This is a work-in-progress PR with a proposal on how to split up the `universe.json` file. This PR is nowhere near ready to be merged, but I made...

koaning

docs

Executing a none python script using "Spacy Projects" generates an error

2

This issue is related to spaCy projects. ## How to reproduce the behaviour Add a non-Python command in the script section of the `project.yml`. For example, in the below snippet,...

dhirajsuvarna

windows

projects

Sentencepiece base Language

1

Feature request: SentencePiece is the tokenizer used in XLNet. I think that if `Language` tokenized text with `sentencepiece`, the alignment process could be skipped, which would make the model more efficient.

tamuhey

enhancement

feat / tokenizer

feat / transformer

new language
