spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
## Description Remove default stop words Stop words are task-specific and attempting to maintain "general-purpose" stop word lists for many different languages is not feasible. None of the underlying functionality...
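To illustrate why a stop word list tends to be task-specific, here is a minimal sketch in plain Python. It does not use spaCy's API; `remove_stop_words` and `REVIEW_STOP_WORDS` are hypothetical names invented for this example. The list deliberately leaves out "not", which a general-purpose list might include but which matters for a task such as sentiment analysis.

```python
from typing import Iterable, List, Set


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Return the tokens that are not in the given stop word set (case-insensitive)."""
    lowered = {word.lower() for word in stop_words}
    return [token for token in tokens if token.lower() not in lowered]


# Hypothetical stop word list chosen for a sentiment task: "not" is kept out
# of the list on purpose, because removing it would flip the meaning.
REVIEW_STOP_WORDS = {"the", "a", "is"}

tokens = "The plot is not a bad one".split()
filtered = remove_stop_words(tokens, REVIEW_STOP_WORDS)
print(filtered)
```

Passing the stop word set as an argument, rather than relying on a built-in default, keeps the choice of list with the caller.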
## Description Refactor pipe(as_tuples) into a separate method ### Types of change ? ## Checklist - [x] I confirm that I have the right to submit this contribution under the...
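As a rough illustration of what moving the tuple handling into its own method can look like, the sketch below separates a plain streaming method from a wrapper that accepts `(text, context)` pairs. This is not spaCy's implementation; `pipe_texts`, `pipe_as_tuples`, and the `process` callable are hypothetical stand-ins, and the splitting via `itertools.tee` is an assumption about one reasonable approach.

```python
from itertools import tee
from typing import Any, Callable, Iterable, Iterator, Tuple


def pipe_texts(texts: Iterable[str], process: Callable[[str], Any]) -> Iterator[Any]:
    """Stream texts through a processing function, one result per text."""
    for text in texts:
        yield process(text)


def pipe_as_tuples(
    pairs: Iterable[Tuple[str, Any]], process: Callable[[str], Any]
) -> Iterator[Tuple[Any, Any]]:
    """Process (text, context) pairs and yield (result, context) pairs.

    The input stream is duplicated so that texts and contexts can be
    consumed separately while staying aligned.
    """
    text_stream, context_stream = tee(pairs)
    texts = (text for text, _ in text_stream)
    contexts = (context for _, context in context_stream)
    # zip pulls from both streams in lockstep, so results stay paired
    # with the context they arrived with.
    yield from zip(pipe_texts(texts, process), contexts)


# Usage with a trivial stand-in for a pipeline: whitespace tokenization.
data = [("a b", {"id": 1}), ("c", {"id": 2})]
results = list(pipe_as_tuples(data, str.split))
print(results)
```

Keeping the tuple variant separate means the plain method has a single input and output type, which is the kind of simplification a refactor like this would usually aim for.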
Hello, I've recently upgraded the spaCy pretrained models from v3.2 to v3.4, but I found that tagger and lemmatizer performance dropped significantly for Italian and Spanish. I've prepared a...
## How to reproduce the behaviour Download https://www.gutenberg.org/files/1342/1342-0.txt — Pride & Prejudice, about 0.8MB. Then run: ```python import spacy nlp = spacy.load("en_core_web_sm") with open("./1342-0.txt") as f: book = f.read() result...
I want to create a custom NER tag using GPT-2. I want to use [this model](https://huggingface.co/openai-community/gpt2). I am familiar with the spaCy custom training framework. I formatted the config.cfg file as...
**Description** Build a custom component to:
1. identify coordinations in a document
2. split the coordinations
3. return a new `Doc` object with the split coordinations
I'm using version 1.3.4 of spacy-transformers, but it is incompatible with the latest version of transformers (4.37.2). Is an update planned? Thanks
## Description Modify EL batching to work doc-based instead of mention-based. For prior discussion of why this is useful, see https://github.com/explosion/spaCy/pull/11669#issuecomment-1283666113. Review and merge after https://github.com/explosion/spaCy/pull/12341. Split off...
## Description Adds Azure API key example for LLM configuration. This is useful as it is not the same as the expected OpenAI Azure client variable (`AZURE_OPENAI_API_KEY`) ### Types of...
Extended list of abbreviations in Faroese language extension's tokenizer exceptions. ## Checklist - [x] I confirm that I have the right to submit this contribution under the project's MIT license....