stanza
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
**Describe the bug** When Stanza is run from a Docker container on a server with more than ~20 cores, the performance of the pipeline drops dramatically. **To Reproduce** 1) Get machine with...
## Description Added a check to find & replace excessively long tokens with "UNK", in order to avoid downstream GPU memory issues in `POS`. See issue #1137 ## Approach To avoid having...
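As a rough illustration of the check described above, the following is a minimal sketch and not the code from the pull request. The function name `replace_long_tokens` and the threshold `MAX_TOKEN_LEN` are hypothetical; the source does not state the actual limit or where in the pipeline the check runs.

```python
# Minimal sketch, assuming pre-tokenized input as lists of strings.
# MAX_TOKEN_LEN and replace_long_tokens are hypothetical names, and the
# threshold value is an assumption, not the one used in Stanza.
MAX_TOKEN_LEN = 100
UNK = "UNK"


def replace_long_tokens(sentences, max_len=MAX_TOKEN_LEN, unk=UNK):
    """Return a copy of `sentences` with over-long tokens replaced by `unk`."""
    return [
        [tok if len(tok) <= max_len else unk for tok in sent]
        for sent in sentences
    ]


# One sentence containing a 500-character token that would be replaced.
cleaned = replace_long_tokens([["A", "x" * 500, "token"]])
```

Replacing the token rather than dropping it keeps the sentence length and token alignment unchanged, which may matter for downstream processors.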
Add a script to convert from AWS annotator reports to a report on how much work each annotator did
Transliterate Kazakh to a Latin alphabet
**Problem / motivation** When processing large corpora of text, the likelihood of encountering unexpected and ill-formatted inputs becomes large. In my case, I was processing a collection of texts, and...
**Describe the bug** I trained a custom stanza tokenizer and mwt on UD_English-GUM. When using the tokenizer & mwt for inference, the tokenizer changed the surface form of the word....
Hi, I'm trying to reproduce the results mentioned [here](https://stanfordnlp.github.io/stanza/constituency.html#available-models) for the constituency parser on Penn Treebank data. I have access to the WSJ data and I downloaded the `wsj_bert.pt` model by calling...
I saw the great idea for combined models here: https://stanfordnlp.github.io/stanza/combined_models.html Is there a process to request more of these? Specifically I was thinking of Hebrew right now.
Greetings all, I am working on extracting subordinate clauses via Stanza (through spacy-stanza); however, dependency parsing seems to provide inaccurate results. Following the guide from https://universaldependencies.org [here](https://universaldependencies.org/u/dep/csubj.html), clausal subjects are...
This is my first question, so sorry if I make any mistakes, but I haven't found information about rebuilding an existing language from its sources. I plan on doing this...