stanza
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
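For orientation, a minimal usage sketch (standard quickstart from the Stanza docs; the English models and the example sentence are just illustrative):

```python
import stanza

# Download the default English models (one-time; stored under ~/stanza_resources)
stanza.download("en")

# A pipeline covering the tasks named above
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse,ner")

doc = nlp("Barack Obama was born in Hawaii.")
for word in doc.sentences[0].words:
    print(word.text, word.upos, word.deprel)
for ent in doc.ents:
    print(ent.text, ent.type)
```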
> Sorry for dropping this on the floor. Actually, it's pretty straightforward, so there's not really a good excuse for making you wait. ...
Hi folks, I am new to Stanza and find it works better than the other methods I have tried. I would like to know how to apply negation detection to entities, the way negspacy does...
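Stanza has no built-in negation component, so one hedged possibility is a small post-processing pass over the NER output. The sketch below uses a hand-picked cue list and a character-window heuristic, both of which are illustrative assumptions, not negspacy's actual algorithm:

```python
import stanza

# Illustrative cue list; negspacy ships curated trigger lists, this is a stand-in
NEG_CUES = {"no", "not", "never", "without"}

nlp = stanza.Pipeline("en", processors="tokenize,ner")
doc = nlp("Barack Obama never visited Denver.")

for sentence in doc.sentences:
    cue_positions = [t.start_char for t in sentence.tokens
                     if t.text.lower() in NEG_CUES]
    for ent in sentence.ents:
        # Flag the entity if a cue occurs shortly before it (window is arbitrary)
        negated = any(0 <= ent.start_char - p <= 40 for p in cue_positions)
        print(ent.text, ent.type, "NEGATED" if negated else "affirmed")
```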
This page https://stanfordnlp.github.io/stanza/pipeline.html, in the description of the `package` option, says "A complete list of available packages can be found [here](https://stanfordnlp.github.io/stanza/models.html)." However, there is no list of packages at the...
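For reference, the `package` option itself works as documented; only the list of valid names is missing. A sketch, assuming "partut" is one of the available English packages:

```python
import stanza

# Select a non-default model package by name; "partut" is assumed here to be
# one of the English UD treebank packages from the (missing) models list
stanza.download("en", package="partut")
nlp = stanza.Pipeline("en", package="partut", processors="tokenize,pos")
```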
I have been investigating the code for the coreference model to better understand its inner workings. One thing that caught my attention is that, during training, documents longer than 5,000...
I would like a way to deploy Stanza models in web environments, e.g. using Pyodide. I imagine that dependencies are the biggest (insurmountable?) hurdle blocking this feature. It does not...
Hello! We have been using Stanza 1.10.1 with single-document processing but want to switch to batch processing to increase speed. For that, we ran some benchmarks, among other things...
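For comparison, the batching pattern from the Stanza documentation wraps each text in an empty `stanza.Document` and passes the whole list to the pipeline in one call; a minimal sketch:

```python
import stanza

nlp = stanza.Pipeline("en", processors="tokenize,pos")

texts = ["This is the first document.", "And this is the second one."]

# Wrap each text in an empty Document and pass the list at once;
# the pipeline then batches across documents instead of looping
in_docs = [stanza.Document([], text=t) for t in texts]
out_docs = nlp(in_docs)

for doc in out_docs:
    print(len(doc.sentences), "sentences")
```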
**Describe the bug** When tagging neuter nouns in Romanian, they come back as "Gender=Masc". **To Reproduce** Analyze a sentence such as "Sistemul este foarte bun". The neuter noun "sistem" appears...
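A hedged reproduction sketch (default Romanian models; the exact feature output depends on the installed model version):

```python
import stanza

stanza.download("ro")
nlp = stanza.Pipeline("ro", processors="tokenize,pos")

doc = nlp("Sistemul este foarte bun.")
for word in doc.sentences[0].words:
    # Per the report, the neuter noun "sistem" comes back with Gender=Masc
    print(word.text, word.upos, word.feats)
```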
**Is your feature request related to a problem? Please describe.** Stanza promises some basic functionality for all languages, but NER is not yet implemented for Urdu. **Describe the solution you'd...
Hello! **Is your feature request related to a problem? Please describe.** Currently, the closing index of a discontinuous mention is not captured by the regex in the [convert_udcoref.py](https://github.com/stanfordnlp/stanza/blob/af3d42b70ef2d82d96f410214f98dd17dd983f51/stanza/utils/datasets/coref/convert_udcoref.py) script. For...
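Without quoting the script's actual regex, a small illustration of why a closing index can slip through: CorefUD marks part k of an n-part discontinuous mention with a bracketed `[k/n]` suffix on the entity id, which a pattern written for plain closing ids will not match. Both patterns below are hypothetical stand-ins, not the script's code:

```python
import re

# CorefUD writes part k of an n-part discontinuous mention with an "[k/n]"
# suffix on the entity id, e.g. closing "e12[2/2])" instead of "e12)"
closings = ["e3)", "e12[2/2])"]

naive = re.compile(r"(e\d+)\)")                  # misses the bracketed suffix
widened = re.compile(r"(e\d+)(\[\d+/\d+\])?\)")  # also accepts "[k/n]"

for s in closings:
    print(s, "naive:", bool(naive.fullmatch(s)),
          "widened:", bool(widened.fullmatch(s)))
```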
When I import a single CoNLL-U document via CoNLL.conll2doc and then run a pipeline with tokenize_pretokenized=True, tokenize_no_ssplit=True on it, it is processed without problems. However, when I put several CoNLL-U...
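A sketch of the single-document setup the report says works (the file name and language are hypothetical):

```python
import stanza
from stanza.utils.conll import CoNLL

# Load an existing CoNLL-U file as a stanza Document (path is hypothetical)
doc = CoNLL.conll2doc("input.conllu")

# Re-annotate the already-tokenized, already-split document
nlp = stanza.Pipeline(
    "en",
    processors="tokenize,pos,lemma,depparse",
    tokenize_pretokenized=True,
    tokenize_no_ssplit=True,
)
doc = nlp(doc)
print(len(doc.sentences), "sentences re-annotated")
```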