stanza
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
**Describe the bug** `stanza.download()` fails to download resources from a host that sends a [chunked response](https://en.wikipedia.org/wiki/Chunked_transfer_encoding). ```python In [1]: import stanza In [2]: stanza.download('en') --------------------------------------------------------------------------- TypeError Traceback (most recent call...
**Is your feature request related to a problem? Please describe.** I wrote a coreference resolver based on my requirements, using the coref model as a base to create clusters. Sometimes in...
**Describe the bug** In `yo como carne`, `como` is identified as `upos SCONJ`, while it should be `VERB`. I am running this pipeline: ``` { "text": "Yo como carne.", "processors":...
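For readers trying to reproduce the report above, a minimal sketch of the pipeline configuration. The full processor list in the original report is truncated, so `"tokenize,mwt,pos"` is an assumption; it is the smallest processor set that produces UPOS tags, and `lang`/`processors` are standard `stanza.Pipeline` options:

```python
# Hedged sketch: a Spanish pipeline config for reproducing the UPOS issue.
# The processor list here is an assumption (the original snippet is cut off);
# "tokenize,mwt,pos" is the minimal set needed to get UPOS tags.
config = {
    "lang": "es",
    "processors": "tokenize,mwt,pos",
}

# Usage (requires stanza and the Spanish models to be downloaded):
# import stanza
# nlp = stanza.Pipeline(**config)
# doc = nlp("Yo como carne.")
# print([(w.text, w.upos) for w in doc.sentences[0].words])
```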
I encountered an issue while training and evaluating models using the specified setup. When the training process completed, a "Permission Denied" error occurred with the temporary file used to save...
**Describe the bug** Evaluating "Ich wasche meine Hände." in Stanza 1.11 leads to "Hände" being treated as a verb with `lemma=hinden`. There is no verb "hinden" in German, and Hände...
Hello, I have multiple Tregex patterns and want to use the CoreNLPClient.tregex() method to get matching results for sentences. Do I need to call tregex() multiple times for multiple patterns, or can...
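One possible approach, sketched below: since `CoreNLPClient.tregex()` takes a single pattern per call (the `tregex(text, pattern)` signature assumed here is based on the documented API), the patterns can simply be looped over against an already-running client. The helper name `tregex_all` is hypothetical:

```python
def tregex_all(client, text, patterns):
    """Hedged sketch: run several Tregex patterns over the same text by
    calling client.tregex() once per pattern. `client` is assumed to be
    an active stanza.server.CoreNLPClient; the results are keyed by the
    pattern string so matches for each pattern stay separate."""
    return {pattern: client.tregex(text, pattern) for pattern in patterns}
```

Each call re-sends the text to the server, so for many patterns over many sentences it may be cheaper to annotate once and batch the patterns, but as far as the documented API goes, one call per pattern is the straightforward route.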
Would a morpheme segmentation processor that turns arbitrary text into morphemes be a viable feature? My friend and I have been working on a library based on a model in...
I would like to use the coref processor for dialogues where I know the speaker of each sentence. This should help eliminate spurious I/you coref chains that are obviously wrong...
In many places the code does things such as ``` deprel_seqs = [self.vocab['deprel'].unmap([preds[1][i][j+1][h] for j, h in enumerate(hs)]) for i, hs in enumerate(head_seqs)] ``` which, while unlikely, includes PAD and the...
I get out-of-memory errors on unpunctuated text input. I believe the cause is the batch-splitting method in the TokenizeProcessor. The docs claim that the batches...
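A possible mitigation sketch for reports like the one above: `tokenize_batch_size` is a documented `stanza.Pipeline` option that caps the tokenizer's batch size. The value `16` below is arbitrary, and whether a smaller batch actually avoids this particular OOM on long unpunctuated input is an assumption:

```python
# Hedged sketch: shrinking the tokenizer batch as a possible OOM workaround.
# tokenize_batch_size is a documented stanza.Pipeline option; the chosen
# value and its effectiveness on unpunctuated input are assumptions here.
config = {
    "lang": "en",
    "processors": "tokenize",
    "tokenize_batch_size": 16,  # smaller batches to cap peak memory use
}

# Usage (requires stanza and the English models to be downloaded):
# import stanza
# nlp = stanza.Pipeline(**config)
# doc = nlp(very_long_unpunctuated_text)
```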