stanza
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
A very weird English tree produced by Stanza 1.6.0 in the [demo](http://stanza.run/): > My cousin my extremely rude colleague admired last year chewed the chicken enthusiastically. In UD, no word...
I want to run the following code, but an error occurred. import stanza pipe = stanza.Pipeline("en", processors="tokenize,ner", package={"ner": ["ncbi_disease", "ontonotes"]}) doc = pipe("John Bauer works at Stanford and has hip...
**Describe the bug** If there is a comma in the parsed sentence, the PROIEL model: a) does not tokenize the comma, it just bundles it with the preceding word. The...
**Describe the bug** I am seeing out-of-memory errors with 35 GB processes; stanza was tracked down as the cause. **To Reproduce** Steps to reproduce the behavior: 1. Take e.g....
Hi, Thanks for this tool. I noticed that coref sometimes doesn't use the proper noun. Is there any way to make it use the proper noun? Here is my code...
Hi, The Apple Silicon GPU (MPS) is not detected, even when using `use_gpu=True`. Is there any way to use the MPS GPU? Thanks!
**Describe the bug** When parsing a long text using the latest "combined_electra-large" model, I get the error: ``` Token indices sequence length is longer than the specified maximum sequence length...
Hi, I have been using stanza bulkprocess to tokenize and ssplit a rather large text stored in a dataframe. My question is how to show a progress bar when running the...
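One possible way to approach the progress question above, sketched without relying on any Stanza-specific progress feature: split the texts into batches and report after each batch finishes. Everything here is an assumption for illustration. `process_with_progress` and `process_batch` are hypothetical names, and `process_batch` stands in for whatever call does the real work on a list of texts.

```python
import sys
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def _print_progress(done: int, total: int) -> None:
    """Default reporter: overwrite a single status line on stderr."""
    sys.stderr.write(f"\rprocessed {done}/{total}")
    sys.stderr.flush()


def process_with_progress(
    items: Sequence[T],
    process_batch: Callable[[Sequence[T]], List[R]],  # hypothetical worker
    batch_size: int = 100,
    report: Optional[Callable[[int, int], None]] = None,
) -> List[R]:
    """Run process_batch over items in chunks, reporting after each chunk."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if report is None:
        report = _print_progress

    results: List[R] = []
    total = len(items)
    for start in range(0, total, batch_size):
        batch = items[start:start + batch_size]
        results.extend(process_batch(batch))
        # Report how many items are finished so far.
        report(min(start + batch_size, total), total)
    return results
```

The trade-off in this sketch is batch size: smaller batches update the progress line more often but may lose some of the throughput that large batches provide.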
**Describe the bug** Tokens without a space after them in the original text do not include that info in the misc field of the Word object or in the conllu...
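For context on the report above: in CoNLL-U, a token that is not followed by whitespace in the original text is conventionally marked `SpaceAfter=No` in the MISC column. The sketch below shows how that value could be derived from character offsets. It is not Stanza's implementation. The function name `space_after_misc` is hypothetical, and treating the final token of the text as unmarked is an assumption.

```python
from typing import List, Sequence, Tuple


def space_after_misc(text: str, spans: Sequence[Tuple[int, int]]) -> List[str]:
    """Return a MISC value for each token span.

    spans holds (start, end) character offsets into text, end exclusive.
    A token directly followed by a non-whitespace character gets
    "SpaceAfter=No"; every other token gets "_" (empty MISC).
    """
    misc = []
    for _start, end in spans:
        # Assumption: a token that ends the text is left unmarked.
        at_end = end >= len(text)
        if at_end or text[end].isspace():
            misc.append("_")
        else:
            misc.append("SpaceAfter=No")
    return misc


text = "Hello, world!"
spans = [(0, 5), (5, 6), (7, 12), (12, 13)]  # Hello  ,  world  !
for (start, end), value in zip(spans, space_after_misc(text, spans)):
    print(text[start:end], value)
```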
**Describe the bug** [ஊறு](https://ta.wiktionary.org/wiki/%E0%AE%8A%E0%AE%B1%E0%AF%81) **To Reproduce** Steps to reproduce the behavior: ``` import logging import stanza logging.getLogger('stanza').setLevel(logging.ERROR) # Download and initialize the Tamil model # stanza.download('ta') nlp = stanza.Pipeline(lang='ta') #...