stanza
stanza copied to clipboard
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
I am trying to parse Arabic texts using the pretrained model (PADT), but some portions of texts are recognized as a single sentence. For example, this Arabic passage results in...
**Describe the bug** FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which...
I get the following error when running NER: `TypeError: 'NoneType' object is not subscriptable` After debugging the error, I found out that it is trying to access the document's `text`...
I am using Stanza to identify NER is short pieces of text like (business names/brand names). Here is one example: ``` # BUILDING THE MODELS #-----stanza sen = stanza.Pipeline ("en")...
Add a word classifier to cover ambiguous lemmas such as `'s`
create a second tokenization stage which uses contextual *word* embeddings as well token embeddings to be more accurate. Doesn't seem to help for languages with clearly delinated orthographies but in...
**Describe the bug** When a Pipeline is instantiated with a text containing some URLs (e.g. example.com) in Portuguese, the URLs are broken into their own sentences, as the dots are...
**Description:** I have encountered an issue with the Stanza pipeline for the Indonesian language, specifically with the tokenizer processor. The pipeline fails to handle sentence segmentation properly when a sentence...
it should not be necessary for download to call set_logging_level we could have a separate logging specifically for the downloads if we want to log the downloads at a higher...
**Describe the bug** Take the following sentence: **Assurez-vous d'être à l'heure !** The word vous has a wrong dependency relation with Stanza 1.8.2, but correct with Stanza 1.8.1 Stanza 1.8.1...