stanza
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
I am working on a project where I would like to run a large corpus of text through a stanza pipeline using the processors `'tokenize,lemma,pos,depparse'`. I am trying to...
**Describe the bug** For these two sentences: ``` The first challenge that we have before we can do any kind of analysis of these interstellar dust particles is to find...
I'm running some of the available NER models on different texts. **URLs**, **EMAILs**, **mentions** and so on often appear in these texts, and I've seen that, not surprisingly, the...
Hi there! Could you please tell me whether there is a coref model? Classical (non-neural, JVM-based) CoreNLP includes several, but I can't find one here.
**To Reproduce** Steps to reproduce the behavior: ``` import stanza stanza.download('pt') stnz = stanza.Pipeline('pt', use_gpu=False) text = stnz("convido-os a levantarem-se para um minuto de silêncio .") print(*[f'word: {word.text+" "}\tlemma: {word.lemma}\tupos:...
**Describe the bug** In some cases, and since the 1.2.0 update, the French GSD model considers the last word of the phrase and the final punctuation as the same token....
I listened to the recent PyTorch Dev Conference. Yuhao Zhang said that a new release with 83 languages is coming. Do these 30 new languages include Thai as well? There...
**Describe the bug** Hi, to tokenize a very large database of ~20M biomedical texts we tried to parallelize the tokenization with GPU support and multiprocessing. The same code as #552...
I have the following MWE: ``` from stanfordnlp.server import CoreNLPClient text = 'Barack Obama was born in the Hawaii. He was the president of the United States. ' prop =...
**Is your feature request related to a problem? Please describe.** It is really really hard to do any advanced stuff with stanza. The documentation only explains the most basic usage,...