spacy-stanza

UserWarnings make parsing slow

Open chaouiy opened this issue 5 years ago • 3 comments

I am parsing a big corpus that takes days to index. It is an Arabic corpus, so I need spacy-stanza. I have noticed that for each sentence I parse, it prints: UserWarning: Can't set named entities because of multi-word token expansion or because the character offsets don't map to valid tokens produced by the Stanza tokenizer. This makes the parsing a lot slower. I suggest removing these warnings.

chaouiy avatar Sep 23 '20 10:09 chaouiy

Hi, you can use Python's warnings filters to control how these warnings are handled: https://docs.python.org/3/library/warnings.html#the-warnings-filter
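For readers landing here, a minimal sketch of what such a filter could look like, using only the standard library. It assumes the warning text starts exactly as quoted in the report above; the names NER_WARNING_PREFIX, silence_ner_warning and warnings_shown are made up for illustration and are not part of spaCy or spacy-stanza.

```python
import warnings

# Start of the message quoted in this thread. `message` is treated as a regex
# that must match the beginning of the warning text (case-insensitively).
NER_WARNING_PREFIX = r"Can't set named entities"


def silence_ner_warning() -> None:
    """Ignore only UserWarnings whose text starts with NER_WARNING_PREFIX."""
    warnings.filterwarnings(
        "ignore", message=NER_WARNING_PREFIX, category=UserWarning
    )


def warnings_shown(message: str) -> int:
    """Illustrative helper: emit `message` and count how many warnings get through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # show everything...
        silence_ner_warning()            # ...except the targeted message
        warnings.warn(message, UserWarning)
    return len(caught)
```

In a real script, calling silence_ner_warning() once before running the pipeline should be enough; other UserWarnings stay visible, which is the main reason to filter by message rather than by category alone.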

adrianeboyd avatar Oct 10 '20 11:10 adrianeboyd

Hi, I still have the same issue. How can I simply silence all UserWarnings?

isaac47 avatar May 24 '21 09:05 isaac47

You can run your script with python -W ignore script.py to turn off all warnings (docs). This is not special or unique to spaCy.
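If turning off every warning is too broad, the same effect can be limited to UserWarning and to one part of the code. The following is a sketch using the standard library only; no_user_warnings and noisy_parse are hypothetical names, and noisy_parse merely stands in for a pipeline call that warns on every sentence.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def no_user_warnings():
    """Suppress UserWarning inside the `with` block, then restore the old filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=UserWarning)
        yield


def noisy_parse(text: str) -> list:
    """Hypothetical stand-in for a parser call that emits a UserWarning each time."""
    warnings.warn("placeholder warning emitted per sentence", UserWarning)
    return text.split()


def parse_quietly(sentences: list) -> list:
    """Run the stand-in parser over all sentences without UserWarning output."""
    with no_user_warnings():
        return [noisy_parse(s) for s in sentences]
```

On the command line, python -W ignore::UserWarning script.py restricts the -W option to the same category, and the PYTHONWARNINGS environment variable accepts the same filter string. Note that warnings.catch_warnings changes global state and is not thread-safe.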

polm avatar May 24 '21 10:05 polm

Just going through some older issues...

It sounds like this was resolved, but please feel free to reopen if you're still running into issues!

adrianeboyd avatar Oct 09 '23 14:10 adrianeboyd