spacy-stanza
Use sentencizer with stanfordnlp
Right now spacy-stanfordnlp takes care of the tokenization too. Would it be possible to use spaCy's sentencizer and keep stanfordnlp just for tagging and parsing?
The only approach I can think of is running two pipelines: the first uses only the sentencizer and the second uses stanfordnlp.Pipeline. That means the text gets tokenized twice, probably with a performance penalty.
I'm going through the docs and the source code but can't find a proper way to do it.
It seems that StanfordNLP has a tokenize_pretokenized option: https://stanfordnlp.github.io/stanfordnlp/pipeline.html#running-on-pre-tokenized-text I'll see if I can use that.
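For anyone landing here later, a rough sketch of how that option might be used. With tokenize_pretokenized enabled, the linked docs describe the input as tokens separated by spaces and sentences separated by newlines. The helper names `to_pretokenized_text` and `parse_pretokenized` are made up for illustration, and the explicit `processors` string is an assumption (it leaves out `mwt`); adjust it to the models you have downloaded.

```python
def to_pretokenized_text(sentences):
    """Join tokens with spaces and sentences with newlines.

    `sentences` is an iterable of sentences, each an iterable of token strings.
    """
    lines = []
    for sentence in sentences:
        # Drop empty or whitespace-only tokens and trim stray whitespace.
        tokens = [tok.strip() for tok in sentence if tok.strip()]
        # A token with internal whitespace would be split again downstream.
        if any(len(tok.split()) > 1 for tok in tokens):
            raise ValueError("tokens must not contain internal whitespace")
        if tokens:
            lines.append(" ".join(tokens))
    return "\n".join(lines)


def parse_pretokenized(sentences, lang="en"):
    """Tag and parse pre-split sentences (hypothetical helper).

    Requires stanfordnlp and downloaded models for `lang`.
    """
    import stanfordnlp  # imported lazily so the helper above works without it

    nlp = stanfordnlp.Pipeline(
        lang=lang,
        processors="tokenize,pos,lemma,depparse",  # assumption: no mwt needed
        tokenize_pretokenized=True,  # skip the tokenizer model
    )
    return nlp(to_pretokenized_text(sentences))
```

The sentence lists could come from a spaCy pipeline that only runs the sentencizer, e.g. `[[t.text for t in sent] for sent in doc.sents]`. Note that this returns a stanfordnlp document rather than a spaCy `Doc`, so it sidesteps the wrapper instead of plugging into it.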
Just going through some older issues, and it sounds like you found a solution. But please feel free to reopen if you're still running into issues!