Chinese_models_for_SpaCy

comparison with stanza?

Open · dcsan opened this issue 5 years ago · 1 comment

not a bug report per se

I'm wondering how spaCy's Chinese models compare with the Stanza project. Stanza already provides Chinese support with many features: https://stanfordnlp.github.io/stanza/models.html

It has a Chinese (simplified) model and provides a dependency parser, lemmatizer, and other basic NLP features.

I'm a bit confused because Stanza itself uses spaCy for tokenization: https://stanfordnlp.github.io/stanza/tokenize.html#use-spacy-for-fast-tokenization-and-sentence-segmentation

"You can only use spaCy to tokenize English text for now, since spaCy tokenizer does not handle multi-word token expansion for other languages."

This would imply spaCy is a lower-level library, and yet the two projects seem similar in scope.

dcsan avatar Mar 23 '20 20:03 dcsan

Hi @dcsan, in my view the reason Stanza uses spaCy for tokenization is simply that spaCy's English tokenization is fast and quite good. I think Stanza and spaCy are both full-featured NLP frameworks.

howl-anderson avatar Mar 24 '20 08:03 howl-anderson