lda2vec

Any chance of replacing spaCy with another library that supports other languages?

Open eminarcissus opened this issue 8 years ago • 1 comment

spaCy currently supports only English and German. What does spaCy actually do in this project? Is there any chance of replacing it with another library?

eminarcissus, Aug 23 '16 06:08

As stated in lda2vec/preprocess.py, this implementation "Uses spaCy to quickly tokenize text and return an array of indices." As long as your tokenizer produces output in the same format, you can use whatever library you wish. NLTK (http://www.nltk.org/) supports more languages, in case you really need to be sure that the terms belong to a particular language (remember that both LDA and word2vec work even with slang).
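To make "tokenize and return an array of indices" concrete, here is a rough sketch of a spaCy-free replacement using only the standard library. The names `simple_tokenize`, `tokenize_to_indices`, and `PAD_ID` are made up for illustration, and the padding value and id ordering are assumptions, so check lda2vec/preprocess.py for the exact format the rest of the code expects.

```python
import re
from collections import Counter

PAD_ID = -1  # assumed padding value; verify against lda2vec/preprocess.py


def simple_tokenize(text):
    """Stand-in tokenizer: lowercased runs of Unicode word characters."""
    return re.findall(r"\w+", text.lower())


def tokenize_to_indices(texts, max_length, tokenizer=simple_tokenize):
    """Map each document to a fixed-length row of integer token ids."""
    # Tokenize and truncate every document to at most max_length tokens.
    tokenized = [tokenizer(text)[:max_length] for text in texts]

    # Build the vocabulary: most frequent token gets id 0,
    # ties are broken alphabetically so the mapping is deterministic.
    counts = Counter(tok for doc in tokenized for tok in doc)
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    token_to_id = {tok: i for i, tok in enumerate(ordered)}

    # Convert tokens to ids and right-pad short documents.
    rows = []
    for doc in tokenized:
        ids = [token_to_id[tok] for tok in doc]
        rows.append(ids + [PAD_ID] * (max_length - len(ids)))

    id_to_token = {i: tok for tok, i in token_to_id.items()}
    return rows, id_to_token


docs = ["El gato come pescado", "El perro come carne y pescado"]
rows, vocab = tokenize_to_indices(docs, max_length=6)
print(rows)
print(vocab)
```

Any callable that takes a string and returns a list of tokens can be passed as `tokenizer`, for example `nltk.tokenize.wordpunct_tokenize` if NLTK is installed. The rows can be turned into a NumPy array afterwards if the downstream code needs one.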

AdrianTudC, Jun 26 '17 14:06