Chinese-Word-Vectors

Data check.

Open · zhangjh915 opened this issue 5 years ago · 1 comment

Have you run any data checks on the w2v dictionaries, for example for outliers? What is the range of the embedding values? Do I need to normalize them?

zhangjh915, Jul 22 '19 10:07
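The range and outlier questions can also be answered empirically by scanning a vector file yourself. Below is a minimal sketch, assuming the file has been decompressed (or opened with `bz2.open(path, "rt", encoding="utf-8")`) and is in the plain-text word2vec format: a header line with vocabulary size and dimension, then one word followed by its values per line. The helper names `load_word2vec_text` and `summarize` and the z-score threshold on vector norms are hypothetical, illustrative choices, not part of the repository.

```python
import math
import statistics
from typing import Dict, Iterable, List


def load_word2vec_text(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse the word2vec text format (assumed): a header "<vocab_size> <dim>",
    then one "<word> <v1> ... <v_dim>" entry per line."""
    it = iter(lines)
    _vocab_size, dim = map(int, next(it).split())
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines instead of crashing
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def summarize(vectors: Dict[str, List[float]], z_threshold: float = 3.0) -> dict:
    """Report the global value range and flag words whose vector norm is an outlier."""
    values = [x for vec in vectors.values() for x in vec]
    norms = {w: math.sqrt(sum(x * x for x in vec)) for w, vec in vectors.items()}
    norm_values = list(norms.values())
    mean = statistics.mean(norm_values)
    std = statistics.pstdev(norm_values)
    # A word is flagged when its norm lies more than z_threshold std devs from the mean.
    outliers = [w for w, n in norms.items()
                if std > 0 and abs(n - mean) / std > z_threshold]
    return {"min": min(values), "max": max(values),
            "norm_mean": mean, "norm_std": std, "outliers": outliers}


if __name__ == "__main__":
    # Tiny inline sample; for a real file, pass an open text file handle instead.
    sample = ["3 2", "a 1.0 0.0", "b 0.0 2.0", "c -3.0 4.0"]
    print(summarize(load_word2vec_text(sample)))
```

Checking norms rather than individual values is one reasonable choice here, since an unusually long or short vector is a more natural notion of an "outlier word" than a single extreme coordinate.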

You don't need to worry about this. All word vectors are trained with the ngram2vec toolkit, which is a superset of the word2vec and fastText toolkits. You can therefore use these embeddings just as you would use word2vec or fastText embeddings.

shenshen-hungry, Jul 23 '19 02:07
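Using the vectors "just like word2vec" usually means comparing them by cosine similarity, which ignores vector length, so no global rescaling of the value range is needed for that use. If you want dot products to equal cosine similarities, a common step is to L2-normalize each vector. Below is a minimal pure-Python sketch, assuming the vectors are already loaded into a dict of word to list of floats; `l2_normalize` and `most_similar` are hypothetical helper names, not an API of this repository or of ngram2vec.

```python
import math
from typing import Dict, List, Tuple


def l2_normalize(vectors: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Scale every vector to unit length; zero vectors are left unchanged."""
    unit: Dict[str, List[float]] = {}
    for word, vec in vectors.items():
        norm = math.sqrt(sum(x * x for x in vec))
        unit[word] = [x / norm for x in vec] if norm > 0 else list(vec)
    return unit


def most_similar(query: str, unit: Dict[str, List[float]],
                 topn: int = 3) -> List[Tuple[str, float]]:
    """Rank other words by cosine similarity; on unit vectors this is the dot product."""
    q = unit[query]
    scores = [(w, sum(a * b for a, b in zip(q, vec)))
              for w, vec in unit.items() if w != query]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


if __name__ == "__main__":
    # Toy 2-D vectors: "b" points the same way as "a", "c" is orthogonal to it.
    toy = {"a": [3.0, 4.0], "b": [6.0, 8.0], "c": [-4.0, 3.0]}
    print(most_similar("a", l2_normalize(toy)))
```

In the toy example "b" ranks first despite being twice as long as "a", which illustrates why raw magnitudes matter little when vectors are compared by cosine similarity.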