
How to define a baseline for "good word2vec"

Open zhouxincheng opened this issue 6 years ago • 2 comments

I used the toolkit to evaluate my vectors and got a result. However, could you tell us what range of values indicates good vectors?

zhouxincheng avatar Oct 15 '18 11:10 zhouxincheng

That's a good question.

shenshen-hungry avatar Oct 16 '18 02:10 shenshen-hungry

The evaluation is a typical word analogy task. For example, given the words "man", "king" and "woman", we can use the word vectors to compute (king - man + woman). If the word most similar to the result is "queen", the question is answered correctly. There are 17813 analogy questions in total in the evaluation set.

Analogy evaluation measures the extent to which word vectors capture linguistic relations. Thus, the higher the accuracy, the better.
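
To make the analogy test concrete, here is a minimal sketch in plain Python. It is not the toolkit's code: the function names, the toy vectors and the choice to exclude the three question words from the candidate answers are assumptions made for illustration, and the real evaluation script may differ in details such as the similarity formula or how out-of-vocabulary words are handled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Return the word closest to (b - a + c), e.g. king - man + woman.

    Assumption: the three question words are excluded from the candidates.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(1 for a, b, c, expected in questions
                  if solve_analogy(vectors, a, b, c) == expected)
    return correct / len(questions)


# Hypothetical 3-dimensional toy vectors, not real embeddings.
toy_vectors = {
    "man": [1.0, 0.0, 0.0],
    "woman": [1.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, 0.1],
}
toy_questions = [("man", "king", "woman", "queen")]

print(solve_analogy(toy_vectors, "man", "king", "woman"))
print(analogy_accuracy(toy_vectors, toy_questions))
```

The accuracy returned is simply the number of correctly answered questions divided by the total number of questions, so two sets of vectors can be compared directly when they are scored on the same question set.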

For more information about the analogy evaluation, you can read the paper: Shen Li, et al. Analogical Reasoning on Chinese Morphological and Semantic Relations, ACL 2018.

If you are interested in selecting a good embedding resource for downstream tasks, e.g. text classification and named entity recognition, the conclusions of this paper may be useful: Yuanyuan Qiu, et al. Revisiting Correlations between Intrinsic and Extrinsic Evaluations of Word Embeddings, CCL 2018.

iris2hu avatar Oct 17 '18 03:10 iris2hu