nlg-eval
Infersent
Hi,
I am trying to obtain the semantic similarity between the generated and the ground truth sentence.
I used all of these metrics to evaluate the generated sentences (validation dataset):

| Metric | Score |
|---|---|
| BLEU 1 | 0.128031 |
| BLEU 2 | 0.056153 |
| BLEU 3 | 0.029837 |
| BLEU 4 | 0.013649 |
| METEOR | 0.305482 |
| ROUGE_L | 0.148652 |
| CIDEr | 0.069519 |
| SkipThought cosine similarity | 0.765784 |
| Embedding Average cosine similarity | 0.973187 |
| Vector Extrema cosine similarity | 0.683888 |
| Greedy Matching score | 0.944960 |
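For reference, the Embedding Average cosine similarity reported above can be sketched roughly as follows. The tiny word-vector table here is made up purely for illustration; in practice you would load pretrained embeddings (e.g. GloVe or word2vec), which is what nlg-eval does internally:

```python
import numpy as np

# Toy word vectors -- purely illustrative; real use loads pretrained embeddings.
vectors = {
    "the": np.array([0.1, 0.3, 0.5]),
    "cat": np.array([0.9, 0.2, 0.1]),
    "dog": np.array([0.8, 0.3, 0.2]),
    "sat": np.array([0.2, 0.7, 0.4]),
}

def sentence_embedding(tokens):
    """Average the word vectors of all in-vocabulary tokens."""
    vecs = [vectors[t] for t in tokens if t in vectors]
    return np.mean(vecs, axis=0)

def embedding_average_similarity(hyp, ref):
    """Cosine similarity between the averaged embeddings of two sentences."""
    a = sentence_embedding(hyp.lower().split())
    b = sentence_embedding(ref.lower().split())
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = embedding_average_similarity("the cat sat", "the dog sat")
```

Because averaging washes out word-level differences, this metric tends to score almost any fluent pair of sentences highly, which may explain why it is so much larger than the n-gram metrics here.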
Some of these metrics indicate that the sentences are quite similar, while others suggest they are different. Can you please suggest a metric for measuring the semantic similarity between sentences?
How about InferSent and Word Mover's Distance? I think you should consider adding these metrics, since this repository is so helpful for evaluating generated text.
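Word Mover's Distance measures how far the words of one sentence must "travel" in embedding space to reach the words of the other. A minimal sketch of the relaxed lower bound (each word simply moves to its nearest counterpart, as in Kusner et al., 2015) is below; the word vectors are again made up for illustration, and a real implementation (e.g. gensim's `wmdistance`) solves the full optimal-transport problem over pretrained embeddings:

```python
import numpy as np

# Toy word vectors -- illustrative only; real use loads pretrained embeddings.
vectors = {
    "the": np.array([0.1, 0.3, 0.5]),
    "cat": np.array([0.9, 0.2, 0.1]),
    "dog": np.array([0.8, 0.3, 0.2]),
    "sat": np.array([0.2, 0.7, 0.4]),
}

def relaxed_wmd(hyp, ref):
    """One-directional relaxed Word Mover's Distance lower bound:
    each in-vocabulary word of hyp travels to its nearest word in ref."""
    h = [vectors[t] for t in hyp.lower().split() if t in vectors]
    r = [vectors[t] for t in ref.lower().split() if t in vectors]
    return float(np.mean([min(np.linalg.norm(a - b) for b in r) for a in h]))
```

Identical sentences score 0, and the distance grows as the sentences use more semantically distant words.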
Can you please suggest which of these metrics is most widely used for measuring semantic similarity between sentences? Thanks!