
Improving matching of Standards to SoWs

Open asitang opened this issue 6 years ago • 0 comments

Some initial exploratory paths:

Cosine similarity over n-gram TF-IDF vectors:

  • fast
  • interpretable
  • does not capture word sense
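A minimal plain-Python sketch of this first path, assuming word unigrams and bigrams and a smoothed IDF of ln((1+N)/(1+df)) + 1 (both illustrative choices the issue does not fix). All function names and the example texts are hypothetical. `top_shared_ngrams` shows where the interpretability comes from: the score decomposes into per-n-gram contributions. In practice a library vectorizer (e.g. scikit-learn's `TfidfVectorizer` with `ngram_range=(1, 2)`) would replace the hand-rolled part.

```python
import math
import re
from collections import Counter


def word_ngrams(text, n_max=2):
    """Lowercased word n-grams of length 1..n_max."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf_vectors(docs, n_max=2):
    """Sparse TF-IDF vectors (dicts) with smoothed idf = ln((1+N)/(1+df)) + 1."""
    counts = [Counter(word_ngrams(d, n_max)) for d in docs]
    df = Counter(g for c in counts for g in c)
    n_docs = len(docs)
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in df}
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_shared_ngrams(u, v, k=3):
    """Interpretability: the shared n-grams contributing most to the dot product."""
    shared = {g: u[g] * v[g] for g in u.keys() & v.keys()}
    return sorted(shared, key=shared.get, reverse=True)[:k]


# Hypothetical SoW and Standards text.
sow = "The contractor shall provide software quality assurance."
standards = [
    "Software quality assurance requirements for contractors.",
    "Electrical wiring of aircraft harnesses.",
]
vecs = tfidf_vectors([sow] + standards)
scores = [cosine(vecs[0], v) for v in vecs[1:]]
print(scores)
print(top_shared_ngrams(vecs[0], vecs[1]))
```

Note that "contractor" and "contractors" never match: without stemming or embeddings, surface variants and synonyms contribute nothing, which is the word-sense limitation listed above.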

Use paragraph vectors (paragraph2vec / Doc2Vec) to model each whole text to be matched:

  • can capture the context of the whole text (word relationships) and word sense better than TF-IDF
  • faster similarity computation, since each text is a single vector
  • not interpretable
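The "single vector" advantage can be sketched as follows, assuming each SoW and Standard has already been turned into a fixed-length paragraph vector (for instance with Gensim's `Doc2Vec.infer_vector`). The toy 4-dimensional vectors and the function names are hypothetical. Once the Standards' vectors are normalised ahead of time, scoring a SoW against every Standard costs one dot product each.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_index(standard_vecs):
    """Normalise every Standard's paragraph vector once, ahead of time."""
    return [normalize(v) for v in standard_vecs]


def rank_standards(sow_vec, index):
    """Rank Standards for one SoW: a single dot product per Standard."""
    q = normalize(sow_vec)
    scores = [(i, sum(a * b for a, b in zip(q, s))) for i, s in enumerate(index)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors (hypothetical) standing in for inferred paragraph vectors.
standard_vecs = [
    [0.1, 0.8, 0.5, 0.0],
    [0.8, 0.2, 0.1, 0.4],
]
sow_vec = [0.9, 0.1, 0.0, 0.3]

index = build_index(standard_vecs)
ranking = rank_standards(sow_vec, index)
print(ranking)  # Standard 1 ranks first
```

The score is one number derived from opaque dimensions, so there is nothing comparable to TF-IDF's shared n-grams to show a reviewer; that is the interpretability trade-off noted above.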

Use contextual word embeddings to model each token, then compare the texts with soft cosine similarity:

  • can capture word sense better than plain cosine similarity
  • much slower to compute (O(n^3))
  • interpretable
  • an approximate implementation is available in Gensim
  • two contextual embedding models are available:
    • ELMo: https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md
      • produces a single representation combining the target word and its context
      • trained at the sentence level (may not capture word sense at the paragraph level)
    • Context2vec: https://github.com/orenmel/context2vec/blob/master/context2vec/eval/explore_context2vec.py
      • produces separate target and context representations, which would have to be concatenated
      • trained at the sentence level (may not capture word sense at the paragraph level)
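A minimal plain-Python sketch of soft cosine over contextual token vectors, assuming each text is already a list of per-token embeddings. The toy 4-dimensional vectors below are made up, not real ELMo or Context2vec output, and all names are hypothetical. Term similarity is taken as the cosine between token vectors clipped at zero, which is an illustrative choice. `concat_target_context` shows the concatenation Context2vec would need. For real data, Gensim's `SoftCosineSimilarity` together with a `SparseTermSimilarityMatrix` is the library route.

```python
import math


def cosine(u, v):
    """Plain cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def concat_target_context(target_vec, context_vec):
    """Context2vec-style token vector: target and context representations joined."""
    return list(target_vec) + list(context_vec)


def term_similarity(u, v):
    """One entry S_ij of the term-similarity matrix: cosine clipped at zero."""
    return max(0.0, cosine(u, v))


def soft_cosine(doc_a, doc_b):
    """Soft cosine a^T S b / (sqrt(a^T S a) * sqrt(b^T S b)).

    Each text is a list of token vectors; every token occurrence is treated
    as its own dimension with weight 1, so a^T S b is the sum of
    term_similarity over all token pairs. These pairwise loops are why this
    is much slower than plain cosine.
    """
    def inner(x, y):
        return sum(term_similarity(u, v) for u in x for v in y)

    denom = math.sqrt(inner(doc_a, doc_a)) * math.sqrt(inner(doc_b, doc_b))
    return inner(doc_a, doc_b) / denom if denom else 0.0


# Toy token vectors (hypothetical): 2-d target + 2-d context per token.
sow = [
    concat_target_context([1.0, 0.0], [0.9, 0.1]),
    concat_target_context([0.0, 1.0], [0.2, 0.8]),
]
related_standard = [
    concat_target_context([0.9, 0.2], [0.8, 0.2]),
    concat_target_context([0.1, 1.0], [0.1, 0.9]),
]
unrelated_standard = [
    concat_target_context([-1.0, 0.0], [-0.3, -0.7]),
]

print(soft_cosine(sow, related_standard))    # high similarity
print(soft_cosine(sow, unrelated_standard))  # 0.0: no positively similar token pairs
```

Swapping `concat_target_context` for a single ELMo vector per token leaves `soft_cosine` unchanged; only the way each token vector is produced differs.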

asitang · Nov 04 '19 21:11