ASSESS


Improving the matching of Standards with SoWs

Open • asitang opened this issue 6 years ago • 0 comments

Some initial exploratory paths:

Cosine similarity over n-gram TF-IDF vectors:

- fast
- interpretable (the shared n-grams behind a score can be inspected)
- does not capture word sense
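A minimal stdlib-only sketch of this first path, assuming whitespace tokenization and word uni/bigrams (the helper names and the sample Standard/SoW sentences are hypothetical). The IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1, which is scikit-learn's `TfidfVectorizer` default. `shared_terms` shows where the interpretability comes from: each shared n-gram's contribution to the dot product can be listed directly.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercased word n-grams for n = 1..n_max (naive whitespace tokenizer)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf_vectors(docs, n_max=2):
    """Sparse TF-IDF vectors (dicts): raw count times smoothed IDF."""
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    n_docs = len(docs)
    df = Counter(g for c in counts for g in c)  # document frequency per n-gram
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in df}
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[g] for g, w in u.items() if g in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def shared_terms(u, v, top=5):
    """Interpretability: shared n-grams ranked by their contribution to the dot product."""
    contrib = {g: u[g] * v[g] for g in u.keys() & v.keys()}
    return sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical Standard clause and two candidate SoW sentences.
standard = "the contractor shall encrypt all data at rest"
sow_match = "contractor shall encrypt customer data at rest and in transit"
sow_other = "deliver monthly status reports to the program office"
vecs = tfidf_vectors([standard, sow_match, sow_other])
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
print(shared_terms(vecs[0], vecs[1]))
```

In practice IDF would be fitted on the whole Standards + SoW corpus rather than on three sentences.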

Use Paragraph Vectors (paragraph2vec, available in Gensim as Doc2Vec) to model each whole text to be matched:

- can better capture the whole context (word relationships) and word sense
- faster similarity computation, since each text is represented by a single vector
- not interpretable

Use contextual word embedding techniques to model each token, then compare the texts with soft cosine similarity:

- can capture word sense better than plain cosine similarity
- much slower to compute (O(n^3))
- interpretable
- an approximate implementation is available in Gensim
- two contextual embedding models are available:
  - ELMo: https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md
    - captures the target and its context together in a single vector representation
    - trained at the sentence level (may not capture word sense at the paragraph level)
  - Context2vec: https://github.com/orenmel/context2vec/blob/master/context2vec/eval/explore_context2vec.py
    - produces separate target and context vector representations, which would have to be concatenated
    - trained at the sentence level (may not capture word sense at the paragraph level)
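A direct (non-approximate) sketch of soft cosine over contextual token vectors. Soft cosine is Σᵢⱼ sᵢⱼ aᵢ bⱼ / (√(Σᵢⱼ sᵢⱼ aᵢ aⱼ) · √(Σᵢⱼ sᵢⱼ bᵢ bⱼ)); here, as an assumption, every token occurrence is its own term with weight 1 and sᵢⱼ is the cosine between the two tokens' contextual vectors. The vectors are assumed to come from ELMo, or from Context2vec after concatenating target and context (`concat_target_context`); the toy 3-d vectors and all helper names are hypothetical. This is not Gensim's approximate implementation, which works with a sparse term-similarity matrix.

```python
import math


def cosine(x, y):
    """Plain cosine between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def concat_target_context(target_vec, context_vec):
    """Context2vec-style token vector: target and context representations joined end to end."""
    return list(target_vec) + list(context_vec)


def soft_inner(xs, ys):
    """sum_ij s_ij with s_ij = cosine(x_i, y_j); every token occurrence has weight 1."""
    return sum(cosine(x, y) for x in xs for y in ys)


def soft_cosine(tokens_a, tokens_b):
    """Soft cosine between two texts, given one contextual vector per token."""
    # max(0.0, ...) guards against tiny negative values from floating-point rounding.
    norm_a = math.sqrt(max(0.0, soft_inner(tokens_a, tokens_a)))
    norm_b = math.sqrt(max(0.0, soft_inner(tokens_b, tokens_b)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return soft_inner(tokens_a, tokens_b) / (norm_a * norm_b)


def top_pairs(tokens_a, tokens_b, k=3):
    """Interpretability: the (i, j) token pairs contributing most to the numerator."""
    pairs = [((i, j), cosine(x, y))
             for i, x in enumerate(tokens_a) for j, y in enumerate(tokens_b)]
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:k]


# Toy vectors standing in for contextual embeddings of two short texts.
standard = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
sow = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(soft_cosine(standard, sow))
print(top_pairs(standard, sow, k=2))
```

The nested loops touch every token pair of both texts (plus each text with itself), which illustrates why this path is much slower than comparing one TF-IDF or paragraph vector per text.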

Nov 04 '19 21:11 asitang