ASSESS
Improving the matching of Standards with SoWs
Some initial exploratory paths:
Cosine similarity with n-grams (TF-IDF)
- fast
- interpretable
- does not capture word sense
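A minimal sketch of this first path using scikit-learn: TF-IDF vectors over word uni- and bi-grams, compared with cosine similarity. The standard and SoW strings are made-up toy data, not real Standards text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for Standards and a SoW paragraph.
standards = [
    "students analyze the structure of informational texts",
    "students write arguments supported by evidence",
]
sow = ["pupils examine how informational texts are organized"]

# Word uni- and bi-grams; character n-grams would be another option.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
standard_vecs = vectorizer.fit_transform(standards)
sow_vec = vectorizer.transform(sow)

# One cosine score per standard; the top score is the best match.
scores = cosine_similarity(sow_vec, standard_vecs)[0]
best = scores.argmax()
```

The interpretability noted above comes for free here: the nonzero entries of `sow_vec` name exactly which n-grams drove the score, while the "no word sense" weakness shows in the second standard scoring zero despite any semantic overlap.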
Use Doc2Vec (paragraph vectors) to model each whole text to be matched:
- captures the whole context (word relationships) and word sense better
- faster similarity computation, since each text is a single vector
- not interpretable
Use contextual word-embedding techniques to model each token, then use soft cosine to compute similarity:
- captures word sense better than plain cosine
- much slower: the exact computation is O(n^3)
- interpretable
- can use the approximate implementation from Gensim
- two contextual-embedding implementations are available:
- ELMo: https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md
- captures the target and context vector representations together
- trained at the sentence level (may not capture word sense at the paragraph level)
- context2vec: https://github.com/orenmel/context2vec/blob/master/context2vec/eval/explore_context2vec.py
- captures the target and context vector representations separately, so the two will have to be concatenated
- trained at the sentence level (may not capture word sense at the paragraph level)