Sentence Similarity Calculator
This repo contains various ways to calculate the similarity between source and target sentences. You can choose the pre-trained model you want to use, such as ELMo, BERT, or Universal Sentence Encoder (USE).
You can also choose the method used to compute the similarity:
1. Cosine similarity
2. Manhattan distance
3. Euclidean distance
4. Angular distance
5. Inner product
6. TS-SS score
7. Pairwise-cosine similarity
8. Pairwise-cosine similarity + IDF
You can experiment with (the number of models) x (the number of methods) combinations! A minimal sketch of the vector-based methods is shown below.
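As a rough illustration (not the repo's actual sensim.py code), here is a minimal NumPy sketch of the vector-based methods, assuming each sentence has already been encoded into a fixed-size vector by one of the models above:

import numpy as np

def cosine(u, v):
    # Cosine similarity: based on the angle between vectors; ignores magnitude.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def manhattan(u, v):
    # L1 distance between the two sentence vectors.
    return float(np.sum(np.abs(u - v)))

def euclidean(u, v):
    # L2 distance between the two sentence vectors.
    return float(np.linalg.norm(u - v))

def angular(u, v):
    # Angular distance in [0, 1], derived from cosine similarity.
    return float(np.arccos(np.clip(cosine(u, v), -1.0, 1.0)) / np.pi)

def inner(u, v):
    # Raw dot product; unlike cosine, sensitive to vector magnitude.
    return float(np.dot(u, v))

# Toy example with random stand-ins for sentence vectors:
u, v = np.random.rand(512), np.random.rand(512)
print(cosine(u, v), euclidean(u, v))

Roughly speaking, the pairwise variants score pairs of token embeddings instead of a single sentence vector, with the IDF variant weighting tokens by inverse document frequency; TS-SS is sketched in the Examples section below.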
Installation
- This project was developed under a conda environment
- After cloning this repository, you can simply install all the dependent libraries listed in requirements.txt with bash install.sh:
conda create -n sensim python=3.7
conda activate sensim
git clone https://github.com/Huffon/sentence-similarity.git
cd sentence-similarity
bash install.sh
Usage
- To test your own sentences, fill out corpus.txt with sentences, one per line, as below:
I ate an apple.
I went to the Apple.
I ate an orange.
...
- Then, choose the model and method to be used to calculate the similarity between the source and target sentences:
python sensim.py
--model MODEL_NAME [use, bert, elmo]
--method METHOD_NAME [cosine, manhattan, euclidean, inner,
ts-ss, angular, pairwise, pairwise-idf]
--verbose LOG_OPTION (bool)
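- For example, a run scoring the corpus with Universal Sentence Encoder and cosine similarity might look like this (using only the flags listed above):

python sensim.py --model use --method cosine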
Examples
- In this section, you can see example results of sentence-similarity
- As you know, there is no silver bullet that can calculate perfect similarity between sentences
- You should conduct various experiments with your own dataset
- Caution: the TS-SS score might not fit the sentence similarity task, since this method was originally devised to calculate the similarity between long documents (see the sketch after this list)
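As a reference for the caution above, here is a minimal sketch of the TS-SS score based on the referenced paper (A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering); the repo's actual implementation may differ in details:

import math
import numpy as np

def ts_ss(u, v):
    # Angle between the vectors, padded by 10 degrees as in the paper,
    # so that identical vectors still form a non-degenerate triangle.
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    theta = math.acos(np.clip(cos, -1.0, 1.0)) + math.radians(10)
    # Triangle's area (TS): spanned by the two vectors and the padded angle.
    ts = np.linalg.norm(u) * np.linalg.norm(v) * math.sin(theta) / 2
    # Sector's area (SS): circular sector whose radius combines the
    # Euclidean distance and the magnitude difference of the vectors.
    ed = np.linalg.norm(u - v)
    md = abs(np.linalg.norm(u) - np.linalg.norm(v))
    ss = math.pi * (ed + md) ** 2 * math.degrees(theta) / 360
    return float(ts * ss)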
References
Papers
- Universal Sentence Encoder
- Deep contextualized word representations
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
- BERTScore: Evaluating Text Generation with BERT
- A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering
Libraries
- TensorFlow Hub's Universal Sentence Encoder
- AllenNLP's ELMo
- Sentence Transformers
- BERTScore
- Vector Similarity