
Using skip-thought representations for computing sentence similarity

Open · Sukanya-Kudi opened this issue on Apr 5, 2017 · 3 comments

Can we use the generated skip-thought vectors as sentence representations for further tasks, the way a pre-trained AlexNet is used in vision to obtain feature representations for downstream tasks? I encoded sentences using the provided files (utable.npy, btable.npy, etc.) and computed the cosine similarity of the resulting vectors, but the values seem high. Is my understanding of using the given model as a pre-trained network for representations wrong?

The sentences for example are

  1. Two women and as many men managed to escape, in an injured condition, from the shrine.
  2. Both had done plenty of damage to the Federer brand already during the sun-shot afternoon, threatening to spoil that much-anticipated potential final showdown between the Swiss icon and his lifelong rival, Rafael Nadal.

The cosine similarity of the above sentences comes out to 0.79259253. Isn't the cosine similarity expected to be lower? Am I making a mistake?
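
For reference, here is a minimal sketch of what I am doing, assuming the load_model/encode interface from this repo's README (exact entry points may differ between versions):

```python
import numpy as np
import skipthoughts

# Load the pre-trained encoder (reads utable.npy, btable.npy, etc.).
model = skipthoughts.load_model()

sentences = [
    "Two women and as many men managed to escape, in an injured "
    "condition, from the shrine.",
    "Both had done plenty of damage to the Federer brand already during "
    "the sun-shot afternoon, threatening to spoil that much-anticipated "
    "potential final showdown between the Swiss icon and his lifelong "
    "rival, Rafael Nadal.",
]

# Encode each sentence into a skip-thought vector (4800-d combine-skip).
vectors = skipthoughts.encode(model, sentences)

# Cosine similarity between the two sentence vectors.
a, b = vectors[0], vectors[1]
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```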

Sukanya-Kudi · Apr 05 '17 13:04

I'd check how many of the words in the second sentence are actually contained in the model's vocabulary. It could be that, because some words are encoded as the 'UNK' (unknown) vector, the two sentences are thrown into close subspaces. I'd also go on and try other sentence pairs to test.
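
As a concrete way to run that check, something like the following sketch, assuming the word list shipped with the pre-trained model (dictionary.txt) and reusing the sentences list from the sketch above; the path and the crude tokenization are assumptions:

```python
# Assumed path: the word list downloaded alongside the embedding tables.
with open('dictionary.txt') as f:
    vocab = set(line.strip() for line in f)

def oov_report(sentence, vocab):
    # Crude whitespace tokenization; the model's real preprocessing may differ.
    tokens = [t.strip('.,;:!?"\'').lower() for t in sentence.split()]
    oov = [t for t in tokens if t not in vocab]
    return len(oov), len(tokens), oov

for s in sentences:
    n_oov, n_total, oov = oov_report(s, vocab)
    print('%d/%d tokens out of vocabulary: %s' % (n_oov, n_total, oov))
```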

csiki · Apr 06 '17 18:04

I have tried creating pairs from unrelated news articles, as well as a set of sentences from subtitle files that have a good amount of correlation, but in both cases I get all sentence pairs having cosine similarity > 0.5, which is not expected. Strangely, a majority of the pairs give ~0.8 as the similarity. Please find attached the files used for the analysis.

unrelated.txt Sea_Dragon.txt sentence_sim.zip
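
For completeness, a sketch of the all-pairs comparison described above, assuming one sentence per line in each attached file and the same encode interface as before:

```python
import numpy as np
import skipthoughts

# Assumed input format: one sentence per line in each file.
with open('unrelated.txt') as f:
    sents_a = [line.strip() for line in f if line.strip()]
with open('Sea_Dragon.txt') as f:
    sents_b = [line.strip() for line in f if line.strip()]

model = skipthoughts.load_model()
va = skipthoughts.encode(model, sents_a)
vb = skipthoughts.encode(model, sents_b)

# All-pairs cosine similarity via normalized dot products.
va = va / np.linalg.norm(va, axis=1, keepdims=True)
vb = vb / np.linalg.norm(vb, axis=1, keepdims=True)
sims = va.dot(vb.T)

print('mean: %.3f, fraction > 0.5: %.3f' % (sims.mean(), (sims > 0.5).mean()))
```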

Sukanya-Kudi · Apr 07 '17 10:04

I'm encountering the same puzzling behavior... (using the https://github.com/Cadene/skip-thoughts.torch port)

mwestera · Feb 11 '19 16:02