Performing Retrieval using TSAspire and OTAspire
Hi, I'm currently trying to perform retrieval tasks on the SCIDOCS benchmark using your embedding models TS-Aspire and OT-Aspire. The code you provide mentions that the output of the TS-Aspire model has both a document CLS representation and sentence representations. Which representation should I use to perform document-level retrieval as you did in your paper? Moreover, between L2 distance and cosine similarity, what would be the optimal way to perform document retrieval? It seems that your model was trained with an L2 loss. In addition, when using OT-Aspire for scientific paper retrieval, should I use the Wasserstein distance to reproduce the results from your paper? Finally, is the multi-task trained Aspire (OT + TS) not uploaded to Hugging Face?
Thank You.
Thanks for getting in touch, and apologies for the delay in responding.
> The code you provide mentions that the output of the TS-Aspire model has both a document CLS representation and sentence representations. Which representation should I use to perform document-level retrieval as you did in your paper?
You should use the sentence representations: compute the pairwise L2 distances between the query's and candidate's sentence embeddings, then take the minimum of those distances as the document-level score.
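A minimal sketch of that minimum-distance aggregation (the function names and toy vectors here are illustrative; the released Aspire code does this with batched tensor operations):

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def doc_distance(query_sents, cand_sents):
    """Document-level score: the minimum over all pairwise L2 distances
    between query sentence embeddings and candidate sentence embeddings.
    Lower is better, so candidates are ranked in ascending order."""
    return min(l2(q, c) for q in query_sents for c in cand_sents)

# Toy 2-d "sentence embeddings" for two documents:
query = [[0.0, 0.0], [1.0, 0.0]]
cand = [[3.0, 4.0], [1.0, 1.0]]
print(doc_distance(query, cand))  # closest pair is (1,0)-(1,1), distance 1.0
```

In practice you would extract the per-sentence embeddings from the TS-Aspire forward pass and rank candidates by this score in ascending order.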
> Moreover, between L2 distance and cosine similarity, what would be the optimal way to perform document retrieval? It seems that your model was trained with an L2 loss.
You should use the L2 distance, since that matches the metric the model was trained with.
> In addition, when using OT-Aspire for scientific paper retrieval, should I use the Wasserstein distance to reproduce the results from your paper?
Yes! Use the Wasserstein (optimal transport) distance between the two documents' sets of sentence embeddings.
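To make the OT scoring concrete, here is a sketch of the Wasserstein distance for the special case of two equal-size sentence sets with uniform weights, where optimal transport reduces to an optimal assignment (solved here by brute force over permutations, which is fine for the handful of sentences in an abstract). This is only illustrative; the released code uses a proper OT solver, and the general unequal-size case needs one too (e.g. the POT library):

```python
import math
from itertools import permutations

def l2(a, b):
    """Euclidean (L2) distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def wasserstein_uniform(query_sents, cand_sents):
    """1-Wasserstein distance between two equal-size sets of sentence
    embeddings under uniform weights. With uniform weights and equal
    sizes the optimal transport plan is a permutation, so we minimize
    the total assignment cost over all permutations and average it."""
    n = len(query_sents)
    assert len(cand_sents) == n, "this sketch assumes equal-size sets"
    best = min(
        sum(l2(query_sents[i], cand_sents[p[i]]) for i in range(n))
        for p in permutations(range(n))
    )
    return best / n

# Toy example: the identity matching (cost 1 + 1) beats the swapped
# matching (cost sqrt(2) + sqrt(2)), so the distance is 2 / 2 = 1.0.
query = [[0.0, 0.0], [1.0, 0.0]]
cand = [[0.0, 1.0], [1.0, 1.0]]
print(wasserstein_uniform(query, cand))
```

As with the TS-Aspire score, rank candidates in ascending order of this distance.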
> Finally, is the multi-task trained Aspire (OT + TS) not uploaded to Hugging Face?
Correct, the multi-task (OT + TS) model isn't on Hugging Face.