
Cross-lingual semantic retrieval

Open · artitw opened this issue 3 years ago • 2 comments

Perform a study similar to https://arxiv.org/pdf/1907.04307.pdf, but expand it to support 100 languages using the embeddings from the Text2Text translator.
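As a rough sketch of the kind of evaluation involved: given embeddings for sentences in one language and embeddings for their translations in another, cross-lingual retrieval reduces to a nearest-neighbour search under cosine similarity, scored here with precision@1. The toy vectors are placeholders for real sentence embeddings (for example, from the Text2Text translator); how those embeddings are extracted, and whether precision@1 on parallel pairs matches the paper's exact protocol, are assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def retrieve(query_vecs: List[Vector], candidate_vecs: List[Vector]) -> List[int]:
    """For each query, return the index of the most similar candidate (first index on ties)."""
    results = []
    for q in query_vecs:
        scores = [cosine(q, c) for c in candidate_vecs]
        results.append(max(range(len(scores)), key=scores.__getitem__))
    return results


def precision_at_1(query_vecs: List[Vector], candidate_vecs: List[Vector]) -> float:
    """Fraction of queries whose top hit is the aligned candidate.

    Assumes a parallel corpus: query i is a translation of candidate i.
    """
    predicted = retrieve(query_vecs, candidate_vecs)
    if not predicted:
        return 0.0
    hits = sum(1 for i, j in enumerate(predicted) if i == j)
    return hits / len(predicted)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real sentence embeddings;
    # english[i] and french[i] are treated as a translation pair.
    english = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
    french = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.0, 0.3, 0.2]]
    print(precision_at_1(english, french))  # prints 1.0
```

With real embeddings the brute-force loop would be replaced by a matrix product or an approximate nearest-neighbour index, but the metric stays the same.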

A possible starting point is the paper's code sample.

artitw · Mar 26 '22 23:03

@artitw

This looks interesting. Can I start looking into it?

lere01 · May 30 '22 14:05

@lere01 Thanks for your interest. I would recommend the following steps:

  1. Run the code sample mentioned above to confirm that the paper's results are reproducible.
  2. Run the same process, but use Text2Text embeddings for 100 languages.
  3. Try different types of Text2Text embeddings: (a) neural, (b) TF-IDF, and (c) BM25. We could also ensemble all of them.
  4. Share your findings, reporting any improvements and anything else you learned.
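Step 3 could be prototyped along these lines: a minimal sketch, assuming whitespace-tokenized text, of TF-IDF cosine scoring, Okapi BM25 scoring, and a weighted ensemble of min-max normalized scores. It is written from scratch rather than with Text2Text's own TF-IDF/BM25 code, whose API is not shown in this thread; the `k1`/`b` defaults, the smoothed IDF, the weights, and the `neural` scores are illustrative assumptions. Lexical scores only match across languages when query and document share tokens (for example, after translating both into a pivot language), which is another assumption here.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Tokens = Sequence[str]


def tfidf_scores(query: Tokens, docs: List[Tokens]) -> List[float]:
    """Cosine similarity between TF-IDF vectors of the query and each doc."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))
    # Smoothed IDF; terms never seen in the corpus are dropped.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}

    def vec(tokens: Tokens) -> Dict[str, float]:
        tf = Counter(t for t in tokens if t in idf)
        return {t: c * idf[t] for t, c in tf.items()}

    q = vec(query)
    qn = math.sqrt(sum(w * w for w in q.values()))
    out = []
    for d in docs:
        dv = vec(d)
        dn = math.sqrt(sum(w * w for w in dv.values()))
        dot = sum(w * dv.get(t, 0.0) for t, w in q.items())
        out.append(dot / (qn * dn) if qn and dn else 0.0)
    return out


def bm25_scores(query: Tokens, docs: List[Tokens],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each doc for the query."""
    n = len(docs)
    avgdl = (sum(len(d) for d in docs) / n) or 1.0  # guard against all-empty docs
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; constant score lists map to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def ensemble(score_lists: Dict[str, Sequence[float]],
             weights: Dict[str, float]) -> List[float]:
    """Weighted sum of min-max normalized scores from several retrievers."""
    lengths = {len(s) for s in score_lists.values()}
    if len(lengths) != 1:
        raise ValueError("all score lists must cover the same documents")
    combined = [0.0] * lengths.pop()
    for name, scores in score_lists.items():
        w = weights.get(name, 1.0)  # unlisted retrievers get weight 1.0
        for i, s in enumerate(min_max(scores)):
            combined[i] += w * s
    return combined


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "dogs chase cats", "stock markets fell today"]
    tokenized = [d.split() for d in docs]  # naive whitespace tokenization
    query = "cat on mat".split()
    neural = [0.82, 0.64, 0.05]  # hypothetical cosine scores from a neural encoder
    combined = ensemble(
        {"neural": neural,
         "tfidf": tfidf_scores(query, tokenized),
         "bm25": bm25_scores(query, tokenized)},
        weights={"neural": 0.5, "tfidf": 0.25, "bm25": 0.25},
    )
    print(max(range(len(docs)), key=combined.__getitem__))  # prints 0
```

Min-max normalization puts the three score ranges on a common scale before weighting; reciprocal rank fusion, which combines ranks instead of raw scores, would be a reasonable alternative if the scores turn out to be poorly calibrated.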

Let us know what you think, and whether you have other ideas.

artitw · May 30 '22 17:05