text2text
Cross-lingual semantic retrieval
Perform a study similar to https://arxiv.org/pdf/1907.04307.pdf, but expand it to support 100 languages using the embeddings from the translator.
Possibly start with the paper's code sample.
@artitw
This looks interesting. Can I begin to look into this?
@lere01 thanks for your interest. I would recommend the following steps:
- Try out the code sample mentioned above to ensure that results from the paper are reproducible.
- Run the same process but use Text2Text embeddings for 100 languages.
- Try different types of Text2Text embeddings: (a) neural, (b) TF-IDF, and (c) BM25. We can also ensemble all of them.
- Share your findings; report on any improvements and other things you learned.
Let us know what you think, and if you have other ideas.
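As a starting point for the comparison in the steps above, here is a minimal sketch of how retrieval quality could be scored and how several embedding types could be ensembled. It does not call the Text2Text API: random toy vectors stand in for the embeddings, and the function names (`cosine_scores`, `precision_at_1`, `ensemble`) are hypothetical. It also assumes an aligned evaluation set where source sentence i translates to target sentence i, and that precision@1 is an acceptable metric for this study.

```python
import numpy as np


def cosine_scores(queries, candidates):
    """Return a (num_queries, num_candidates) matrix of cosine similarities."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return q @ c.T


def precision_at_1(scores):
    """Fraction of queries whose top-scoring candidate has the same index."""
    top = np.argmax(scores, axis=1)
    return float(np.mean(top == np.arange(scores.shape[0])))


def ensemble(score_matrices):
    """Average score matrices after standardizing each one per query row."""
    normalized = []
    for s in score_matrices:
        mean = s.mean(axis=1, keepdims=True)
        std = s.std(axis=1, keepdims=True) + 1e-9  # avoid division by zero
        normalized.append((s - mean) / std)
    return np.mean(normalized, axis=0)


# Toy stand-ins (assumption): in the real study these arrays would hold
# neural, TF-IDF, or BM25 embeddings of aligned sentence pairs.
rng = np.random.default_rng(0)
source = rng.normal(size=(5, 16))
target_a = source + 0.05 * rng.normal(size=(5, 16))
target_b = source + 0.05 * rng.normal(size=(5, 16))

scores_a = cosine_scores(source, target_a)
scores_b = cosine_scores(source, target_b)
combined = ensemble([scores_a, scores_b])

print("P@1 (a):", precision_at_1(scores_a))
print("P@1 (ensemble):", precision_at_1(combined))
```

Standardizing each score matrix per query before averaging keeps one embedding type from dominating the ensemble just because its scores are on a larger scale. A weighted average would be a natural variation once per-type results are available.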