bitext-lexind
How to extract a bilingual dictionary from parallel data?
Thank you for your inspiring work. However, I notice that you assume there is little parallel data, and you construct synthetic parallel data with CRISS. So I wonder what the best practice is if I have a lot of parallel data and want to induce a bilingual dictionary. Thank you in advance!
If you have sufficient parallel data, you can use it as input to our method for extracting entries. You could also use SimAlign directly. The difference is that there will be more noise in the SimAlign outputs (but higher recall), whereas our method optimizes for a higher-precision dictionary on top of the SimAlign outputs.
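To make the recall/precision trade-off concrete, here is a minimal sketch of turning word alignments into dictionary entries. It assumes you already have tokenized sentence pairs and, for each pair, a list of `(source_index, target_index)` links of the kind a word aligner such as SimAlign produces. The function name `induce_lexicon` and the thresholds `min_count` and `min_prob` are hypothetical, and the count-based filter is only a simple stand-in for illustration, not the filtering method implemented in this repository.

```python
from collections import Counter, defaultdict


def induce_lexicon(bitext, alignments, min_count=2, min_prob=0.5):
    """Collect aligned word pairs and keep only the frequent, consistent ones.

    bitext:     list of (src_tokens, tgt_tokens) pairs
    alignments: list of lists of (src_index, tgt_index) links, one per pair
    min_count, min_prob: illustrative thresholds (assumptions, tune on your data)
    """
    pair_counts = Counter()  # how often source word s is aligned to target word t
    src_counts = Counter()   # how often source word s is aligned to anything

    for (src, tgt), links in zip(bitext, alignments):
        for i, j in links:
            pair_counts[(src[i], tgt[j])] += 1
            src_counts[src[i]] += 1

    lexicon = defaultdict(list)
    for (s, t), count in pair_counts.items():
        # Keep a pair only if it is seen often enough and accounts for
        # a large enough share of the alignments of s.
        if count >= min_count and count / src_counts[s] >= min_prob:
            lexicon[s].append(t)
    return dict(lexicon)


if __name__ == "__main__":
    bitext = [
        (["das", "haus"], ["the", "house"]),
        (["das", "haus"], ["the", "house"]),
        (["das", "buch"], ["that", "book"]),
    ]
    alignments = [[(0, 0), (1, 1)]] * 3
    print(induce_lexicon(bitext, alignments))
```

Setting `min_count=1` and `min_prob=0.0` keeps every aligned pair, which corresponds to using the raw aligner output (more entries, more noise). Raising the thresholds trades recall for precision.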