
How to extract a bilingual dictionary from parallel data?

Open HuihuiChyan opened this issue 2 years ago • 1 comments

Thank you for your inspiring work. However, I notice that you assume little parallel data is available, and you construct synthetic parallel data with CRISS. What is the best practice if I already have a lot of parallel data and want to induce a bilingual dictionary from it? Thank you in advance!

HuihuiChyan avatar Nov 03 '22 08:11 HuihuiChyan

If you have sufficient parallel data, you can use it as input to our method for extracting entries. You could also use SimAlign directly. The difference is that there will be more noise in the SimAlign outputs (but higher recall), whereas our method optimizes for a higher-precision dictionary on top of the SimAlign outputs.

sidaw avatar Nov 04 '22 17:11 sidaw
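
To illustrate the precision/recall trade-off mentioned above, here is a minimal sketch of turning word alignments into dictionary entries. It is not the bitext-lexind method, which learns a filter on top of SimAlign outputs; it swaps in a plain count and relative-frequency threshold instead. The function name `induce_lexicon`, the parameters `min_count` and `min_prob`, and the toy data are all hypothetical. The alignments are assumed to be lists of `(source_index, target_index)` pairs, one list per sentence pair, such as a word aligner like SimAlign might produce.

```python
from collections import Counter, defaultdict


def induce_lexicon(sentence_pairs, alignments, min_count=2, min_prob=0.5):
    """Aggregate aligned word pairs into a lexicon (hypothetical helper).

    sentence_pairs: list of (source_tokens, target_tokens)
    alignments: list of [(src_idx, tgt_idx), ...], one per sentence pair
    A pair is kept only if it was aligned at least `min_count` times and
    accounts for at least `min_prob` of the source word's alignment links.
    """
    pair_counts = Counter()
    src_counts = Counter()
    for (src, tgt), links in zip(sentence_pairs, alignments):
        for i, j in links:
            pair_counts[(src[i], tgt[j])] += 1
            src_counts[src[i]] += 1

    lexicon = defaultdict(list)
    for (s, t), count in pair_counts.items():
        if count >= min_count and count / src_counts[s] >= min_prob:
            lexicon[s].append(t)
    return dict(lexicon)


# Toy data (assumed, for illustration only); the last link is alignment noise.
sentence_pairs = [
    (["das", "haus"], ["the", "house"]),
    (["das", "auto"], ["the", "car"]),
    (["ein", "haus"], ["a", "house"]),
]
alignments = [
    [(0, 0), (1, 1)],
    [(0, 0), (1, 1)],
    [(0, 0), (1, 1), (1, 0)],
]

lexicon = induce_lexicon(sentence_pairs, alignments)
print(lexicon)
```

Raising `min_count` or `min_prob` trades recall for precision: the noisy `haus`/`a` link and the pairs seen only once are dropped here, while setting both thresholds to their minimum would keep every aligned pair.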