MSCTD

About the usage of chinese_text.txt.enzh in _processed_data and a translation mismatch with the example image

p1k0pan opened this issue (2 comments)

Hi,

I would like to ask why there are both chinese_test.txt and chinese_text.txt.enzh. chinese_text.txt.enzh seems to contain only the lines at even indices of chinese_text.txt. Which one should I use for testing?

Also, the example image translates "hold our course" as 保持我们的航向 ("keep our heading"), but chinese_text.txt translates it as 坚持我们的课程 ("stick to our curriculum"). Is that a mistake, or am I looking at the wrong file?

Thank you for your attention.

p1k0pan, Dec 24 '24 11:12

Use chinese_text.txt.enzh for testing. The lines at even indices of chinese_text.txt are for the en->zh direction and the lines at odd indices are for the zh->en direction; together, the two subsets make up complete dialogues. We may have uploaded the wrong version before human checking. Thanks for your attention.

XL2248, Dec 25 '24 03:12
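The parity split described in the reply can be sketched in Python. This is a minimal sketch, not the repository's own preprocessing, and it rests on assumptions not confirmed in the thread: one UTF-8 utterance per line, 0-based indices, and (by default) parity counted over line positions in the whole file. If the parity is actually counted per dialogue turn, the hypothetical `turn_ids` argument lets you supply per-line turn indices instead. The helper names `split_by_direction`, `read_lines`, and `check_release` are illustrative and not part of MSCTD.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def split_by_direction(
    lines: List[str], turn_ids: Optional[List[int]] = None
) -> Tuple[List[str], List[str]]:
    """Split utterances into (en->zh, zh->en) subsets by index parity.

    Assumption: parity is 0-based. If turn_ids is None, each line's
    position in the file decides its direction; otherwise turn_ids[i]
    (a hypothetical per-dialogue turn index) decides the direction of lines[i].
    """
    if turn_ids is None:
        turn_ids = list(range(len(lines)))
    if len(turn_ids) != len(lines):
        raise ValueError("turn_ids must have one entry per line")
    en_zh = [line for line, t in zip(lines, turn_ids) if t % 2 == 0]  # even -> en->zh
    zh_en = [line for line, t in zip(lines, turn_ids) if t % 2 == 1]  # odd -> zh->en
    return en_zh, zh_en


def read_lines(path: str) -> List[str]:
    # Assumed layout: one UTF-8 encoded utterance per line.
    return Path(path).read_text(encoding="utf-8").splitlines()


def check_release(full_path: str, enzh_path: str) -> bool:
    """Return True if the en->zh file equals the even-index lines of the full file."""
    en_zh, _ = split_by_direction(read_lines(full_path))
    return read_lines(enzh_path) == en_zh
```

For example, `check_release("chinese_text.txt", "chinese_text.txt.enzh")` should return `True` if the `.enzh` file holds exactly the even-index lines, as observed in the original question; a `False` result would suggest the parity is counted per dialogue rather than over the whole file.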


Why split this test file into two directions instead of using the whole file for testing?

p1k0pan, Dec 26 '24 14:12