
Training Data SIZE of NLLB-200

Open luckysusanto opened this issue 2 years ago • 0 comments

Hello! Sorry for the basic question.

I've been looking for the breakdown of the size of NLLB-200's training data and couldn't find it. The paper states that NLLB-200 was trained on 3.6B sentences from low-resource languages and 40.1B sentences from high-resource languages, but I wasn't able to find a per-language breakdown of these totals.

Can someone tell me where to find this breakdown? Many thanks in advance!
