Training Data Size of NLLB-200
Hello! Sorry for the basic question.
I've been looking for a breakdown of the size of the NLLB-200 training data and couldn't find it. The paper states that NLLB-200 was trained on 3.6B sentences from low-resource languages and 40.1B sentences from high-resource languages, but I wasn't able to find a per-language breakdown of these figures.
Can someone tell me where to find this breakdown? Many thanks in advance!