
Fula Pulaar <-> English Resource (Sentence Pairs)

Open nikisix opened this issue 3 years ago • 2 comments

Hi, I would like to contribute a Pulaar translator model, but I need to be pointed to the sentence pairs. Can anyone help me out?

nikisix, Mar 23 '21 21:03

Hi @nikisix! It looks like JW300, which we used as a source for other languages, does not include Pulaar. On the OPUS website you can look for other corpora: https://opus.nlpl.eu/. It lists CCAligned, Wikimedia, Ubuntu and QED for Fula, but I'm not sure whether any of that is Pulaar. The CCAligned corpus was previously found (https://arxiv.org/abs/2103.12028) to contain mostly noise for Fula, so I would not recommend using it. Perhaps Wikimedia, Ubuntu or QED? These might be quite domain-specific though.
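If it helps, here is a rough Python sketch of how one might list candidate OPUS corpora for a language pair and leave out CCAligned. It only builds the query URL and filters an already-fetched listing, so nothing is downloaded. Several things here are assumptions rather than confirmed facts: the `opusapi` endpoint and its parameter names, the `"corpus"` field in each listing entry, the language code `ff` for Fula, and the helper names `opus_query_url` and `usable_corpora`, which are made up for illustration.

```python
from urllib.parse import urlencode

# Assumed OPUS API endpoint; check the OPUS site before relying on it.
OPUS_API = "https://opus.nlpl.eu/opusapi/"

# Corpora to skip; CCAligned was reported as mostly noise for Fula.
NOISY_CORPORA = {"CCAligned"}


def opus_query_url(source, target):
    """Build a listing query for one language pair (assumed parameter names)."""
    params = {
        "source": source,
        "target": target,
        "preprocessing": "moses",
        "version": "latest",
    }
    return OPUS_API + "?" + urlencode(params)


def usable_corpora(entries, exclude=NOISY_CORPORA):
    """Return sorted, de-duplicated corpus names, skipping excluded ones.

    `entries` is a list of dicts, each assumed to carry a "corpus" key.
    """
    names = {
        e["corpus"]
        for e in entries
        if e.get("corpus") and e["corpus"] not in exclude
    }
    return sorted(names)


# Hand-written sample listing, not real OPUS output.
sample = [
    {"corpus": "CCAligned"},
    {"corpus": "QED"},
    {"corpus": "Ubuntu"},
    {"corpus": "QED"},
]
print(opus_query_url("en", "ff"))
print(usable_corpora(sample))
```

The remaining corpora would still need a manual check to see whether the Fula side is actually Pulaar.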

juliakreutzer, Mar 26 '21 17:03

I haven't used those last sources you mention before. I did notice that JW300 has the code 'fub' defined for Pular, but unfortunately there are no supporting data files.

nikisix, Mar 30 '21 20:03