Add support for Sinhala language

Open tskamaldeep opened this issue 2 years ago • 4 comments

Hello,

I have a customer who needs Sinhala support as soon as possible. Does SeamlessM4T support Sinhala? If not, can it be added to the roadmap? What timeline should I expect before I can demo and use it? Please let me know.

tskamaldeep, Oct 19 '23 11:10

Oh, also support for translation of Sinhala to Tamil and back. Thanks.

tskamaldeep, Oct 19 '23 11:10

The current Seamless family supports Sinhala only with the SeamlessM4T-Medium model, and only in the text modality (see https://github.com/facebookresearch/seamless_communication/tree/main/docs/m4t for more details, and model cards https://github.com/facebookresearch/seamless_communication/tree/main/src/seamless_communication/cards for the lists of languages). Also, Sinhala is supported by the NLLB models.

Currently, there are no plans to extend SeamlessM4T to new languages. However, if you want to translate Sinhala speech, you could train your own Sinhala SONAR encoder and contribute it to https://github.com/facebookresearch/SONAR; if you make it compatible with the SONAR space, you will be able to use the existing SONAR text decoder to translate it into 200 languages.
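
For the text-only Sinhala to Tamil case, a rough sketch using an NLLB checkpoint through the Hugging Face `transformers` library might look like the following. The checkpoint name `facebook/nllb-200-distilled-600M` is just one publicly available option picked for illustration, and the helpers `nllb_code` and `translate` are hypothetical names, not part of any library. The language codes are the FLORES-200 codes that NLLB uses. Treat this as a starting point under those assumptions rather than a recommended setup.

```python
# FLORES-200 language codes as used by NLLB checkpoints.
# Only the languages discussed in this thread are listed (plus English).
NLLB_CODES = {
    "sinhala": "sin_Sinh",
    "tamil": "tam_Taml",
    "english": "eng_Latn",
}


def nllb_code(language: str) -> str:
    """Map a language name to its NLLB code (hypothetical helper)."""
    key = language.strip().lower()
    if key not in NLLB_CODES:
        raise ValueError(f"No NLLB code registered for {language!r}")
    return NLLB_CODES[key]


def translate(text: str, src: str, tgt: str,
              model_name: str = "facebook/nllb-200-distilled-600M") -> str:
    """Translate text between two languages with an NLLB checkpoint.

    Assumes `transformers` and `torch` are installed; the checkpoint is
    downloaded on first use. The model name is an illustrative choice.
    """
    # Imported lazily so the code mapping above works without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang=nllb_code(src))
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    inputs = tokenizer(text, return_tensors="pt")
    # NLLB selects the output language via the first generated token.
    target_id = tokenizer.convert_tokens_to_ids(nllb_code(tgt))
    generated = model.generate(
        **inputs, forced_bos_token_id=target_id, max_new_tokens=128
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]


# Example call (downloads the model, so it is left commented out):
# print(translate("ඔබට කොහොමද?", src="Sinhala", tgt="Tamil"))
```

Swapping `src` and `tgt` gives the Tamil to Sinhala direction with the same checkpoint.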

avidale, Mar 14 '24 14:03