fairseq
Include Kirundi (Rundi - run_Latn) language in the No Language Left Behind (NLLB) project
🚀 Feature Request
Include Kirundi (Rundi) language in the No Language Left Behind (NLLB) project
Motivation
As researchers and developers from Burundi, we are deeply concerned about the absence of Kirundi (also known as Rundi) in the No Language Left Behind (NLLB) project. Kirundi is the national language of Burundi and is spoken by over 11 million people worldwide, including large diaspora communities. Its exclusion from NLLB significantly hampers our ability to develop inclusive, localized solutions for Kirundi speakers globally.
The lack of Kirundi in NLLB creates several critical issues:
- Limited access to information: Kirundi speakers struggle to access vital information in their native language, especially in areas like health, education, and technology.
- Hindered software development: Local developers face significant challenges in creating applications and services tailored to Kirundi speakers, limiting innovation and economic growth in our region.
- Digital divide: The absence of Kirundi from major multilingual models like NLLB widens the digital divide, leaving our community behind in the rapidly advancing world of AI and natural language processing.
- Cultural preservation: Without proper representation in language models, Kirundi risks losing its digital presence, which could affect its long-term preservation and evolution.
Pitch
We propose the inclusion of Kirundi in the NLLB project. This addition would:
- Enable accurate translation to and from Kirundi, facilitating better communication and information exchange for millions of speakers.
- Empower local developers to create more sophisticated, language-specific applications and services.
- Enhance natural language understanding capabilities for Kirundi, opening doors for advanced AI applications in areas such as voice recognition, text-to-speech, and sentiment analysis.
- Contribute to the digital preservation of Kirundi, ensuring its relevance in the digital age.
- Align with NLLB's mission of language inclusivity and bridging the gap for underrepresented languages.
Alternatives
While alternatives are limited, some developers have attempted to:
- Use closely related languages like Kinyarwanda as a proxy, but this leads to inaccuracies and does not fully capture the nuances of Kirundi.
- Develop smaller, less capable language models specifically for Kirundi, but these lack the resources and scale of NLLB.
- Rely on human translation, which is time-consuming, expensive, and not scalable for large-scale applications.
These alternatives are insufficient and underscore the need to include Kirundi in a comprehensive project like NLLB.
Additional context
Kirundi is not just a language; it's a carrier of our culture, history, and identity. Its inclusion in NLLB would be a significant step towards digital equity and would open up numerous opportunities for innovation and development in Burundi and for Kirundi speakers worldwide.
We have a growing tech community eager to leverage advanced language models. The inclusion of Kirundi in NLLB would catalyze numerous projects and potentially transform our digital landscape.
Furthermore, Burundi's unique linguistic situation, with Kirundi as the primary language alongside French and English, presents an interesting case study for multilingual societies and could provide valuable data for improving NLLB's capabilities in similar contexts.
We are ready and willing to collaborate in any way possible to facilitate this inclusion, including providing language data, expert knowledge, and testing support.