Turaco: The first LLM that speaks native Pidgin English fluently
Introduction
A large language model fine-tuned to fluently speak and understand native Pidgin English for natural communication across Africa.
Description
Turaco is the first LLM developed specifically to handle conversational Pidgin English, focusing on natural interactions and everyday communication. (I haven't seen an AI that speaks good Pidgin English, and OpenAI's ChatGPT does poorly at it.) Built on Meta's Llama 3.1 base model, it aims to bridge the language gap by letting users engage with an AI that understands and responds in Pidgin English, which is widely spoken in countries like Cameroon and Nigeria. The project aims to provide an accessible tool for education, communication, and cultural preservation, so that Pidgin speakers feel represented in the AI space. The model was trained on curated datasets collected from various online sources, so that it grasps the unique nuances of Pidgin.
What has been done so far can be found here: https://github.com/FotieMConstant/turaco
Relevant Technology
- Language: Python
- Platform: Hugging Face Transformers, Google Colab A100 for fine-tuning (can also work on the free T4)
- Model base: Llama 3.1-8B (Meta's large language model)
- Libraries/Frameworks:
  - Hugging Face transformers for model implementation
  - Hugging Face datasets library for data handling
  - PyTorch as the underlying deep learning framework
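To make the stack above concrete, here is a minimal sketch of how instruction/response records might be turned into training text, and how the base model could be loaded with transformers. It is not taken from the Turaco repository: the prompt template, the record keys `instruction` and `response`, the helper names, and the model id `meta-llama/Llama-3.1-8B` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical prompt layout; the actual project may use a different template.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def format_example(record: Dict[str, str], eos_token: str = "") -> str:
    """Turn one instruction/response record into a single training string."""
    text = PROMPT_TEMPLATE.format(
        instruction=record["instruction"].strip(),
        response=record["response"].strip(),
    )
    return text + eos_token


def format_records(records: List[Dict[str, str]], eos_token: str = "") -> List[str]:
    """Format every record, skipping ones with an empty instruction or response."""
    return [
        format_example(r, eos_token)
        for r in records
        if r.get("instruction", "").strip() and r.get("response", "").strip()
    ]


def load_base_model(model_id: str = "meta-llama/Llama-3.1-8B"):
    """Load tokenizer and model; imports are lazy so the helpers above run anywhere.

    Assumptions: the model id is illustrative (Llama checkpoints are gated on
    the Hugging Face Hub), and device_map="auto" needs the accelerate package.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # half precision to reduce GPU memory use
        device_map="auto",
    )
    return tokenizer, model


if __name__ == "__main__":
    # Illustrative sample only, not taken from the project's dataset.
    sample = [{"instruction": "Greet me in Pidgin.", "response": "How you dey?"}]
    print(format_records(sample)[0])
```

The formatting helpers are plain Python, so they can be checked without a GPU; `load_base_model` is only needed once fine-tuning or inference actually starts.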
Complexity
- [X] Beginner - This project requires no or little prior knowledge of the technolog(y|ies) specified to contribute to the project
- [X] Intermediate - The user should have some prior knowledge of the technolog(y|ies) to the point where they know how to use it, but not necessarily all the nooks and crannies of the technology
- [ ] Advanced - The project requires the user to have a good understanding of all components of the project to contribute
Required time
- [ ] Little work - A couple of days
- [ ] Medium work - A week or two
- [X] Much work - The project will take more than a couple of weeks and serious planning is required
Categories
- [ ] Mobile app
- [ ] IoT
- [ ] Web app
- [ ] Frontend/UI
- [X] AI/ML
- [X] APIs/Backend
- [X] Voice Assistant
- [ ] Developer Tooling
- [ ] Extension/Plugin/Add-On
- [ ] Design/UX
- [ ] AR/VR
- [X] Bots
- [ ] Security
- [ ] Blockchain
- [ ] Futuristic Tech/Something Unique
It's great having you contribute to this project
Welcome to the community :nerd_face:, we will carefully review your project idea and get back to you. If you would like to follow our community's work, you should join us on our Telegram chat group and channel, where we help and encourage each other to contribute to open source.
You can also support us financially here to help us build Cameroon one open source at a time.
@FotieMConstant Which model would you rely on as the foundation model, and where would you draw your examples from?
Llama 3.1 8B as the base, but since the release of Llama 3.2 1B and 3B I'm thinking of using those as foundation models, especially since they are optimized for edge devices.
Hi @FotieMConstant, the project looks good! But I have an issue with the term "First". Can you justify it?