
Turaco: The first LLM that speaks native Pidgin English fluently

Open FotieMConstant opened this issue 1 year ago • 4 comments

Introduction

A large language model fine-tuned to fluently speak and understand native Pidgin English for natural communication across Africa.

Description

Turaco is the first LLM developed specifically to handle conversational Pidgin English, focusing on natural interactions and everyday communication (I haven't seen an AI that speaks good Pidgin English; OpenAI's ChatGPT sucks at this). Built on Meta's Llama 3.1 base model, it aims to bridge the language gap by letting users engage with an AI that understands and responds in Pidgin English, which is widely spoken in countries like Cameroon and Nigeria. The project was created to provide an accessible tool for education, communication, and cultural preservation, making Pidgin speakers feel represented in the AI space. The model was trained on curated datasets collected from various online sources so that it can grasp the unique nuances of Pidgin.

The work done so far can be found here: https://github.com/FotieMConstant/turaco
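
To make the idea of "curated datasets" concrete, here is a minimal sketch of one possible cleaning step: normalizing whitespace, dropping incomplete pairs, and removing duplicates before fine-tuning. It is not taken from the Turaco repository. The function names, the `prompt`/`response` field names, and the sample sentences are all assumptions made up for illustration.

```python
import json
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace into single spaces and strip both ends."""
    return re.sub(r"\s+", " ", text).strip()


def build_records(pairs):
    """Turn (prompt, response) pairs into cleaned, de-duplicated records.

    The record layout is a hypothetical choice, not Turaco's actual schema.
    """
    seen = set()
    records = []
    for prompt, response in pairs:
        prompt, response = normalize(prompt), normalize(response)
        if not prompt or not response:
            continue  # skip pairs where either side is empty
        key = (prompt.lower(), response.lower())
        if key in seen:
            continue  # skip case-insensitive duplicates
        seen.add(key)
        records.append({"prompt": prompt, "response": response})
    return records


# Illustrative sample only; real data would come from the collected sources.
raw_pairs = [
    ("How you dey?", "I dey fine."),
    ("  how you dey? ", "I dey fine."),  # duplicate after normalization
    ("Wetin be your name?", ""),          # incomplete pair
]

records = build_records(raw_pairs)
for record in records:
    # One JSON object per line (JSON Lines)
    print(json.dumps(record, ensure_ascii=False))
```

Records written one per line like this can typically be loaded with the Hugging Face datasets library via `load_dataset("json", data_files=...)`.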

Relevant Technology

  • Language: Python
  • Platform: Hugging Face Transformers, Google Colab A100 for fine-tuning (can also work on the free T4)
  • Model base: Llama 3.1 8B (Meta's large language model)
  • Libraries/Frameworks:
      • Hugging Face transformers for model implementation
      • Hugging Face datasets library for data handling
      • PyTorch as the underlying deep learning framework
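
As a rough guide to the hardware choice above, the sketch below estimates how much memory the model weights alone occupy at different numeric precisions. The parameter counts are nominal round figures (an assumption, since the real counts differ slightly), the 1B and 3B sizes are included only for comparison, and the helper name is hypothetical. Fine-tuning needs additional memory for gradients, optimizer state, and activations, so these numbers are a lower bound rather than a sizing guarantee.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "4bit": 0.5}

# Nominal parameter counts (assumed round numbers, not exact figures).
NOMINAL_PARAMS = {"8B": 8e9, "3B": 3e9, "1B": 1e9}

# An NVIDIA T4 has 16 GB of memory.
T4_GB = 16.0


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n_params in NOMINAL_PARAMS.items():
    for dtype in BYTES_PER_PARAM:
        gb = weight_memory_gb(n_params, dtype)
        # Strictly below the card's capacity, before any training overhead.
        verdict = "below" if gb < T4_GB else "at or above"
        print(f"{name} {dtype}: {gb:.1f} GB ({verdict} T4 capacity)")
```

Under these assumptions, an 8B model in fp16 already takes the full 16 GB of a T4 for weights alone, which suggests that running on a T4 would rely on a lower-precision or parameter-efficient setup.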

Complexity

  • [X] Beginner - This project requires no or little prior knowledge of the technolog(y|ies) specified to contribute to the project
  • [X] Intermediate - The user should have some prior knowledge of the technolog(y|ies) to the point where they know how to use it, but not necessarily all the nooks and crannies of the technology
  • [ ] Advanced - The project requires the user to have a good understanding of all components of the project to contribute

Required time

  • [ ] Little work - A couple of days
  • [ ] Medium work - A week or two
  • [X] Much work - The project will take more than a couple of weeks and serious planning is required

Categories

  • [ ] Mobile app
  • [ ] IoT
  • [ ] Web app
  • [ ] Frontend/UI
  • [X] AI/ML
  • [X] APIs/Backend
  • [X] Voice Assistant
  • [ ] Developer Tooling
  • [ ] Extension/Plugin/Add-On
  • [ ] Design/UX
  • [ ] AR/VR
  • [X] Bots
  • [ ] Security
  • [ ] Blockchain
  • [ ] Futuristic Tech/Something Unique

FotieMConstant avatar Sep 14 '24 19:09 FotieMConstant

It's great having you contribute to this project

Welcome to the community :nerd_face:, we will carefully review your project idea and get back to you.

If you would like to follow our community's work you should join us on our Telegram chat group and Channel, we help and encourage each other to contribute to open source.
You can also support us financially here to help us build Cameroon one open source at a time.

github-actions[bot] avatar Sep 14 '24 19:09 github-actions[bot]

@FotieMConstant Which model would you build on as the foundation model, and where would you get your examples from?

billmetangmo avatar Oct 01 '24 09:10 billmetangmo

Llama 3.1 8B as the base, but since the release of Llama 3.2 1B and 3B I'm thinking of using those as foundation models, especially since they are optimized for edge devices.

FotieMConstant avatar Oct 01 '24 10:10 FotieMConstant

Hi @FotieMConstant, the project looks good! But I have an issue with the term "First". Can you justify it?

pythonbrad avatar Oct 06 '24 12:10 pythonbrad