Language Detection

Open yashhibare7 opened this issue 1 year ago • 13 comments

Deep Learning Simplified Repository (Proposing new issue)

:red_circle: Title : Language Detection

:red_circle: Dataset : Kaggle

:red_circle: Approach : To detect the language of a word or sentence in Python, you can follow these steps:

  1. Preprocess the input by removing punctuation and converting it to lowercase.
  2. Tokenize the input into words or characters.
  3. Use a language detection library like langdetect or textblob to identify the language based on statistical models.
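As a rough illustration of steps 1 and 2, here is a minimal standard-library sketch. The helper names (`preprocess`, `word_tokens`, `char_ngrams`) are hypothetical and not part of the proposal, and the character n-gram counter is only one assumed way to tokenize into characters. Step 3 is left as a comment because it depends on the third-party langdetect package being installed.

```python
import string
from collections import Counter


def preprocess(text: str) -> str:
    """Step 1: lowercase the text and strip ASCII punctuation."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table)


def word_tokens(text: str) -> list:
    """Step 2 (word level): split on whitespace."""
    return text.split()


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Step 2 (character level): count overlapping character n-grams."""
    joined = " ".join(text.split())  # collapse repeated whitespace
    return Counter(joined[i:i + n] for i in range(len(joined) - n + 1))


cleaned = preprocess("Hello, World! How are you?")
tokens = word_tokens(cleaned)
trigrams = char_ngrams(cleaned)

# Step 3 (optional, requires `pip install langdetect`):
#   from langdetect import detect
#   language_code = detect(cleaned)
```

Note that `string.punctuation` covers ASCII punctuation only, so a multilingual dataset would likely need a broader rule.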


📍 Follow the Guidelines to Contribute to the Project :

  • You need to create a separate folder named after the Project Title.
  • Inside that folder, there will be four main components.
    • Images - To store the required images.
    • Dataset - To store the dataset, or information about the source of the dataset.
    • Model - To store the machine learning model you've created using the dataset.
    • requirements.txt - This file will contain the packages/libraries required to run the project on other machines.
  • Inside the Model folder, the README.md file must be filled in properly, with proper visualizations and conclusions.

:red_circle::yellow_circle: Points to Note :

  • The issues will be assigned on a first-come, first-served basis, 1 Issue == 1 PR.
  • "Issue Title" and "PR Title" should be the same. Include the issue number along with it.
  • Follow the Contributing Guidelines & Code of Conduct before you start contributing.

:white_check_mark: To be Mentioned while taking the issue :

  • Full name : Yash Hibare
  • GitHub Profile Link : https://github.com/yashhibare7
  • Email ID :[email protected]
  • Participant ID (if applicable):
  • Approach for this Project :
  • What is your participant role? (Mention the Open Source program)

Happy Contributing 🚀

All the best. Enjoy your open source journey ahead. 😎

yashhibare7 avatar Jul 06 '23 09:07 yashhibare7

Please mention all the details about the project @yashhibare7

abhisheks008 avatar Jul 06 '23 15:07 abhisheks008

Details mentioned

yashhibare7 avatar Jul 08 '23 18:07 yashhibare7

This is a deep learning project repository, so we expect contributors to come up with deep learning methods to solve the problem statements. Please revise your approach and come back with a new one that includes deep learning methods. @yashhibare7

abhisheks008 avatar Jul 09 '23 04:07 abhisheks008

Please assign me @abhisheks008

ShatilKhan avatar Nov 05 '23 12:11 ShatilKhan

Can you please share your approach: how will you solve this issue, and which models will you use? @ShatilKhan

abhisheks008 avatar Nov 05 '23 13:11 abhisheks008

  • Full name : Soumik Banerjee
  • GitHub Profile Link : https://github.com/Soumiksb06
  • Email ID : [email protected]
  • Approach : I will use the langid library for faster language detection. I'll also research the other libraries available and choose the best one.

Hi Abhishek, I'm completely new to Open Source but have a lot of experience building DL and ML models for prediction, and I've also worked on speech detection and emotion detection before. I feel that this project would be a suitable start for my Open Source journey. I'm a contributor in SWOC 2024 and have already completed one issue and had it merged. Kindly assign this one to me. Thank you!

Soumiksb06 avatar Dec 29 '23 14:12 Soumiksb06

Let the program start officially. Issues will be assigned after that. Till then go through the repository as well as the README file.

abhisheks008 avatar Dec 29 '23 16:12 abhisheks008

  • Full name : Aditya Kumar Singh
  • GitHub Profile Link : https://github.com/Axikop
  • Email ID : [email protected]
  • Participant ID (if applicable):
  • Approach for this Project : To create a robust deep learning model, I will choose a suitable multilingual dataset from Kaggle and preprocess it by removing noise, converting the text to lowercase, handling links, and so on. I will then tokenize the text and apply padding so that the neural network receives inputs of a consistent length. For the model architecture I will use a Recurrent Neural Network, because RNNs are well suited to capturing the sequential nature of language: how words relate to each other in a sentence, and the context they carry.
  • What is your participant role? Social Winter of Code 2024
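For illustration only, the tokenize-and-pad step described above might look like the following standard-library sketch. Everything here is an assumption rather than the contributor's actual code: it tokenizes at the character level (the comment does not say which level is intended), and the names `build_vocab`, `encode`, `PAD`, and `UNK` are hypothetical.

```python
PAD, UNK = 0, 1  # assumed reserved ids for padding and unseen characters


def build_vocab(texts):
    """Map each distinct character in the corpus to an integer id >= 2."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(text, vocab, max_len):
    """Convert text to ids, truncate to max_len, then right-pad with PAD."""
    ids = [vocab.get(ch, UNK) for ch in text[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Tiny made-up corpus, used only to show the shapes involved.
corpus = ["hola mundo", "hello world"]
vocab = build_vocab(corpus)
batch = [encode(text, vocab, max_len=12) for text in corpus]
```

Every row in `batch` has the same length, which is what a recurrent layer needs when sequences are stacked into a single tensor.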

Axikop avatar Jan 03 '24 14:01 Axikop

Please assign me this issue @abhisheks008

Axikop avatar Jan 03 '24 14:01 Axikop

please reply @abhisheks008

Axikop avatar Jan 05 '24 16:01 Axikop

Use at least 2-3 deep learning models/methods for this project, and compare them on their accuracy scores to find the best-fitting model. Issue assigned to you @Axikop
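A minimal sketch of how such a comparison could be tabulated, assuming each model's predictions on a shared held-out set are already available. The function names, model names, and label lists below are hypothetical placeholders, not real results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_model(y_true, predictions):
    """Return (name of the most accurate model, accuracy score per model)."""
    scores = {name: accuracy(y_true, preds) for name, preds in predictions.items()}
    return max(scores, key=scores.get), scores


# Placeholder labels and predictions, made up purely for illustration.
y_true = ["en", "fr", "es", "en"]
predictions = {
    "model_a": ["en", "fr", "en", "en"],
    "model_b": ["en", "fr", "es", "en"],
}
winner, scores = best_model(y_true, predictions)
```

Accuracy alone can be misleading when some languages have far fewer samples than others, so per-language scores may be worth reporting alongside it.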

abhisheks008 avatar Jan 06 '24 03:01 abhisheks008

  • Full name : Yash Sachan
  • GitHub Profile Link : https://github.com/YashSachan2
  • Email ID : [email protected]
  • Approach : I will use the Kaggle language detection dataset (https://www.kaggle.com/datasets/basilb2s/language-detection), which has 17 languages. I will perform data preprocessing using the nltk library, followed by tokenization and vectorisation, and then train by fine-tuning pre-trained models such as BERT and DistilBERT, along with other pre-trained models from Hugging Face such as LLaMA.
  • What is your participant role? GSSoC'24

Please assign me this issue @abhisheks008

YashSachan2 avatar May 14 '24 22:05 YashSachan2

Hi @YashSachan2, nice to have you here again! You can start working on this issue. Assigned to you.

abhisheks008 avatar May 15 '24 04:05 abhisheks008