DL-Simplified
Emotion Recognition from Audio using Deep Learning
Deep Learning Simplified Repository (Proposing a new issue)
:red_circle: Project Title : Emotion Recognition from Audio using Deep Learning
:red_circle: Aim : To build a deep learning model that can analyze audio recordings and classify the emotions expressed. This can have applications in areas such as customer service, mental health monitoring, and entertainment.
:red_circle: Dataset : Various publicly available datasets for emotion recognition in audio, such as RAVDESS, TESS, CREMA-D, etc.
:red_circle: Approach : Try to use 3-4 algorithms to implement the models and compare them to find the best-fitting algorithm by checking the accuracy scores. Also, do not forget to do an exploratory data analysis before creating any model.
📍 Follow the Guidelines to Contribute in the Project :
- You need to create a separate folder named after the Project Title.
- Inside that folder, there will be four main components.
  - `Images` - To store the required images.
  - `Dataset` - To store the dataset or information/source about the dataset.
  - `Model` - To store the machine learning model you've created using the dataset.
  - `requirements.txt` - This file will contain the required packages/libraries needed to run the project on other machines.
- Inside the `Model` folder, the `README.md` file must be filled up properly, with proper visualizations and conclusions.
:red_circle::yellow_circle: Points to Note :
- The issues will be assigned on a first come, first served basis, 1 Issue == 1 PR.
- "Issue Title" and "PR Title" should be the same. Include the issue number along with it.
- Follow the Contributing Guidelines & Code of Conduct before you start contributing.
:white_check_mark: To be Mentioned while taking the issue :
- Full name : Chethana Potukanam
- GitHub Profile Link : https://github.com/ChethanaPotukanam
- Email ID : [email protected]
- Participant ID (if applicable):
- Approach for this Project :
  1. Load the dataset.
  2. Exploratory Data Analysis (EDA): visualize common patterns and features in audio signals.
  3. Feature Extraction: extract features such as MFCC, Chroma, Mel Spectrogram, etc. (see the feature-extraction sketch after this list).
  4. Model Implementation: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM).
  5. Train and evaluate each model.
  6. Compare performance using accuracy and loss metrics.
- What is your participant role? (Mention the Open Source program) GSSoC24
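A minimal sketch of the feature-extraction step above, assuming `librosa` is installed; `audio_path` is a placeholder, and the parameter values (sampling rate, number of MFCCs) are illustrative rather than prescribed by the issue:

```python
import numpy as np
import librosa

def extract_features(audio_path, sr=22050, n_mfcc=40):
    """Return one fixed-length vector of MFCC, chroma, and mel features per clip."""
    signal, sr = librosa.load(audio_path, sr=sr)
    mfcc = np.mean(librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T, axis=0)
    chroma = np.mean(librosa.feature.chroma_stft(y=signal, sr=sr).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=signal, sr=sr).T, axis=0)
    # Averaging each feature over time gives a fixed-length vector per clip,
    # a common baseline representation before moving to sequence models.
    return np.concatenate([mfcc, chroma, mel])
```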
Happy Contributing 🚀
All the best. Enjoy your open source journey ahead. 😎
Thank you for creating this issue! We'll look into it as soon as possible. Your contributions are highly appreciated! 😊
Assigned @ChethanaPotukanam
Can I work on this?
Please share your approach.
@abhisheks008 could you please assign me this issue?
My approach is as follows:
- Using the RAVDESS dataset for emotional speech audio.
- Feature Engineering: convert the audio into Mel spectrogram format.
- Using a CNN to classify the audio according to emotions (using the VGG-16 and ResNet50 architectures).
- Using a CNN to extract features from the spectrogram and then applying an LSTM / Bi-LSTM on the encodings (a sketch of this CNN + BiLSTM idea appears below).
- Using HuggingFace speech2text and then the spaCy universal sentence encoder to convert the resulting text into encoding vectors, which can be classified with an ANN.

This will be followed by evaluating the models using metrics and visualizing heatmaps of the confusion matrices to analyse the error distribution.
Name : Moksh Patel
GitHub profile link : https://github.com/T3CH-Pyth0n
Event : Kharagpur Winter of Code (KWoC)
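A minimal Keras sketch of the CNN + BiLSTM idea from the approach above, assuming mel spectrograms padded or cropped to a fixed 128×128 shape and the 8 emotion classes of RAVDESS; the layer sizes and input shape are illustrative assumptions, not part of the original approach:

```python
from tensorflow.keras import layers, models

# Assumed input shape (time steps x mel bins); RAVDESS labels 8 emotions.
N_MELS, TIME_STEPS, N_EMOTIONS = 128, 128, 8

def build_cnn_bilstm(n_classes=N_EMOTIONS):
    """Small CNN front-end over the spectrogram, BiLSTM over the encodings."""
    inputs = layers.Input(shape=(TIME_STEPS, N_MELS, 1))
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)
    # Fold the frequency axis into channels so each time step is one feature vector.
    x = layers.Reshape((TIME_STEPS // 4, (N_MELS // 4) * 64))(x)
    x = layers.Bidirectional(layers.LSTM(64))(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Swapping the BiLSTM layer for a plain `layers.LSTM(64)` gives the unidirectional variant, so both models in the approach can be compared from the same front-end.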
Hi @T3CH-Pyth0n, sorry for replying late. Assigning this issue to you.
@abhisheks008 I'll be altering the approach a bit, but I'll still implement 3-4 models. Does that work?
Yes that'll work.
@abhisheks008 Can I work on this?