music-artist-classification-crnn
Questions about some errors.
Hi, I followed your setup guide but I'm having trouble running the 'main.py' file; I get the following errors. Any ideas?
Hi Paxias,
It's been some time since I looked at this, but based on that error, it did not find the data in the expected folder (hence why the arrays were empty). Did you execute the first step under Usage in the README?
- Prepare mel-scaled spectrograms from raw audio in the dataset.
- Run src/utility.py if the dataset is stored using its original folder structure (artists/[artist]/[album]/[song].mp3) in the project root.
- Use the create_dataset() utility function in src/utility.py with a custom directory if the dataset is stored elsewhere.
The dataset-loading scripts expect to find the spectrograms in the 'song_data' folder. I should also add that the create_dataset() function and the src/utility.py script expect data in the structure mentioned above at the project root: artists/[folders corresponding to artist names]/[folders corresponding to album names]/[song name].mp3. This is the format the artist20 dataset is shared in.
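A quick way to check whether your tree matches what the utilities expect is to walk it yourself. This stdlib-only sketch (list_songs is a name I made up for illustration, not part of the repo) yields every (artist, album, song) triple found under an artists/[artist]/[album]/[song].mp3 layout:

```python
import os
import tempfile

def list_songs(root):
    """Walk [artist]/[album]/[song].mp3 under root, yielding (artist, album, song)."""
    for artist in sorted(os.listdir(root)):
        artist_dir = os.path.join(root, artist)
        if not os.path.isdir(artist_dir):
            continue
        for album in sorted(os.listdir(artist_dir)):
            album_dir = os.path.join(artist_dir, album)
            if not os.path.isdir(album_dir):
                continue
            for song in sorted(os.listdir(album_dir)):
                if song.endswith(".mp3"):
                    yield artist, album, song

# Build a tiny mock of the expected layout to sanity-check the walk.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "artists", "aerosmith", "pump"))
open(os.path.join(root, "artists", "aerosmith", "pump", "01.mp3"), "w").close()

songs = list(list_songs(os.path.join(root, "artists")))
print(songs)  # [('aerosmith', 'pump', '01.mp3')]
```

If this prints an empty list against your real 'artists' folder, the structure is not what the spectrogram-creation step expects.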
Yeah, the dataset structure in my project is artists/[artist folders]/[album folders]/[songs].mp3. In fact, if I try to change the name of the folder I get a directory-not-found error. But after running utility.py I have no spectrograms in the song_data folder.
Bump: I still have problems creating the spectrograms.
I'm following the dataset structure you suggested (in the project folder: artists/[artist folders]/[album folders]/*.mp3).
Hi Paxias,
You should have some files created by the utility.py script in the song_data folder; if those aren't there, the data loading will fail. Have you tried manually running the create_dataset() function from the utilities? It should print each artist name as it iterates through the 'artists' folder and create files as it goes. You can find more info here
Hi Zain,
I was able to reproduce your experiment. However, I would like to classify the artist20 dataset using two classes (Happy and Sad). I downloaded a small dataset already structured and labelled the way I wanted in order to train the neural network. How do I now classify the artist20 dataset using the Happy/Sad model? Where can I find the Happy/Sad model trained earlier? Shouldn't it be saved somewhere?
Hi Paxias, glad that you were able to reproduce! The current version of the code uses the artist name as the classification target, which it extracts from the folder structure: [artist]/[album]/[song].mp3
If you replace the artist above with happy/sad, e.g. happy/[album]/[song].mp3 and sad/[album]/[song].mp3,
the code should work as-is. However, I haven't tested this, as I'm not familiar with the mood-prediction problem. Once you've trained the model, it saves its weights to a specified folder ('weights' by default). You can load these in as usual via the Keras API; for an example, see here (lines 81-85, 115, and 122)
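To make the folder idea above concrete, here is a minimal shell sketch (all artist and album names are made-up examples, not part of the repo or the artist20 dataset): each album is placed under the top-level folder matching its mood label, since the code reads the class name from that top-level folder.

```shell
# Mock source data standing in for the real dataset (example names only).
mkdir -p artists/example_artist/upbeat_album artists/example_artist/slow_album
touch artists/example_artist/upbeat_album/01.mp3
touch artists/example_artist/slow_album/01.mp3

# Restructure: the class label comes from the top-level folder, so copy
# each album under the folder for its mood label.
mkdir -p mood_data/happy mood_data/sad
cp -r artists/example_artist/upbeat_album mood_data/happy/
cp -r artists/example_artist/slow_album mood_data/sad/

ls mood_data/happy mood_data/sad
```

Pointing create_dataset() at mood_data (instead of the default 'artists' root) would then treat 'happy' and 'sad' as the two class names.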
Ok, thank you for the info! I'll keep you updated. So should I basically create two different folders, copy all the albums into both, and then follow the instructions as I did at the beginning?
Hello and happy new year! I got back to work after a while but didn't get any results. As I wrote earlier, I would like to classify the artist20 dataset using a previously trained model based on two classes (happy and sad). I saw that in utility.py (line 305) there is the predict_artist function. This is probably what I need; however, it is not clear to me how to pass certain parameters to the function (e.g. X, Y, Z)
Hi @Paxias, apologies for the late response; I missed your follow-up comment. I see that I didn't do a great job documenting the inputs of the prediction function (apologies again for this; it was a relatively early work of mine), but if you're still interested:
predict_artist takes:
- model: a trained model object (the output of the rest of the code)
- X: an array of audio slices
- Y: an array of encoded ground-truth labels
- S: an array of song-name strings, one per audio slice
- le: the label-encoder object (from scikit-learn) originally used to encode the target variable
- class_names: a list of class names, used for printing
- slices: the maximum number of slices used for a given song
- verbose: how much printing to perform
- ml_mode: if set, evaluates the model only on samples it is confident about
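Since S maps each slice back to its song, per-slice predictions can be combined into one prediction per song. This stdlib-only sketch illustrates that grouping with a simple majority vote; song_level_vote is a hypothetical helper, and majority voting is my assumption here, so see predict_artist in src/utility.py for the actual aggregation logic:

```python
from collections import Counter, defaultdict

def song_level_vote(slice_preds, song_names):
    """Combine per-slice predicted labels into one label per song
    by majority vote (an illustrative assumption, not the repo's code)."""
    by_song = defaultdict(list)
    for pred, song in zip(slice_preds, song_names):
        by_song[song].append(pred)
    return {song: Counter(preds).most_common(1)[0][0]
            for song, preds in by_song.items()}

# Three slices of song_a and two of song_b; song_a has one misclassified slice.
preds = ["u2", "u2", "beatles", "beatles", "beatles"]
songs = ["song_a", "song_a", "song_a", "song_b", "song_b"]
print(song_level_vote(preds, songs))  # {'song_a': 'u2', 'song_b': 'beatles'}
```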
If you're just interested in making a generic prediction, though, you can follow the instructions in representation.py, skipping lines 60 and 61 (which remove the classification layer).