essentia-replicate-demos

Google Colab notebooks for the demos?

Open timtensor opened this issue 2 years ago • 8 comments

Hi, I am currently looking into higher-level feature extraction from an audio signal, such as genre, mood, and danceability, in a Colab/Jupyter notebook. Is there an example that one can refer to and try?

timtensor avatar Mar 01 '23 21:03 timtensor

Hi @timtensor, you can have a look at Essentia models. It contains feature extraction example scripts for all our models.

palonso avatar Mar 02 '23 07:03 palonso

Thanks for pointing it out. I think there is a problem with the installation of essentia-tensorflow; I get the following error.

I installed it from PyPI with !pip install essentia-tensorflow. The pip version is pip 22.0.4 from /usr/local/lib/python3.8/dist-packages/pip (python 3.8).

timtensor avatar Mar 02 '23 19:03 timtensor

I think you missed the error message. Could you also mention your OS?

palonso avatar Mar 02 '23 21:03 palonso

Sorry for the incomplete information. The following is the error message. I am running it in Google Colab, so I guess it's Ubuntu-based.


<ipython-input-38-96cbcf823c6c> in <module>
----> 1 from essentia.standard import MonoLoader, TensorflowPredictMusiCNN

ImportError: cannot import name 'TensorflowPredictMusiCNN' from 'essentia.standard' (/usr/local/lib/python3.8/dist-packages/essentia/standard.py)

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
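
An ImportError like the one above suggests that the installed essentia build does not expose the TensorFlow algorithms. One way to confirm this before running a full script is to probe the module for the algorithm name. The following is a minimal sketch; `algorithm_available` is a hypothetical helper, not part of Essentia.

```python
import importlib


def algorithm_available(module_name, attr_name):
    """Return True if `module_name` imports and exposes `attr_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed at all.
        return False
    return hasattr(module, attr_name)


if __name__ == "__main__":
    name = "TensorflowPredictMusiCNN"
    if algorithm_available("essentia.standard", name):
        print(f"{name} is available")
    else:
        print(f"{name} is missing; this build may lack TensorFlow support")
```

If the check reports the algorithm as missing, the environment probably needs to be reinstalled so that essentia-tensorflow provides the `essentia` module.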

pmahan00 avatar Mar 03 '23 10:03 pmahan00

Just an update: it seems to work on Google Colab with the following:

!apt-get update
!apt-get install -y python3-dev libsndfile1-dev
!pip install essentia==2.1b6.dev374 librosa==0.8.1
!pip install essentia-tensorflow

I have two questions about the prediction model. a) Is it not possible to load the pre-trained model from Google Drive? I mounted my drive and tried to point the graph filename to a path such as /mnt/gdrive/xxxx, but it resulted in an error. b) I am a bit confused about the output. From the embeddings I get a matrix of values, but is there a decoding step as well?

Sample code run on Google Colab:

!apt-get update
!apt-get install -y python3-dev libsndfile1-dev
!pip install essentia==2.1b6.dev374 librosa==0.8.1
!pip install essentia-tensorflow

from essentia.standard import MonoLoader, TensorflowPredictMusiCNN

# audioFile is the path to the input audio file (defined elsewhere)
audio = MonoLoader(filename=audioFile, sampleRate=16000)()
model = TensorflowPredictMusiCNN(graphFilename="msd-musicnn-1.pb", output="model/dense/BiasAdd")
predictions = model(audio)
print(predictions)

Perhaps I am doing something wrong in the code?

pmahan00 avatar Mar 03 '23 13:03 pmahan00

Glad to see that you could install and use the models!

Regarding a), it is not related to Essentia, so I'd recommend looking for help elsewhere. Alternatively, you could download the models directly in Colab, e.g., by adding !curl -SLO https://essentia.upf.edu/models/autotagging/msd/msd-musicnn-1.pb to your script.

About b), you are right: the embeddings are not human-readable and need to be fed to a classification head to get the class probabilities. Note that clicking on each model on the website gives you the example script to get the predictions, along with links to the model weights and the metadata file. For example, this is the script to do inference with the danceability-msd-musicnn model on top of the embeddings you already extracted:

from essentia.standard import MonoLoader, TensorflowPredictMusiCNN, TensorflowPredict2D

audio = MonoLoader(filename="audio.wav", sampleRate=16000)()
embedding_model = TensorflowPredictMusiCNN(graphFilename="msd-musicnn-1.pb", output="model/dense/BiasAdd")
embeddings = embedding_model(audio)

model = TensorflowPredict2D(graphFilename="danceability-msd-musicnn-1.pb", output="model/Softmax")
predictions = model(embeddings)

predictions will be a matrix of shape [time_stamp, n_classes] because this model makes a prediction for every 1.5 seconds of audio. To get track-level predictions, you can average the matrix across the time axis.
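
The averaging step can be sketched with NumPy as follows. The helper name `track_level` and the example values are made up for illustration; in practice the input would be the `predictions` matrix returned by the model.

```python
import numpy as np


def track_level(predictions):
    """Average frame-wise predictions [n_frames, n_classes] over time."""
    frames = np.asarray(predictions, dtype=np.float32)
    if frames.ndim != 2 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty [n_frames, n_classes] matrix")
    # axis=0 is the time axis, so the result has one value per class.
    return frames.mean(axis=0)


# Made-up values standing in for the output of `model(embeddings)`.
example = [[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]]
print(track_level(example))
```

The result is a vector with one averaged activation per class.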

palonso avatar Mar 03 '23 14:03 palonso

Thanks for the curl tip. I had totally forgotten about it. I guess all the models are available at https://essentia.upf.edu/models/

I didn't quite understand the explanation about human-readable output at track level. For example, I was looking into track-level classification with the pre-trained SVM Gaia models to learn about it. Is there a Python code example or snippet that shows how to get a classification from an SVM model? Model link: https://essentia.upf.edu/svm_models/

pmahan00 avatar Mar 03 '23 15:03 pmahan00

Hi @pmahan00.

To get overall track predictions, you can simply average the resulting matrix of activations across time, similar to this example.
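
To turn the averaged activations into something human-readable, each value can be paired with its class name. This is a minimal sketch: `label_track` is a hypothetical helper, and the class names below are assumed for illustration; the actual names and their order should be taken from the model's metadata file.

```python
import numpy as np


def label_track(predictions, classes):
    """Average over time, then pair each class name with its mean activation."""
    mean_activations = np.asarray(predictions, dtype=np.float32).mean(axis=0)
    if len(classes) != mean_activations.shape[0]:
        raise ValueError("number of class names must match n_classes")
    scores = {name: float(value) for name, value in zip(classes, mean_activations)}
    # The class with the highest mean activation is the track-level label.
    top = max(scores, key=scores.get)
    return top, scores


# Assumed class names and made-up activations, for illustration only.
classes = ["danceable", "not_danceable"]
frames = [[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]]
top, scores = label_track(frames, classes)
print(top, scores)
```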

Note that SVM classifiers are outdated in terms of their accuracy and generalization, and we recommend using the new models instead.

dbogdanov avatar Mar 06 '23 12:03 dbogdanov