Processing batches of audio files through Essentia-Tensorflow pre-trained models
First of all, thanks to the contributors of this library!
I'm currently trying to batch-create embeddings with the AudioSet-VGGish pre-trained model. I'm able to follow the docs to download the pretrained model and generate embeddings:
from essentia.standard import MonoLoader, TensorflowPredictVGGish
audio = MonoLoader(filename="audio.wav", sampleRate=16000, resampleQuality=4)()
model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")
embeddings = model(audio)
The problem is that the examples don't show any implementation for batch processing of multiple audio files. When I put the code below in a for loop, it reinitializes TensorFlow and runs really slowly on each iteration of the loop, e.g.:
from essentia.standard import MonoLoader, TensorflowPredictVGGish
audio_paths = ["file1.wav", "file2.wav"]
for audio in audio_paths:
    audio = MonoLoader(filename=audio, sampleRate=16000, resampleQuality=4)()
    model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")
    embeddings = model(audio)
I've tried it like this and it does the same thing. Is there a way to process audio in batches, or to stop TensorFlow from reinitializing on each run?
Yes, you can initialize MonoLoader and TensorflowPredictVGGish outside the inference loop:
from essentia.standard import MonoLoader, TensorflowPredictVGGish
audio_paths = ["file1.wav", "file2.wav"]
loader = MonoLoader()
model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")
for audio in audio_paths:
    loader.configure(filename=audio, sampleRate=16000, resampleQuality=4)
    audio = loader()
    embeddings = model(audio)
loader = MonoLoader()
print(loader)
returns TypeError: __str__ returned non-string (type NoneType).

It seems loader.configure() is not behaving well; it always returns None, also in your code above.
That's the expected return value for configure().
Ok got it, but I still don't understand how this could work out...
Sorry @Galvo87! It was a mistake in my example script. I've updated the script and double-checked that it works.
The loader had to be configured first and then called.
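For anyone following along, here is a minimal sketch of that configure-then-call pattern (assuming an audio.wav file on disk; the filename is just a placeholder): configure() only sets the algorithm's parameters and returns None, and calling the configured algorithm afterwards is what produces the audio.

from essentia.standard import MonoLoader

loader = MonoLoader()

# configure() only sets parameters on the algorithm; it returns None by design
ret = loader.configure(filename="audio.wav", sampleRate=16000, resampleQuality=4)
print(ret)  # None

# calling the configured loader is what actually decodes the file and returns the audio
audio = loader()
print(audio.shape)  # a 1-D numpy array of samples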
@burstMembrane, did you find a good solution for batch processing? I have 8 GPUs and want to extract a bunch of embeddings as quickly as possible.
I noticed the batchSize argument, but it seems like that has to do with how many patches it will process from the input audio file, rather than being an option to batch-process multiple audio files.
Any tips appreciated.
The simplest approach would be to modify this script to receive a list of files to process with something like argparse:
import argparse

from essentia.standard import MonoLoader, TensorflowPredictVGGish


def main(audio_paths):
    loader = MonoLoader()
    model = TensorflowPredictVGGish(graphFilename="audioset-vggish-3.pb", output="model/vggish/embeddings")

    for audio in audio_paths:
        loader.configure(filename=audio, sampleRate=16000, resampleQuality=4)
        audio = loader()
        embeddings = model(audio)
        # save the embeddings ...


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Process audio files using VGGish model")
    parser.add_argument("audio_files", nargs="+", help="List of audio files to process")
    args = parser.parse_args()

    main(args.audio_files)
Then you can divide the filelist you want to process into 8 chunks (e.g., split -n l/8 -d filelist filelist_part).
Finally, you can launch one script per GPU:
CUDA_VISIBLE_DEVICES=0 python extract_embeddings.py $(< filelist_part00)
...
CUDA_VISIBLE_DEVICES=7 python extract_embeddings.py $(< filelist_part07)
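If it helps, the same thing can be orchestrated from Python instead of the shell. This is just a rough sketch, assuming the script above is saved as extract_embeddings.py (hypothetical name) and that filelist contains one audio path per line:

import os
import subprocess

# read the list of audio files, one path per line
with open("filelist") as f:
    files = [line.strip() for line in f if line.strip()]

n_gpus = 8
chunks = [files[i::n_gpus] for i in range(n_gpus)]  # round-robin split into 8 chunks

procs = []
for gpu, chunk in enumerate(chunks):
    if not chunk:
        continue
    # pin each worker to a single GPU via CUDA_VISIBLE_DEVICES
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    procs.append(subprocess.Popen(["python", "extract_embeddings.py", *chunk], env=env))

# wait for all workers to finish
for p in procs:
    p.wait()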
Thanks, yes, I actually realized there was something similar I could do, in just chunking my data into as many chunks as I have GPUs (8) and having a separate serial process for each GPU. Works well. (I also used batchSize=-1, which I think helps optimize a bit, though I'm not totally sure about that one.)
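In case it's useful to anyone else, this is roughly where that parameter goes (same model file as above); my understanding is that batchSize=-1 accumulates all the patches from a file into a single batch before inference, but I haven't verified the exact behavior:

from essentia.standard import TensorflowPredictVGGish

# batchSize=-1: accumulate all patches from the input into a single batch
# (my understanding of the parameter; not verified against the docs)
model = TensorflowPredictVGGish(
    graphFilename="audioset-vggish-3.pb",
    output="model/vggish/embeddings",
    batchSize=-1,
)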