Problem connecting the Pipeline for DiscogsEffnet ('StreamingAlgo' object has no attribute 'poolIn')
Hello, I tried to follow the procedure from https://github.com/palonso/mtg-general-meeting-03-2020-essentia-tensorflow/blob/master/demo-realtime-essentia-tensorflow.ipynb for a live implementation of DiscogsEffnet, swapping the ML model for the combination of the Discogs embedding model and a classification head, but I ran into this problem while connecting the pipeline:
vimp = VectorInput(buffer)
fc = FrameCutter(frameSize=frameSize, hopSize=hopSize)
tim = TensorflowInputMusiCNN()
vtt = VectorRealToTensor(shape=[1, 1, patchSize, numberBands], lastPatchMode='discard')
ttp = TensorToPool(namespace=inputLayer)
# Embedding model
tfpED = TensorflowPredictEffnetDiscogs(graphFilename=embeddingModelName, output='PartitionedCall:1')
# Genre prediction model
model = TensorflowPredict2D(graphFilename=predictionModelName, input=inputLayer, output=outputLayer)
vimp.data >> fc.signal
fc.frame >> tim.frame
tim.bands >> vtt.frame
tim.bands >> (pool, 'melbands')
vtt.tensor >> ttp.tensor
ttp.pool >> tfpED.poolIn
tfpED.poolOut >> (pool, 'embeddings')
embeddings_tensor = PoolToTensor(namespace='embeddings')
(pool, 'embeddings') >> embeddings_tensor.tensor
embeddings_tensor.tensor >> model.tensorIn
model.tensorOut >> (pool, outputLayer)
Which produced the error message:
Cell In[43], line 7
      5 tim.bands >> (pool, 'melbands')
      6 vtt.tensor >> ttp.tensor
----> 7 ttp.pool >> tfpED.poolIn
      8 tfpED.poolOut >> (pool, 'embeddings')
     10 embeddings_tensor = PoolToTensor(namespace='embeddings')
AttributeError: 'StreamingAlgo' object has no attribute 'poolIn'
I searched the source on GitHub, and poolIn seems to appear in /src/algorithms/machinelearning/tensorflowpredicteffnetdiscogs.cpp, but inside an already pre-built algorithm:
AlgorithmFactory& factory = AlgorithmFactory::instance();
_frameCutter = factory.create("FrameCutter");
_tensorflowInputMusiCNN = factory.create("TensorflowInputMusiCNN");
_vectorRealToTensor = factory.create("VectorRealToTensor");
_tensorToPool = factory.create("TensorToPool");
_tensorflowPredict = factory.create("TensorflowPredict");
_poolToTensor = factory.create("PoolToTensor");
_tensorToVectorReal = factory.create("TensorToVectorReal");
_tensorflowInputMusiCNN->output("bands").setBufferType(BufferUsage::forMultipleFrames);
_signal >> _frameCutter->input("signal");
_frameCutter->output("frame") >> _tensorflowInputMusiCNN->input("frame");
_tensorflowInputMusiCNN->output("bands") >> _vectorRealToTensor->input("frame");
_vectorRealToTensor->output("tensor") >> _tensorToPool->input("tensor");
_tensorToPool->output("pool") >> _tensorflowPredict->input("poolIn");
_tensorflowPredict->output("poolOut") >> _poolToTensor->input("pool");
_poolToTensor->output("tensor") >> _tensorToVectorReal->input("tensor");
attach(_tensorToVectorReal->output("frame"), _predictions);
_network = new scheduler::Network(_frameCutter);
}
So do I not have to build the pipeline myself, and can I just connect the embedding model and the head? If so, how do I do that? Sorry if this is rather obvious as well. Kind regards, and I hope you had a good start to the new year :)
Hi @perli99, as you mention, the TensorflowPredict"Model" algorithms are wrappers containing all steps of the pipeline inside. This is how to make it work in streaming mode:
import numpy as np
from essentia.streaming import *
from essentia import Pool, run
# model parameters
inputLayerED = "serving_default_melspectrogram"
outputLayerED = "PartitionedCall:1"
inputLayer = "model/Placeholder"
outputLayer = "model/Softmax"
embeddingModelName = "discogs-effnet-bs64-1.pb"
predictionModelName = "danceability-discogs-effnet-1.pb"
# with the current configuration, we need > 64 seconds to make a prediction
sampleRate = 16000
buffer = np.zeros(sampleRate * 65, dtype="float32")
vimp = VectorInput(buffer)
# Embedding model
tfpED = TensorflowPredictEffnetDiscogs(
    graphFilename=embeddingModelName,
    input=inputLayerED,
    output=outputLayerED,
)
model = TensorflowPredict2D(
    graphFilename=predictionModelName,
    input=inputLayer,
    output=outputLayer,
    dimensions=1280,
)
pool = Pool()
vimp.data >> tfpED.signal
tfpED.predictions >> model.features
model.predictions >> (pool, outputLayer)
run(vimp)
print(pool[outputLayer].shape)
The main problem with making EffnetDiscogs work in real time is that, right now, we only have versions of the model that require a fixed batch size of 64 (discogs-effnet-bs64-1.pb). This means that you need enough audio to generate 64 patches of ~2 seconds each (more than 64 seconds with the configuration above) before you get a prediction.
Please let me know if the current model is enough for your application or if you would like to have a bs1 version, suitable for close-to-real-time operation.
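As a rough check of the figures above, the amount of audio needed to fill one batch can be estimated from the patch geometry. This is only a sketch: the defaults used here (patches of 128 frames, a patch hop of 62 frames, a frame hop of 256 samples) are assumptions about the default configuration of TensorflowPredictEffnetDiscogs and should be checked against the algorithm's documentation. The helper name `seconds_needed` is hypothetical, and padding at the signal edges is ignored.

```python
def seconds_needed(batch_size, patch_size=128, patch_hop_size=62,
                   hop_size=256, sample_rate=16000):
    """Approximate audio (in seconds) needed to fill `batch_size` patches.

    All default values are assumptions, not confirmed in this thread.
    """
    # The first patch needs patch_size frames; every further patch
    # starts patch_hop_size frames after the previous one.
    frames = patch_size + (batch_size - 1) * patch_hop_size
    # Each mel-spectrogram frame advances by hop_size samples.
    return frames * hop_size / sample_rate


print(seconds_needed(1))   # one patch: about 2.05 s
print(seconds_needed(64))  # full batch of 64 patches: about 64.5 s
```

Under these assumptions the patches overlap, which would explain why 64 patches of ~2 seconds fit in a little over 64 seconds rather than ~128 seconds.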
Hey @palonso, thank you for your fast reply.
So if I understand correctly, this model needs 64 patches of ~2 seconds each, so ~128 seconds until it can make a prediction?
I would like a solution where the latency is ideally not much more than 1 second. For my Bachelor's thesis I am building a robot that "listens" to live music, extracts features, and then paints a picture based on those features. I want one of those features to be the genre (also mood, energy...), and I figured the Discogs model would be a good fit, because if I combine some of the genres into broader categories (Rock, Jazz...) I would get pretty good accuracy, and the live implementation shown here https://www.youtube.com/watch?v=Cp0zkojT9RQ seemed to be close to real time.
So yes, if the bs1 version is faster, then that one is probably the right one for me. Where can I get it? Or would you advise that I use one of the other models altogether? I liked EffnetDiscogs because I would be able to use different heads, and the large amount of underlying training data seems nice.
Thank you for your help already, I really appreciate it. Vincent
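The idea of combining fine-grained genres into broader categories could be sketched as follows. This is a minimal illustration, not part of Essentia: it assumes the class labels follow a "Genre---Style" naming scheme, the labels and activations below are made up, and taking the maximum over styles (rather than the sum or mean) is just one possible aggregation. The helper `group_by_parent` is hypothetical.

```python
from collections import defaultdict


def group_by_parent(labels, activations, separator="---"):
    """Collapse per-style activations into one score per parent genre.

    Assumes non-negative activations and "Genre---Style" labels.
    """
    grouped = defaultdict(float)
    for label, activation in zip(labels, activations):
        parent = label.split(separator)[0]
        # Keep the strongest style as the parent's score (assumption: max).
        grouped[parent] = max(grouped[parent], activation)
    return dict(grouped)


# Hypothetical labels and activations, for illustration only.
labels = ["Rock---Alternative Rock", "Rock---Punk", "Jazz---Bebop"]
activations = [0.40, 0.70, 0.10]

broad = group_by_parent(labels, activations)
print(max(broad, key=broad.get))  # prints "Rock"
```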
Sorry for forgetting about this!
We have uploaded a version of discogs-effnet that operates with batchSize=1, suitable for low-latency applications.
This is how to adapt the previous example for this case:
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
import numpy as np
from essentia.streaming import *
from essentia import Pool, run
# model parameters
inputLayerED = "serving_default_melspectrogram"
outputLayerED = "PartitionedCall:1"
inputLayer = "model/Placeholder"
outputLayer = "model/Softmax"
embeddingModelName = "discogs-effnet-bs1-1.pb"
predictionModelName = "danceability-discogs-effnet-1.pb"
sampleRate = 16000
buffer = np.zeros(sampleRate * 3, dtype="float32")
vimp = VectorInput(buffer)
# Embedding model
tfpED = TensorflowPredictEffnetDiscogs(
    graphFilename=embeddingModelName,
    input=inputLayerED,
    output=outputLayerED,
    batchSize=1,
)
model = TensorflowPredict2D(
    graphFilename=predictionModelName,
    input=inputLayer,
    output=outputLayer,
    dimensions=1280,
)
pool = Pool()
vimp.data >> tfpED.signal
tfpED.predictions >> model.features
model.predictions >> (pool, outputLayer)
run(vimp)
print(pool[outputLayer].shape)
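For live input, one possible approach (an assumption, not something confirmed in this thread) is to keep a fixed-length sliding window of the most recent audio and refresh it as new chunks arrive, so that a prediction can be updated more often than once per window. The sketch below shows only the buffering part, using the standard library. The names `push_chunk`, `SAMPLE_RATE`, and `WINDOW_SECONDS` are hypothetical, and the audio chunk is a dummy constant signal.

```python
from collections import deque

SAMPLE_RATE = 16000
WINDOW_SECONDS = 3  # same length as the buffer in the example above

# Fixed-size window: appending new samples drops the oldest ones.
window = deque([0.0] * (SAMPLE_RATE * WINDOW_SECONDS),
               maxlen=SAMPLE_RATE * WINDOW_SECONDS)


def push_chunk(window, chunk):
    """Append a new chunk of samples and return the current window as a list."""
    window.extend(chunk)
    return list(window)


# Hypothetical 1-second chunk of audio (constant value, for illustration only).
chunk = [0.5] * SAMPLE_RATE
current = push_chunk(window, chunk)
print(len(current), current[0], current[-1])  # prints: 48000 0.0 0.5
```

The returned list would still need to be converted to a float32 NumPy array before being passed to VectorInput, as in the example above. How best to re-run the network for each new window is not covered here.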