sherpa-onnx
Possible for batch extraction of speaker embeddings?
Hi, I would like to know whether the code can be modified to extract speaker embeddings in batches.
I have never tried that. It may be possible, but it requires changing the ONNX export code.
I suggest that you first have a look at the models from 3dspeaker, wespeaker, and NeMo.
If batch processing is used, shorter waves have to be padded, and the model needs a mask marking the padded positions, since waves in a batch may not have the same number of samples.
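As a rough illustration of the padding and mask mentioned above, here is a minimal sketch in plain Python. The helper name `pad_waves` and the sample values are made up for this example and are not part of sherpa-onnx. Real code would more likely use NumPy arrays, and the exported model would still need an input that accepts the mask (or the true lengths), which is the export change referred to above.

```python
from typing import List, Tuple


def pad_waves(waves: List[List[float]]) -> Tuple[List[List[float]], List[List[int]]]:
    """Zero-pad every wave to the longest one in the batch.

    Returns (padded, mask), where mask[i][j] is 1 for a real sample
    and 0 for a padded position.
    """
    max_len = max(len(w) for w in waves)
    padded = [w + [0.0] * (max_len - len(w)) for w in waves]
    mask = [[1] * len(w) + [0] * (max_len - len(w)) for w in waves]
    return padded, mask


# Three waves with different numbers of samples (hypothetical values).
waves = [[0.1, -0.2, 0.3], [0.5, 0.4], [0.0, 0.1, 0.2, 0.3]]
padded, mask = pad_waves(waves)

for row, m in zip(padded, mask):
    print(row, m)
```

Every row of `padded` now has the same length, so the rows can be stacked into one batch, while `mask` records which samples are real.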
Thanks for the quick reply! I have been looking at the 3dspeaker code, and it extracts embeddings sequentially rather than in batches. Suppose I restrict all the samples in the batch to the same duration: will that still require changing the ONNX export code?
In that case, I think the model already supports batch processing.
Please change https://github.com/k2-fsa/sherpa-onnx/blob/69440e481ff4ad3dc0ff8679128f6ee177c7b2d9/scripts/3dspeaker/test-onnx.py#L137 to
)[0]
and you should get a 2-D tensor of shape (batch_size, embedding_dim).
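To illustrate the indexing: onnxruntime's `InferenceSession.run` returns a Python list with one entry per requested output, so `[0]` selects the first output as a whole. The sketch below uses a plain nested list as a stand-in for that return value. No model is loaded, the numbers are made up, and it assumes the model's first output has shape (batch_size, embedding_dim).

```python
# Stand-in for the list that session.run(...) would return: one entry
# per model output. Here the single output holds 3 embeddings of
# dimension 4. The values are made up for illustration.
outputs = [
    [
        [0.1, 0.2, 0.3, 0.4],
        [0.5, 0.6, 0.7, 0.8],
        [0.9, 1.0, 1.1, 1.2],
    ]
]

# Indexing once keeps the whole batch: (batch_size, embedding_dim).
batch_embeddings = outputs[0]

# Indexing twice keeps only the first utterance: (embedding_dim,).
first_embedding = outputs[0][0]

batch_shape = (len(batch_embeddings), len(batch_embeddings[0]))
print(batch_shape)
print(len(first_embedding))
```

With a real session the first output would be a NumPy array, so its `.shape` attribute can be checked directly instead of computing the shape by hand.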
I suggest that you play with https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/3dspeaker/test-onnx.py and check if it indeed works with batch processing.
By the way, we use https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/3dspeaker/export-onnx.py to export 3d-speaker models to ONNX.
Okay! Thank you so much for the guidance! It really is a pleasure to use icefall and sherpa! Thanks for all the hard work!
:smiley: