sherpa-onnx Possible for batch extraction of speaker embeddings?

Open chiiyeh opened this issue 1 year ago • 5 comments

Hi, would like to know if it is possible to modify the code to enable batch processing of the extraction of speaker embeddings?

Apr 17 '24 07:04 chiiyeh

> Hi, would like to know if it is possible to modify the code to enable batch processing of the extraction of speaker embeddings?

I have never tried that. It may be possible but it requires changing the onnx export code.

I suggest that you first have a look at the models from 3dspeaker, wespeaker, and NeMo.

If batch processing is used, there should be some mask for paddings since waves in a batch may not have the same number of samples.
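To make the padding point concrete, here is a minimal NumPy sketch of zero-padding waves of different lengths and building the matching mask. The function name `pad_waves` is hypothetical and not part of sherpa-onnx; a model would also need to be exported so that it actually consumes such a mask, which the current export code is not known to do.

```python
import numpy as np


def pad_waves(waves):
    """Zero-pad 1-D waveforms to a common length (hypothetical helper).

    Returns (batch, mask):
      batch: float32 array of shape (num_waves, max_len)
      mask:  bool array of the same shape, True at real samples and
             False at padded positions
    """
    lengths = np.array([len(w) for w in waves], dtype=np.int64)
    max_len = int(lengths.max())

    batch = np.zeros((len(waves), max_len), dtype=np.float32)
    for i, w in enumerate(waves):
        batch[i, : len(w)] = w

    # Position j of row i is valid when j < lengths[i].
    mask = np.arange(max_len)[None, :] < lengths[:, None]
    return batch, mask
```

Without the mask, the padded zeros would be pooled into the embedding as if they were real audio, which is why variable-length batches need more than a change on the caller side.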

Apr 17 '24 07:04 csukuangfj

Thanks for the quick reply! I am actually looking at the 3dspeaker code, and they are not doing batch extraction; they extract embeddings sequentially. Suppose I restrict all the samples in the batch to the same duration, will that still require changing the onnx export code?

Apr 17 '24 07:04 chiiyeh

> Suppose I restrict all the samples in the batch to the same duration, will that still require changing the onnx export code?

In that case, I think the model already supports batch processing.

Please change https://github.com/k2-fsa/sherpa-onnx/blob/69440e481ff4ad3dc0ff8679128f6ee177c7b2d9/scripts/3dspeaker/test-onnx.py#L137 to

 )[0]

and you should get a 2-D tensor of shape (batch_size, embedding_dim).

I suggest that you play with https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/3dspeaker/test-onnx.py and check if it indeed works with batch processing.

By the way, we use https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/3dspeaker/export-onnx.py to export 3d-speaker models to onnx.
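For the equal-duration case, a rough sketch of what a batched call could look like is given below. It is not taken from test-onnx.py. It assumes the exported model accepts features of shape (batch_size, num_frames, feature_dim) and returns embeddings as its first output, which should be verified against the actual model. The helper names `stack_equal_length` and `compute_embeddings` are hypothetical; `session` is expected to be an `onnxruntime.InferenceSession`.

```python
import numpy as np


def stack_equal_length(feature_list):
    """Stack 2-D (num_frames, feature_dim) arrays into one 3-D batch.

    Raises ValueError if the utterances differ in shape, since this
    sketch does no padding or masking.
    """
    shapes = {f.shape for f in feature_list}
    if len(shapes) != 1:
        raise ValueError(f"All features must share one shape, got {shapes}")
    return np.stack(feature_list).astype(np.float32)


def compute_embeddings(session, features):
    """Run one batched forward pass.

    Assumption: the model input is (batch_size, num_frames, feature_dim)
    and the first output is (batch_size, embedding_dim).
    """
    if features.ndim != 3:
        raise ValueError("Expected a 3-D array (batch, frames, dim)")
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name
    # session.run returns a list with one array per requested output.
    # Taking element [0] keeps the batch axis, so the result is 2-D.
    return session.run([output_name], {input_name: features})[0]
```

With onnxruntime installed, `session` would be created with `onnxruntime.InferenceSession("model.onnx")`, where the file name is a placeholder. Comparing each row of the batched result with the embedding computed one utterance at a time is a simple way to check that batching really works for a given model.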

Apr 17 '24 07:04 csukuangfj

Okay! Thank you so much for the guidance! It really is a pleasure to use icefall and sherpa! Thanks for all the hard work!

Apr 17 '24 07:04 chiiyeh

:smiley:

Apr 17 '24 07:04 csukuangfj
