Batch parallel CPU decoding.
With CPU inference I can control the number of PyTorch threads for conformer inference simply by calling torch.set_num_threads(desired_num), and it works: I observe an almost linear speedup of the conformer inference as the number of threads increases.
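For reference, a minimal sketch of the setup I mean (`model` and `features` are placeholders for my conformer and its input):

```python
import torch

torch.set_num_threads(8)  # placeholder thread count

# `model` and `features` are placeholders; the output shape matches
# the (batch_size, seq_len, num_bpe) log_probs described below.
with torch.no_grad():
    log_probs = model(features)
```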
During HLG decoding, however, it always seems to run single-threaded, and there appears to be no way to change this. I wonder whether it is possible to implement parallelization over the utterances in a batch. That seems like a natural way to parallelize this task, and by doing it at the C++ level one could avoid any Python overhead and keep the main Python pipeline as simple "single-threaded" inference. I know that in Sherpa you prefer to handle threads in Python and release the GIL when necessary, but for my use case it would be nice to have the option to execute _k2.intersect_dense_pruned in parallel, with the number of threads provided as a parameter.
Will it be difficult to implement?
OK, I looked through the code, and I believe k2.OnlineDenseIntersecter should do the trick: it has a num_streams parameter, which, as far as I understand, is the number of concurrent streams and should equal the batch size. I tried this parameter, but unfortunately I don't observe any speedup; the decoding process is still single-threaded. Am I doing anything wrong?
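In case it helps, this is roughly how I construct it (the argument names are from my reading of the code, and the beam/state values are placeholders, so I may well be misusing it):

```python
import k2

# `HLG` is my decoding graph; beam and active-state values are placeholders.
intersecter = k2.OnlineDenseIntersecter(
    decoding_graph=HLG,
    num_streams=batch_size,  # my understanding: one stream per utterance
    search_beam=20.0,
    output_beam=8.0,
    min_active_states=30,
    max_active_states=10000,
)
```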
In general, I just want to solve the following task: I have log_probs of shape (batch_size, seq_len, num_bpe) obtained from the neural net, and all I want is to decode with multiple concurrent threads, parallelizing over the samples in the batch.
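For concreteness, my current single-threaded decoding looks roughly like this (beam and active-state values are placeholders; `HLG` is the decoding graph):

```python
import torch
import k2

# supervision_segments: (num_utts, 3) int32 rows of
# [utterance_index, start_frame, num_frames]
supervision_segments = torch.tensor(
    [[i, 0, seq_len] for i in range(batch_size)], dtype=torch.int32
)
dense_fsa_vec = k2.DenseFsaVec(log_probs, supervision_segments)

lattice = k2.intersect_dense_pruned(
    HLG,
    dense_fsa_vec,
    search_beam=20.0,
    output_beam=8.0,
    min_active_states=30,
    max_active_states=10000,
)
best_path = k2.shortest_path(lattice, use_double_scores=True)
```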
@csukuangfj @pkufool Can you please advise me on how to do it properly?
k2 is not optimized for CPU. Can you start as many threads as num_batch to process the data?
Note: if you are using Python, you may need to change the Python binding code in k2 to release Python's GIL.
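A minimal sketch of that suggestion: split the batch into one-utterance pieces and decode each in its own thread (beam values are placeholders; note that without the GIL-release change mentioned above, the threads will still mostly run one at a time):

```python
from concurrent.futures import ThreadPoolExecutor

import torch
import k2

def decode_one(i: int):
    # One-utterance DenseFsaVec: [index_within_this_vec, start, duration]
    segment = torch.tensor([[0, 0, seq_len]], dtype=torch.int32)
    dense = k2.DenseFsaVec(log_probs[i : i + 1], segment)
    lattice = k2.intersect_dense_pruned(
        HLG,
        dense,
        search_beam=20.0,
        output_beam=8.0,
        min_active_states=30,
        max_active_states=10000,
    )
    return k2.shortest_path(lattice, use_double_scores=True)

with ThreadPoolExecutor(max_workers=batch_size) as pool:
    best_paths = list(pool.map(decode_one, range(batch_size)))
```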