kaldi-serve GPU batch decoding + Online request queueing mechanism

Open greed2411 opened this issue 3 years ago • 1 comments

possibly along with a request queueing mechanism like ServiceStreamer for online

Oct 22 '21 03:10 greed2411

Task 1

Write an interface and implement GPU batch decoding for Kaldi ASR models in the kaldi-serve core C++ library.

The current partial version (gpu-decoder branch) is buggy (stale issue here). You may use it as a starting point or write one from scratch; it's up to you. The main idea is to be able to pass a custom async callback to the batch decoding pipeline, which is invoked with the final result once the GPU compute task is complete.

Relevant links:

Batched Decoding binary
Batched Threaded CUDA Pipeline - Source
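
To make the callback idea concrete, here is a minimal sketch of what such an interface could look like. Every name in it (`BatchDecoder`, `DecodeAsync`, `StubBatchDecoder`, `DecodeOnce`, `DecodeResult`) is hypothetical and not part of kaldi-serve or Kaldi. The stub runs each request on a plain background thread and reports the sample count; a real implementation would hand the audio to Kaldi's batched CUDA pipeline instead.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical result type; a real one would likely carry a lattice or n-best list.
struct DecodeResult {
    std::string request_id;
    std::string transcript;
};

// Invoked once the compute task for a request has finished.
using ResultCallback = std::function<void(const DecodeResult &)>;

// Hypothetical interface (assumption, not an existing kaldi-serve class).
class BatchDecoder {
  public:
    virtual ~BatchDecoder() = default;
    // Enqueue audio for decoding and return immediately.
    virtual void DecodeAsync(const std::string &request_id,
                             std::vector<float> samples,
                             ResultCallback on_done) = 0;
    // Block until every submitted request has completed.
    virtual void WaitForAll() = 0;
};

// Stand-in implementation: no GPU work, just one thread per request.
// Not thread-safe for concurrent submitters; kept simple on purpose.
class StubBatchDecoder : public BatchDecoder {
  public:
    ~StubBatchDecoder() override { WaitForAll(); }

    void DecodeAsync(const std::string &request_id, std::vector<float> samples,
                     ResultCallback on_done) override {
        assert(on_done && "a callback is required");
        workers_.emplace_back(
            [request_id, samples = std::move(samples), on_done = std::move(on_done)] {
                DecodeResult result{
                    request_id,
                    "decoded " + std::to_string(samples.size()) + " samples"};
                on_done(result);  // hand the final result back to the caller
            });
    }

    void WaitForAll() override {
        for (auto &worker : workers_) {
            if (worker.joinable()) worker.join();
        }
        workers_.clear();
    }

  private:
    std::vector<std::thread> workers_;
};

// Usage sketch: bridge the callback to a future so a caller can block on it.
inline std::string DecodeOnce(BatchDecoder &decoder, std::vector<float> samples) {
    auto promise = std::make_shared<std::promise<std::string>>();
    std::future<std::string> transcript = promise->get_future();
    decoder.DecodeAsync("req-1", std::move(samples),
                        [promise](const DecodeResult &r) {
                            promise->set_value(r.transcript);
                        });
    return transcript.get();
}
```

Keeping the callback as a `std::function` means the gRPC layer can decide for itself how to deliver the result (a promise, a stream write, a completion queue tag) without the decoder knowing about it.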

Task 2

Implement an online request queueing mechanism, similar to that of ServiceStreamer, that uses the GPU batch decoding interface (Task 1) to reduce latency in the kaldi-serve gRPC server application under higher load.
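
As a rough illustration of the queueing idea, the sketch below collects incoming requests until either a maximum batch size is reached or a maximum wait time elapses, then runs the whole batch at once. It is not ServiceStreamer's implementation and not kaldi-serve code: `RequestBatcher`, `QueuedRequest`, `BatchFn` and `FakeGpuDecode` are hypothetical names, and the batch function is a stand-in for a call into the Task 1 interface.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <iterator>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// One queued request: the audio plus a promise the request handler waits on.
struct QueuedRequest {
    std::vector<float> samples;
    std::promise<std::string> result;
};

// Hypothetical batch function: takes a whole batch, returns one transcript per input.
using BatchFn =
    std::function<std::vector<std::string>(const std::vector<std::vector<float>> &)>;

class RequestBatcher {
  public:
    RequestBatcher(BatchFn fn, std::size_t max_batch, std::chrono::milliseconds max_wait)
        : fn_(std::move(fn)), max_batch_(max_batch), max_wait_(max_wait),
          worker_([this] { Run(); }) {}

    ~RequestBatcher() {
        { std::lock_guard<std::mutex> lock(mu_); stop_ = true; }
        cv_.notify_all();
        worker_.join();  // the worker drains any remaining requests first
    }

    // Called from request handler threads; the future resolves after the batch runs.
    std::future<std::string> Submit(std::vector<float> samples) {
        QueuedRequest req;
        req.samples = std::move(samples);
        std::future<std::string> fut = req.result.get_future();
        { std::lock_guard<std::mutex> lock(mu_); queue_.push_back(std::move(req)); }
        cv_.notify_one();
        return fut;
    }

  private:
    void Run() {
        for (;;) {
            std::vector<QueuedRequest> batch;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stop requested and nothing left
                // Give other requests up to max_wait_ to join this batch.
                cv_.wait_for(lock, max_wait_,
                             [this] { return stop_ || queue_.size() >= max_batch_; });
                const auto n = static_cast<std::ptrdiff_t>(
                    std::min(queue_.size(), max_batch_));
                batch.assign(std::make_move_iterator(queue_.begin()),
                             std::make_move_iterator(queue_.begin() + n));
                queue_.erase(queue_.begin(), queue_.begin() + n);
            }
            std::vector<std::vector<float>> inputs;
            for (auto &req : batch) inputs.push_back(std::move(req.samples));
            std::vector<std::string> outputs = fn_(inputs);  // one call per batch
            assert(outputs.size() == batch.size());
            for (std::size_t i = 0; i < batch.size(); ++i)
                batch[i].result.set_value(std::move(outputs[i]));
        }
    }

    BatchFn fn_;
    std::size_t max_batch_;
    std::chrono::milliseconds max_wait_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<QueuedRequest> queue_;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};

// Stand-in for the GPU batch decoder: reports sample count and batch size.
inline std::vector<std::string> FakeGpuDecode(
    const std::vector<std::vector<float>> &batch) {
    std::vector<std::string> out;
    for (const auto &samples : batch)
        out.push_back(std::to_string(samples.size()) + " samples, batch of " +
                      std::to_string(batch.size()));
    return out;
}
```

The two knobs trade latency against throughput: a larger `max_batch` keeps the GPU busier per call, while a shorter `max_wait` bounds how long a lone request can sit in the queue when traffic is light.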

Nov 17 '21 20:11 pskrunner14
