kaldi-serve

GPU batch decoding + Online request queueing mechanism

Open • greed2411 opened this issue 2 years ago • 1 comment

possibly along with a request queueing mechanism like ServiceStreamer for online

greed2411, Oct 22 '21 03:10

Task 1

Write an interface and implement GPU batch decoding for Kaldi ASR models in the kaldi-serve core C++ library.

The current partial version (gpu-decoder branch) is buggy (stale issue here). You may use it as a starting point or write one from scratch; it's up to you. The main idea is to be able to pass a custom async callback to the batch decoding pipeline, which is invoked with the final result once the GPU compute task is complete.

Relevant links:

  1. Batched Decoding binary
  2. Batched Threaded CUDA Pipeline - Source
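
To make the callback idea in Task 1 concrete, here is a rough sketch of what such an interface might look like. Every name in it (`DecodeResult`, `DecodeCallback`, `BatchDecoder`, `MockBatchDecoder`, `submit`) is hypothetical and is not taken from kaldi-serve or Kaldi. The mock replaces the GPU work with a placeholder string so the snippet compiles on its own; a real implementation would presumably hand each batch to Kaldi's batched CUDA pipeline instead.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// All names below are hypothetical; none come from kaldi-serve or Kaldi.
struct DecodeResult {
  std::string key;         // utterance id supplied by the caller
  std::string transcript;  // final hypothesis (placeholder text in this mock)
};

using DecodeCallback = std::function<void(const DecodeResult&)>;

// Interface that a GPU-backed implementation would provide.
class BatchDecoder {
 public:
  virtual ~BatchDecoder() = default;
  // Queue one utterance; on_done fires on a worker thread once it is decoded.
  virtual void submit(std::string key, std::vector<float> samples,
                      DecodeCallback on_done) = 0;
};

// Stand-in implementation: one worker thread drains pending work in batches.
class MockBatchDecoder : public BatchDecoder {
 public:
  MockBatchDecoder() : worker_([this] { run(); }) {}

  // Drains any queued utterances, then joins the worker.
  ~MockBatchDecoder() override {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }

  void submit(std::string key, std::vector<float> samples,
              DecodeCallback on_done) override {
    assert(static_cast<bool>(on_done));  // a callback is required
    {
      std::lock_guard<std::mutex> lock(mu_);
      pending_.push_back(Task{std::move(key), std::move(samples), std::move(on_done)});
    }
    cv_.notify_all();
  }

 private:
  struct Task {
    std::string key;
    std::vector<float> samples;
    DecodeCallback on_done;
  };

  void run() {
    for (;;) {
      std::vector<Task> batch;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stop_ || !pending_.empty(); });
        if (pending_.empty()) return;  // stop requested and nothing left
        batch.swap(pending_);          // take everything queued as one batch
      }
      for (Task& t : batch) {
        // A real implementation would run GPU decoding for the batch here.
        t.on_done(DecodeResult{t.key, "<" + std::to_string(t.samples.size()) + " samples>"});
      }
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::vector<Task> pending_;
  bool stop_ = false;
  std::thread worker_;  // declared last so the other members exist before it starts
};

// Usage: submit two utterances and collect the results through callbacks.
std::vector<std::string> demo_batch_decode() {
  std::vector<std::string> out;  // written only by the worker thread
  {
    MockBatchDecoder decoder;
    auto collect = [&out](const DecodeResult& r) { out.push_back(r.key + ":" + r.transcript); };
    decoder.submit("utt1", std::vector<float>(160), collect);
    decoder.submit("utt2", std::vector<float>(320), collect);
  }  // destructor drains the queue and joins the worker
  return out;
}
```

Because the callback runs on the decoder's worker thread, a caller such as a gRPC handler would need to synchronize access to whatever state the callback touches. In the demo this is handled by joining the worker before `out` is read.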

Task 2

Implement an online request queueing mechanism, similar to ServiceStreamer, that uses the GPU batch decoding interface (Task 1) to reduce latency in the kaldi-serve gRPC server application under high load.
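
One possible shape for the queueing mechanism in Task 2 is sketched below, assuming the common "batch is full or timeout expired" policy. The names `BatchingQueue`, `enqueue`, `fake_batch_decode`, and the parameters `max_batch` and `max_wait` are hypothetical; this is neither ServiceStreamer's nor kaldi-serve's code. Requests are plain strings here, and `fake_batch_decode` stands in for a call into the Task 1 interface.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <exception>
#include <functional>
#include <future>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical names throughout; not kaldi-serve or ServiceStreamer code.
class BatchingQueue {
 public:
  using BatchFn =
      std::function<std::vector<std::string>(const std::vector<std::string>&)>;

  BatchingQueue(BatchFn fn, std::size_t max_batch, std::chrono::milliseconds max_wait)
      : fn_(std::move(fn)), max_batch_(max_batch == 0 ? 1 : max_batch),
        max_wait_(max_wait), worker_([this] { run(); }) {}

  ~BatchingQueue() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }

  // Called from request handler threads; the future resolves once the batch runs.
  std::future<std::string> enqueue(std::string request) {
    std::promise<std::string> promise;
    std::future<std::string> result = promise.get_future();
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push_back(Item{std::move(request), std::move(promise)});
    }
    cv_.notify_all();
    return result;
  }

 private:
  struct Item {
    std::string request;
    std::promise<std::string> promise;
  };

  void run() {
    for (;;) {
      std::vector<Item> batch;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stop requested and queue drained
        // Give later requests up to max_wait_ to join this batch.
        cv_.wait_for(lock, max_wait_,
                     [this] { return stop_ || queue_.size() >= max_batch_; });
        while (!queue_.empty() && batch.size() < max_batch_) {
          batch.push_back(std::move(queue_.front()));
          queue_.pop_front();
        }
      }
      std::vector<std::string> inputs;
      for (const Item& item : batch) inputs.push_back(item.request);
      std::vector<std::string> outputs;
      std::exception_ptr error;
      try {
        outputs = fn_(inputs);  // one call for the whole batch
        if (outputs.size() != batch.size()) throw std::runtime_error("batch size mismatch");
      } catch (...) {
        error = std::current_exception();
      }
      for (std::size_t i = 0; i < batch.size(); ++i) {
        if (error) batch[i].promise.set_exception(error);
        else batch[i].promise.set_value(std::move(outputs[i]));
      }
    }
  }

  BatchFn fn_;
  std::size_t max_batch_;
  std::chrono::milliseconds max_wait_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<Item> queue_;
  bool stop_ = false;
  std::thread worker_;  // declared last so the other members exist before it starts
};

// Placeholder for the batch decoder: maps each request to a fake transcript.
std::vector<std::string> fake_batch_decode(const std::vector<std::string>& requests) {
  std::vector<std::string> out;
  for (const std::string& r : requests) out.push_back("text(" + r + ")");
  return out;
}

// Usage: six requests with max_batch = 4 are served in at least two batches.
std::vector<std::string> demo_queue() {
  BatchingQueue queue(fake_batch_decode, 4, std::chrono::milliseconds(20));
  std::vector<std::future<std::string>> pending;
  for (int i = 0; i < 6; ++i) pending.push_back(queue.enqueue("req" + std::to_string(i)));
  std::vector<std::string> replies;
  for (auto& f : pending) replies.push_back(f.get());
  return replies;
}
```

The `max_wait` timeout is the knob that trades single-request latency against batch size: with few requests in flight, each one waits at most that long before its (small) batch is run, while under load batches fill up and run immediately.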

pskrunner14, Nov 17 '21 20:11