
Add C++ runtime for *streaming* faster conformer transducer from NeMo.

Open sangeet2020 opened this issue 1 year ago • 8 comments

This PR integrates NeMo's faster conformer transducer into sherpa-decoder. More commits will be added.

sangeet2020 avatar May 17 '24 10:05 sangeet2020

@csukuangfj would we need StackStates and UnStackStates methods for this?

sangeet2020 avatar May 17 '24 12:05 sangeet2020

> @csukuangfj would we need StackStates and UnStackStates methods for this?

Yes, please refer to https://github.com/k2-fsa/sherpa-onnx/blob/8af2af84664d3285ba452bf453bb928a3eb6e978/sherpa-onnx/csrc/online-nemo-ctc-model.cc#L121-L122

and

https://github.com/k2-fsa/sherpa-onnx/blob/8af2af84664d3285ba452bf453bb928a3eb6e978/sherpa-onnx/csrc/online-nemo-ctc-model.cc#L156-L157

Note that for decoding, you can support only batch_size == 1.
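
As a rough illustration of what stacking and unstacking states means, here is a minimal sketch that uses plain float vectors as a stand-in for `Ort::Value` tensors. The `State` and `BatchedState` types and the simplified signatures are assumptions made for this example and are not the sherpa-onnx API; the linked `online-nemo-ctc-model.cc` is the real reference.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for one state tensor of one stream (hypothetical; the real code
// uses Ort::Value).
using State = std::vector<float>;

// One batched state tensor: rows[i] belongs to stream i.
struct BatchedState {
  std::vector<State> rows;
};

// states[i][k] is the k-th state tensor of stream i.
// Returns one batched tensor per state kind.
// Assumes every stream carries the same number of state tensors.
std::vector<BatchedState> StackStates(
    const std::vector<std::vector<State>> &states) {
  std::vector<BatchedState> ans;
  if (states.empty()) return ans;

  ans.resize(states[0].size());
  for (const auto &stream_states : states) {
    for (std::size_t k = 0; k != stream_states.size(); ++k) {
      ans[k].rows.push_back(stream_states[k]);
    }
  }
  return ans;
}

// Inverse of StackStates: split each batched tensor back into per-stream
// state lists.
std::vector<std::vector<State>> UnStackStates(
    const std::vector<BatchedState> &batched) {
  std::vector<std::vector<State>> ans;
  if (batched.empty()) return ans;

  std::size_t batch_size = batched[0].rows.size();
  ans.resize(batch_size);
  for (const auto &b : batched) {
    for (std::size_t i = 0; i != batch_size; ++i) {
      ans[i].push_back(b.rows[i]);
    }
  }
  return ans;
}
```

With batch_size == 1, as suggested above, both functions reduce to wrapping and unwrapping a single stream's states.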

csukuangfj avatar May 17 '24 12:05 csukuangfj

Hi @csukuangfj, could you please help me with online-transducer-greedy-search-nemo-decoder.cc? A basic outline would be enough to start with. Thank you.

sangeet2020 avatar May 19 '24 15:05 sangeet2020

  1. Please refer to our Python example for online NeMo transducer greedy search decoding https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/nemo/fast-conformer-hybrid-transducer-ctc/test-onnx-transducer.py

  2. For simplicity, please support only batch size == 1 for greedy search

  3. Please refer to the offline NeMo transducer greedy search decoding in C++ at https://github.com/k2-fsa/sherpa-onnx/blob/master/sherpa-onnx/csrc/offline-transducer-greedy-search-nemo-decoder.h

All you need is to change the offline C++ version to an online version.

  4. The NeMo transducer is stateful, so you need to follow https://github.com/k2-fsa/sherpa-onnx/blob/8af2af84664d3285ba452bf453bb928a3eb6e978/sherpa-onnx/csrc/online-stream.h#L91-L92 to add two methods, e.g.,

 void SetNeMoDecoderStates(std::vector<Ort::Value> states);
 std::vector<Ort::Value> &GetNeMoDecoderStates();

  5. You need to follow https://github.com/k2-fsa/sherpa-onnx/blob/master/sherpa-onnx/csrc/offline-recognizer-transducer-nemo-impl.h to add online-recognizer-transducer-nemo-impl.h
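
The two accessor methods suggested above could look roughly like the following sketch. `Value` is a hypothetical move-only stand-in for `Ort::Value` (which is also move-only), and `Stream` is a simplified placeholder for the real stream class, so only the method names come from the comment above.

```cpp
#include <utility>
#include <vector>

// Hypothetical move-only stand-in for Ort::Value.
struct Value {
  std::vector<float> data;

  Value() = default;
  explicit Value(std::vector<float> d) : data(std::move(d)) {}

  Value(Value &&) = default;
  Value &operator=(Value &&) = default;
  Value(const Value &) = delete;
  Value &operator=(const Value &) = delete;
};

// Simplified placeholder for the stream class that owns per-stream state.
class Stream {
 public:
  // Takes the states by value so the caller can std::move them in.
  void SetNeMoDecoderStates(std::vector<Value> states) {
    decoder_states_ = std::move(states);
  }

  // Returns a reference so the decoder can read or move the states out
  // without copying (copying is not possible for a move-only type).
  std::vector<Value> &GetNeMoDecoderStates() { return decoder_states_; }

 private:
  std::vector<Value> decoder_states_;
};
```

Passing by value and returning by reference is one way to keep ownership of the tensors with the stream between chunks.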

csukuangfj avatar May 20 '24 01:05 csukuangfj

@csukuangfj could you please review these changes? Waiting for your feedback.

Also, could you assist me with online-transducer-greedy-search-nemo-decoder.cc? Following offline-transducer-greedy-search-nemo-decoder.cc is not so helpful in this case, as it's a streaming mode.

Thank you

sangeet2020 avatar May 22 '24 10:05 sangeet2020

By the way, you need to change https://github.com/k2-fsa/sherpa-onnx/blob/81346d11728e675ddea2645738b394a8b82078d3/sherpa-onnx/csrc/online-recognizer-impl.cc#L15-L17

and

https://github.com/k2-fsa/sherpa-onnx/blob/81346d11728e675ddea2645738b394a8b82078d3/sherpa-onnx/csrc/online-recognizer-impl.cc#L36-L38

You can use the number of outputs from the decoder model to decide whether to create a normal OnlineRecognizerTransducerImpl or OnlineRecognizerTransducerNeMoImpl.

You can refer to https://github.com/k2-fsa/sherpa-onnx/blob/81346d11728e675ddea2645738b394a8b82078d3/sherpa-onnx/csrc/online-transducer-model.cc#L45 to create a session for the decoder model and refer to the following code to get the number of outputs for the decoder model https://github.com/k2-fsa/sherpa-onnx/blob/81346d11728e675ddea2645738b394a8b82078d3/sherpa-onnx/csrc/onnx-utils.cc#L38

You only need to support two kinds of transducer models in sherpa-onnx: one for stateless transducer, and one for NeMo stateful transducer.
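
A minimal sketch of the dispatch described above, assuming a stateless decoder model exposes exactly one output and a stateful NeMo decoder exposes more (its output plus states). That threshold, the `RecognizerImpl` hierarchy, and `CreateTransducerImpl` are assumptions for illustration and are not the sherpa-onnx classes.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical common interface for both recognizer implementations.
class RecognizerImpl {
 public:
  virtual ~RecognizerImpl() = default;
  virtual std::string Name() const = 0;
};

// Placeholder for the implementation that handles stateless transducers.
class TransducerImpl : public RecognizerImpl {
 public:
  std::string Name() const override { return "stateless"; }
};

// Placeholder for the implementation that handles stateful NeMo transducers.
class TransducerNeMoImpl : public RecognizerImpl {
 public:
  std::string Name() const override { return "nemo"; }
};

// num_decoder_outputs would come from querying the decoder model's session.
// Assumption: one output means a stateless decoder; more than one means the
// decoder also returns its states.
std::unique_ptr<RecognizerImpl> CreateTransducerImpl(
    std::size_t num_decoder_outputs) {
  if (num_decoder_outputs == 1) {
    return std::make_unique<TransducerImpl>();
  }
  return std::make_unique<TransducerNeMoImpl>();
}
```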

csukuangfj avatar May 22 '24 13:05 csukuangfj

> Following offline-transducer-greedy-search-nemo-decoder.cc is not so helpful in this case, as it's a streaming mode

We have both a C++ and a Python version for the non-streaming NeMo transducer greedy search, and a Python version for the streaming NeMo transducer greedy search.

Please read them carefully. The only differences from the non-streaming one are:

  • You need to process the audio chunk by chunk; there are already code examples for the stateless streaming transducer and for the stateful NeMo CTC model
  • You need to save the decoder states across chunks
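
The two differences listed above can be sketched with a toy model. Everything here is hypothetical: `RunDecoder` and `RunJoiner` are arithmetic stand-ins for the ONNX decoder and joiner, frames and tokens are plain integers, and at most one symbol is emitted per frame for simplicity. The point is only that the decoder state is restored at the start of each chunk and saved at the end.

```cpp
#include <vector>

constexpr int kBlank = 0;  // toy blank token id

// Toy decoder state (stands in for the NeMo decoder's state tensors).
struct DecoderState {
  int hidden = 0;
};

// Everything that must survive between chunks for one stream.
struct StreamState {
  DecoderState decoder_state;
  std::vector<int> tokens;
};

// Toy decoder: the next state depends on the previous state and the token.
DecoderState RunDecoder(int token, const DecoderState &s) {
  return DecoderState{(s.hidden * 31 + token) % 97};
}

// Toy joiner: maps (encoder frame, decoder state) to a token id.
int RunJoiner(int frame, const DecoderState &s) {
  return (frame + s.hidden) % 4;
}

// Greedy search over one chunk of encoder frames.
void DecodeChunk(const std::vector<int> &chunk, StreamState *stream) {
  DecoderState state = stream->decoder_state;  // restore from previous chunk

  for (int frame : chunk) {
    int token = RunJoiner(frame, state);
    if (token != kBlank) {
      stream->tokens.push_back(token);
      state = RunDecoder(token, state);  // advance only on non-blank
    }
  }

  stream->decoder_state = state;  // save for the next chunk
}
```

Because the state is carried over, decoding a sequence in several chunks produces the same tokens as decoding it in one call. If the state were reset at each chunk boundary, the outputs of the two modes could diverge after the first chunk.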

csukuangfj avatar May 22 '24 13:05 csukuangfj

Hi @csukuangfj, thank you for the feedback. I have made the necessary changes as you described above. Could you please review them once more?

Thank you

sangeet2020 avatar May 23 '24 10:05 sangeet2020

By the way, please make sure the code compiles successfully on your computer.

csukuangfj avatar May 24 '24 04:05 csukuangfj

Hi @csukuangfj,

I am unable to pinpoint and solve this compilation error. Could you please take a look?

[ 56%] Building CXX object sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/online-recognizer-impl.cc.o
In file included from /usr/include/c++/11/memory:76,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:8,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/usr/include/c++/11/bits/unique_ptr.h: In instantiation of ‘typename std::_MakeUniq<_Tp>::__single_object std::make_unique(_Args&& ...) [with _Tp = sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder; _Args = {sherpa_onnx::OnlineTransducerModel*, sherpa_onnx::OnlineLM*, int&, float&, int&, float&, float&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder, std::default_delete<sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder> >]’:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:109:77:   required from here
/usr/include/c++/11/bits/unique_ptr.h:962:30: error: invalid new-expression of abstract class type ‘sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder’
  962 |     { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:30,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:9:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-modified-beam-search-decoder.h:18:7: note:   because the following virtual functions are pure within ‘sherpa_onnx::OnlineTransducerModifiedBeamSearchDecoder’:
   18 | class OnlineTransducerModifiedBeamSearchDecoder
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-stream.h:17,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer.h:22,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:13,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-decoder.h:85:35: note:     ‘virtual std::vector<Ort::Value> sherpa_onnx::OnlineTransducerDecoder::Decode_me(Ort::Value, std::vector<Ort::Value>, std::vector<sherpa_onnx::OnlineTransducerDecoderResult>*, sherpa_onnx::OnlineStream**, int32_t)’
   85 |   virtual std::vector<Ort::Value> Decode_me(Ort::Value encoder_out,
      |                                   ^~~~~~~~~
In file included from /usr/include/c++/11/memory:76,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:8,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/usr/include/c++/11/bits/unique_ptr.h: In instantiation of ‘typename std::_MakeUniq<_Tp>::__single_object std::make_unique(_Args&& ...) [with _Tp = sherpa_onnx::OnlineTransducerGreedySearchDecoder; _Args = {sherpa_onnx::OnlineTransducerModel*, int&, float&, float&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<sherpa_onnx::OnlineTransducerGreedySearchDecoder, std::default_delete<sherpa_onnx::OnlineTransducerGreedySearchDecoder> >]’:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:115:71:   required from here
/usr/include/c++/11/bits/unique_ptr.h:962:30: error: invalid new-expression of abstract class type ‘sherpa_onnx::OnlineTransducerGreedySearchDecoder’
  962 |     { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-impl.h:28,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:9:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-greedy-search-decoder.h:15:7: note:   because the following virtual functions are pure within ‘sherpa_onnx::OnlineTransducerGreedySearchDecoder’:
   15 | class OnlineTransducerGreedySearchDecoder : public OnlineTransducerDecoder {
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-stream.h:17,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer.h:22,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:13,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-decoder.h:85:35: note:     ‘virtual std::vector<Ort::Value> sherpa_onnx::OnlineTransducerDecoder::Decode_me(Ort::Value, std::vector<Ort::Value>, std::vector<sherpa_onnx::OnlineTransducerDecoderResult>*, sherpa_onnx::OnlineStream**, int32_t)’
   85 |   virtual std::vector<Ort::Value> Decode_me(Ort::Value encoder_out,
      |                                   ^~~~~~~~~
In file included from /usr/include/c++/11/memory:76,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:8,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/usr/include/c++/11/bits/unique_ptr.h: In instantiation of ‘typename std::_MakeUniq<_Tp>::__single_object std::make_unique(_Args&& ...) [with _Tp = sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder; _Args = {sherpa_onnx::OnlineTransducerNeMoModel*, float&}; typename std::_MakeUniq<_Tp>::__single_object = std::unique_ptr<sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder, std::default_delete<sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder> >]’:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-nemo-impl.h:53:75:   required from here
/usr/include/c++/11/bits/unique_ptr.h:962:30: error: invalid new-expression of abstract class type ‘sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder’
  962 |     { return unique_ptr<_Tp>(new _Tp(std::forward<_Args>(__args)...)); }
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-transducer-nemo-impl.h:26,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:10:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-greedy-search-nemo-decoder.h:15:7: note:   because the following virtual functions are pure within ‘sherpa_onnx::OnlineTransducerGreedySearchNeMoDecoder’:
   15 | class OnlineTransducerGreedySearchNeMoDecoder : public OnlineTransducerDecoder {
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-stream.h:17,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer.h:22,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.h:13,
                 from /mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-recognizer-impl.cc:5:
/mnt/local/sangeet/workncode/k2-fsa/fang/sherpa-onnx/sherpa-onnx/csrc/online-transducer-decoder.h:82:16: note:     ‘virtual void sherpa_onnx::OnlineTransducerDecoder::Decode(Ort::Value, std::vector<sherpa_onnx::OnlineTransducerDecoderResult>*)’
   82 |   virtual void Decode(Ort::Value encoder_out,
      |                ^~~~~~
cc1plus: note: unrecognized command-line option ‘-Wno-missing-template-keyword’ may have been intended to silence earlier diagnostics
make[2]: *** [sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/build.make:832: sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/online-recognizer-impl.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1552: sherpa-onnx/csrc/CMakeFiles/sherpa-onnx-core.dir/all] Error 2
make: *** [Makefile:156: all] Error 2
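
For context, the log above says each decoder class is abstract because a pure virtual function declared in `OnlineTransducerDecoder` (`Decode_me` in the first two cases, `Decode` in the third) is not overridden by the derived class. The following minimal sketch shows the general pattern with simplified, hypothetical types rather than the actual sherpa-onnx signatures: a derived class can be passed to `std::make_unique` only once every pure virtual function has an override with a matching signature.

```cpp
#include <memory>
#include <type_traits>
#include <vector>

// Simplified stand-in for a per-stream decoding result.
struct Result {
  std::vector<int> tokens;
};

// Base class with a pure virtual function, so it cannot be instantiated.
class Decoder {
 public:
  virtual ~Decoder() = default;
  virtual void Decode(int encoder_out, std::vector<Result> *results) = 0;
};

class GreedyDecoder : public Decoder {
 public:
  // `override` makes the compiler reject this declaration if the signature
  // does not match a virtual function in the base class. Without a matching
  // override, GreedyDecoder would stay abstract and make_unique would fail
  // with "invalid new-expression of abstract class type".
  void Decode(int encoder_out, std::vector<Result> *results) override {
    for (auto &r : *results) {
      r.tokens.push_back(encoder_out);
    }
  }
};

static_assert(std::is_abstract<Decoder>::value, "base stays abstract");
static_assert(!std::is_abstract<GreedyDecoder>::value,
              "derived class overrides every pure virtual function");

std::unique_ptr<Decoder> MakeDecoder() {
  return std::make_unique<GreedyDecoder>();
}
```

Adding a new pure virtual function to a shared base class makes every existing subclass abstract until each one overrides it, which is consistent with all three decoders failing in the same build.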

sangeet2020 avatar May 27 '24 09:05 sangeet2020

I suggest that you copy & paste our C++ greedy search decoding code for non-streaming stateful NeMo transducer and then change the code to handle the states of the decoder model.

Almost everything you need is already there.

csukuangfj avatar May 27 '24 10:05 csukuangfj

Hi @csukuangfj, I really appreciate all your help throughout. Could I please ask you to fix the greedy decoder implementation? I have been stuck for quite some time now and can't find a way through this. Thank you.

sangeet2020 avatar May 28 '24 15:05 sangeet2020

Sure, will push new commits to your branch this week.

csukuangfj avatar May 29 '24 13:05 csukuangfj

Hi @csukuangfj, I made some minor changes. As of now, there are no errors and decoding works, but the predictions are correct only for the first few decoding streams; after that, the predictions become incorrect.

To give you an example:

CORRECT PREDICTION: after early nightfall the yellow lamps...
CURRENT PREDICTION: after the would light here and...

I have the suspicion that something is wrong inside the greedy search decoder implementation.

sangeet2020 avatar May 29 '24 13:05 sangeet2020

You are almost there!

I am merging it and will take care of the rest.

Thank you for your contribution!

csukuangfj avatar May 30 '24 05:05 csukuangfj