Dinghao Zhou comments

Results: 114 comments of Dinghao Zhou

WeNet 3.0 Roadmap

For binding, there are 3 questions: 1. Get the model by language type, so can we supply a small and a big model for each language? (The small model could be trained with kd...

WeNet 3.0 Roadmap

More data augmentation, like RIR: https://github.com/pytorch/audio/issues/2624. torchaudio will add multi-channel RIR based on pyroomacoustics.
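For reference, RIR augmentation boils down to convolving the clean waveform with a room impulse response. Below is a minimal pure-Python sketch of that idea. It is not the torchaudio or pyroomacoustics API: `synthetic_rir`, `convolve`, and `add_reverb` are hypothetical helpers, and the impulse response is a toy (direct path plus exponentially decaying noise) rather than a simulated or measured room.

```python
import math
import random


def synthetic_rir(num_taps, decay, seed=0):
    """Toy impulse response: a direct path followed by exponentially decaying noise.

    Assumption: a real pipeline would use a measured or simulated RIR instead.
    """
    rng = random.Random(seed)
    rir = [1.0]  # direct path
    for n in range(1, num_taps):
        rir.append(rng.uniform(-1.0, 1.0) * math.exp(-decay * n))
    return rir


def convolve(signal, kernel):
    """Full linear convolution; output length is len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def add_reverb(signal, rir):
    """Convolve with the RIR, trim to the input length, rescale to the input peak."""
    wet = convolve(signal, rir)[: len(signal)]
    peak_in = max(abs(x) for x in signal)
    peak_out = max(abs(x) for x in wet)
    if peak_out == 0.0:
        return wet
    scale = peak_in / peak_out
    return [x * scale for x in wet]
```

Usage would look like `add_reverb(waveform, synthetic_rir(num_taps=64, decay=0.1))`. For real audio lengths an FFT-based convolution is the usual choice; the nested loop here is only for readability.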

[transducer] decoding strategy

- [espnet optimize](https://github.com/espnet/espnet/blob/a672fe65030a7d9424465b2027019c906ae35fe1/espnet2/asr_transducer/beam_search_transducer.py) thanks @[b-flo](https://github.com/b-flo)
- [Sequence Transduction with Recurrent Neural Networks](https://arxiv.org/pdf/1211.3711.pdf)
- [Alignment-Length Synchronous Decoding for RNN Transducer](https://ieeexplore.ieee.org/document/9053040)
- [Accelerating RNN Transducer Inference via One-Step Constrained Beam Search](https://arxiv.org/pdf/2002.03577.pdf)
- ...

[transducer] decoding strategy

> Hi @Mddct,
>
> For alignment-length synchronous decoding (ALSD), time-synchronous decoding (TSD) and modified Adaptive Expansion Search (mAES) in ESPnet, please refer to [this](https://github.com/espnet/espnet/blob/a672fe65030a7d9424465b2027019c906ae35fe1/espnet2/asr_transducer/beam_search_transducer.py). The version you linked is...

[transducer] decoding strategy

If we want to use one-step decoding in the inference stage, can we try the implementation of this RNN-T loss later? [paper](https://arxiv.org/abs/1909.12415) [implementation](https://github.com/csukuangfj/optimized_transducer)
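To make "one-step decoding" concrete, here is a rough sketch of greedy transducer decoding where at most `max_symbols_per_frame` non-blank symbols are emitted per frame; setting it to 1 gives the one-step constraint. This is only an illustration, not the code from the linked repositories. The `joint` callable, the toy score table, and the use of only the last token as prediction-network context are all assumptions.

```python
BLANK = 0  # assumption: index 0 is the blank symbol


def greedy_transducer_decode(num_frames, joint, max_symbols_per_frame=1):
    """Greedy search over frames.

    `joint(t, prev_token)` is a hypothetical stand-in for encoder + prediction
    network + joint network; it returns a list of scores over the vocabulary.
    A real prediction network would condition on the full history, not just
    the previous token.
    """
    hyp = []
    for t in range(num_frames):
        emitted = 0
        while emitted < max_symbols_per_frame:
            prev = hyp[-1] if hyp else BLANK
            scores = joint(t, prev)
            best = max(range(len(scores)), key=scores.__getitem__)
            if best == BLANK:
                break  # blank: advance to the next frame
            hyp.append(best)
            emitted += 1
    return hyp


def toy_joint(t, prev_token):
    """Toy scores that depend only on the previous token (illustration only)."""
    table = {
        BLANK: [0.1, 0.8, 0.1],  # after blank/start, prefer token 1
        1: [0.2, 0.1, 0.7],      # after token 1, prefer token 2
        2: [0.9, 0.05, 0.05],    # after token 2, prefer blank
    }
    return table[prev_token]
```

With the toy scores, `greedy_transducer_decode(1, toy_joint, max_symbols_per_frame=1)` stops after one symbol on the single frame, while `max_symbols_per_frame=2` lets the same frame emit two.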

python performance issues

You need to give the decoder a copy function, like the feature pipeline etc. By the way, the model can be multi-threaded.
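A rough sketch of that pattern, assuming the model is read-only and can be shared while each thread works on its own decoder copy. `SharedModel`, `Decoder`, and `decode_in_threads` are hypothetical names for illustration, not the WeNet binding API.

```python
import copy
import threading


class SharedModel:
    """Stand-in for a read-only model that is safe to share across threads."""

    def score(self, frame):
        return frame * 2


class Decoder:
    """Holds per-utterance mutable state, so each thread needs its own copy."""

    def __init__(self, model):
        self.model = model
        self.history = []

    def copy(self):
        clone = copy.copy(self)  # shallow copy: the model object is shared
        clone.history = []       # mutable decoding state is fresh per copy
        return clone

    def decode(self, frames):
        for frame in frames:
            self.history.append(self.model.score(frame))
        return list(self.history)


def decode_in_threads(prototype, batches):
    """Decode each batch in its own thread, each with its own decoder copy."""
    results = [None] * len(batches)

    def worker(index, frames):
        results[index] = prototype.copy().decode(frames)

    threads = [
        threading.Thread(target=worker, args=(i, batch))
        for i, batch in enumerate(batches)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The point of the split is that only the cheap, mutable part is duplicated; the heavy model weights stay in one place.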

phonetic posteriorgrams(PPG) extractor

You can get PPG from an encoder layer.
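As a sketch of what that could look like: a PPG is a per-frame posterior distribution, so one option is to take frame-level logits derived from an encoder layer and apply a softmax per frame. How the logits are obtained (for example, which layer and which projection) is an assumption here; `frames_to_ppg` is a hypothetical helper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame's logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frames_to_ppg(frame_logits):
    """Turn [num_frames][num_classes] logits into per-frame posteriors.

    Assumption: `frame_logits` come from an encoder layer followed by some
    projection to phonetic (or token) classes.
    """
    return [softmax(frame) for frame in frame_logits]
```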

Can I get timestamp info by GPU inference?

> > @yuekaizhang is it possible?
>
> Yes, it's possible to add timestamps. Currently GPU inference uses this [ctc_decoder](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/third_party/ctc_decoders), which needs to be modified to add timestamps. Or we could...
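For illustration, here is a minimal sketch of how timestamps can fall out of CTC greedy decoding: record the frame index where each non-blank, non-repeated token first appears. This is not the linked ctc_decoder. The function name, the 10 ms frame shift, and the subsampling factor of 4 are assumptions.

```python
BLANK = 0  # assumption: index 0 is the CTC blank


def ctc_greedy_with_timestamps(frame_ids, frame_shift_ms=10, subsampling=4):
    """Collapse per-frame argmax ids into (token, start_time_ms) pairs.

    `frame_ids` is the best token id for each encoder output frame.
    A token is emitted when it is non-blank and differs from the previous
    frame's id; its time is the emitting frame index scaled to milliseconds.
    """
    tokens = []
    prev = BLANK
    for t, tok in enumerate(frame_ids):
        if tok != BLANK and tok != prev:
            tokens.append((tok, t * frame_shift_ms * subsampling))
        prev = tok
    return tokens
```

For a beam-search decoder, the same idea applies per hypothesis: each prefix has to carry the frame indices of its emissions alongside its tokens.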

runtime memory leak

I tried some environment variables, but none of them worked. Maybe we need to recompile libtorch without MKL.

runtime memory leak

@lvzhiqiang Sorry, I've been busy recently and haven't tried to solve it yet.