Code for paper on choice of modeling unit for sequence-to-sequence speech recognition

Open argideritzalpea opened this issue 4 years ago • 3 comments

Hi @rprabhavalkar @tonybruguier-google, I was wondering whether the code for this paper ("On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition") is open source: https://www-i6.informatik.rwth-aachen.de/publications/download/1106/IrieKazukiPrabhavalkarRohitKannanAnjuliBruguierAntoineRybachDavidNguyenPatrick--OntheChoiceofModelingUnitforSequence-to-SequenceSpeechRecognition--2019.pdf

I would be very interested in assisting the project by adding data importers for Mozilla's DeepSpeech corpora and conducting experiments for different languages following the aforementioned paper's methodologies. Specifically, my hope is to reproduce the methods in Section 6.

argideritzalpea avatar Apr 04 '20 18:04 argideritzalpea

@kazuki-irie @rprabhavalkar any comment?

drpngx avatar Apr 04 '20 19:04 drpngx

I am not up to date with the current situation with Lingvo at all, so I would wait for @rprabhavalkar's comment, but:

  • I believe the word-piece and grapheme-level model setups are already available: https://github.com/tensorflow/lingvo/blob/master/lingvo/tasks/asr/params/librispeech.py. Librispeech960Wpm and Librispeech960Grapheme are similar to the setups used in the paper.

  • For phoneme-level models, I think it would be too much effort to make everything publicly available at this point. There might still be some chance of releasing the code/setup for training, but for decoding and rescoring (Section 6, which is mainly what @argideritzalpea asked about), I believe I had a number of specific custom ops and scripts for all these experiments.

I'm sorry, @argideritzalpea, for not being able to help much; this was something I did during my internship (back in 2018), and Lingvo itself was not yet open source at that time.

kazuki-irie avatar Apr 04 '20 21:04 kazuki-irie

@argideritzalpea, thanks for your interest in the work. As @kazuki-irie mentioned, this work was done a while back, before Lingvo was open-sourced, and was implemented by writing a few custom ops, the main one being an op that converts word sequences to phoneme tokens. I'm sorry, but we don't have any plans to open-source it.

rprabhavalkar avatar Apr 07 '20 17:04 rprabhavalkar