lingvo
Lingvo Tutorial: How To Train
Is there a tutorial on how to train a new model on a new custom dataset? I would like to train a new ASR model for other languages, but Lingvo does not seem to have a straightforward API for training a new model. Are you planning to write a tutorial, or can you point me to an existing one? The codelab example is not helpful in this regard.
Researchers at Google claim that the framework provides this ability:
Modular building blocks. Lingvo is designed for collaboration, focusing on code with a consistent interface and style that is easy to read and understand, and a flexible modular layering system that promotes code reuse. The same building blocks, such as LSTM or attention layers, can be used as-is across different models with assurance of good quality and performance. Because the blocks are general, an algorithmic improvement in one task (such as the use of multi-head attention in Machine Translation) can be immediately applied to another task (e.g. Speech Recognition). With many people using the same codebase, this makes it extremely easy to employ ideas others are trying in your own models. This also makes it simple to adapt existing models to new datasets. The building blocks are each individual classes, making it straightforward to extend and override their implementation. Layers are composed in a hierarchical manner, separating low-level implementation details from high-level control flow.
And on the philosophy of Lingvo:
Shared, comparable, reproducible, understandable, and correct experiments. A big problem in research is the difficulty in reproducing and comparing results, even between people working in the same team. To better document experiments and allow the same experiment to be re-run in the future, ...
What to do then?
For example, to train an ASR system, /lingvo/tasks/asr/ provides the recipes, or, in Lingvo's own terms:
The building blocks are each individual classes, making it straightforward to extend and override their implementation
Therefore, as things currently stand, you need to write your own training code with Lingvo and reuse its existing classes and modules in that code, imo. If that is not the case and I am completely wrong, please correct me.
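To make the "extend and override" idea concrete, here is a toy, stdlib-only sketch of what adapting an existing recipe to a new dataset looks like. It is NOT Lingvo code: BaseAsrRecipe, MyLanguageRecipe, InputParams, TaskParams, REGISTRY, register, the file paths, and all numeric values are hypothetical. The Train/Test/Task method names are only loosely modeled on the parameter classes under /lingvo/tasks/asr/, where a recipe is a class that the trainer selects by name; check the real recipes for the actual base classes and fields.

```python
"""Toy, stdlib-only imitation of "subclass an existing recipe and override it".

Every class and name here is hypothetical; none of it is Lingvo's API.
"""
from dataclasses import dataclass, replace
from typing import Callable, Dict, Type


@dataclass(frozen=True)
class InputParams:
    # Hypothetical input settings; a real recipe defines many more.
    file_pattern: str
    batch_size: int = 32


@dataclass(frozen=True)
class TaskParams:
    # Hypothetical model settings that the new recipe mostly reuses.
    encoder_layers: int = 4
    vocab_size: int = 32


class BaseAsrRecipe:
    """Stand-in for an existing ASR recipe; paths and values are made up."""

    def Train(self) -> InputParams:
        return InputParams(file_pattern="/data/original/train-*")

    def Test(self) -> InputParams:
        return InputParams(file_pattern="/data/original/test-*", batch_size=16)

    def Task(self) -> TaskParams:
        return TaskParams()


# Toy registry: a trainer would look recipes up by this name.
REGISTRY: Dict[str, Type[BaseAsrRecipe]] = {}


def register(name: str) -> Callable[[Type[BaseAsrRecipe]], Type[BaseAsrRecipe]]:
    def wrap(cls: Type[BaseAsrRecipe]) -> Type[BaseAsrRecipe]:
        REGISTRY[name] = cls
        return cls
    return wrap


@register("toy.MyLanguageRecipe")
class MyLanguageRecipe(BaseAsrRecipe):
    """Inherits everything and overrides only the dataset-specific parts."""

    def Train(self) -> InputParams:
        # Same batch size as the parent; only the data location changes.
        return replace(super().Train(), file_pattern="/data/my_lang/train-*")

    def Test(self) -> InputParams:
        return replace(super().Test(), file_pattern="/data/my_lang/test-*")

    def Task(self) -> TaskParams:
        # A new language typically needs a different output vocabulary size.
        return replace(super().Task(), vocab_size=48)


if __name__ == "__main__":
    recipe = REGISTRY["toy.MyLanguageRecipe"]()
    print(recipe.Train())
```

Running it prints InputParams(file_pattern='/data/my_lang/train-*', batch_size=32): the model settings and batch sizes are inherited unchanged, and only the data paths and vocabulary size differ. That is the sense in which you "write your own code" while still reusing the existing building blocks.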
Thanks, @omerasif57!
There is no documentation per se on how to generate a new dataset. You need to write your own set of dataset extraction scripts. Let us know if you have specific questions.
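As a rough illustration of what the first stage of such a dataset extraction script might do, here is a stdlib-only sketch. It assumes a hypothetical input layout of one utterance per line, "utt_id<TAB>audio_path<TAB>transcript"; that layout and the helper names (normalize, assign_split, extract, format_manifest) are made up for this example and are not a Lingvo format. It only cleans transcripts, drops malformed rows, and makes a deterministic train/dev/test split. Computing audio features and writing them in the record format that Lingvo's ASR input pipeline expects is a further step not shown here; the data-preparation scripts for the existing ASR recipes are the natural template for that part.

```python
"""Hypothetical dataset-extraction sketch (stdlib only).

Assumed input: one utterance per line, "utt_id<TAB>audio_path<TAB>transcript".
This layout is an assumption for illustration, not a Lingvo format.
"""
import hashlib
import re
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, str]  # (utt_id, audio_path, transcript)


def normalize(text: str) -> str:
    # Collapse runs of whitespace and case-fold; real text normalization
    # (numbers, punctuation, script-specific rules) is language-dependent.
    return re.sub(r"\s+", " ", text).strip().casefold()


def assign_split(utt_id: str, dev_pct: int = 5, test_pct: int = 5) -> str:
    # Hash the utterance id so each utterance lands in the same split on every run.
    bucket = int(hashlib.sha256(utt_id.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < dev_pct:
        return "dev"
    if bucket < dev_pct + test_pct:
        return "test"
    return "train"


def extract(lines: Iterable[str]) -> Dict[str, List[Row]]:
    splits: Dict[str, List[Row]] = {"train": [], "dev": [], "test": []}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed rows
        utt_id, audio_path = fields[0].strip(), fields[1].strip()
        transcript = normalize(fields[2])
        if not utt_id or not audio_path or not transcript:
            continue  # skip rows with an empty field
        splits[assign_split(utt_id)].append((utt_id, audio_path, transcript))
    return splits


def format_manifest(rows: List[Row]) -> str:
    # One tab-separated row per utterance; tabs cannot occur inside fields
    # because the input was split on tabs and normalize() collapses whitespace.
    return "".join("\t".join(row) + "\n" for row in rows)
```

A typical use would be to open the source TSV, call extract on the file object, and write format_manifest(splits[name]) to one manifest file per split, which a later feature-extraction step can consume.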