Training an ASR model with Transformer on my own dataset

Open snpushpi opened this issue 4 years ago • 0 comments

Description

I am trying to train an automatic speech recognition (ASR) model on this dataset. I followed the ASR With Transformer tutorial and got it working, and I also went through the tutorial on working with your own data and understood it. However, when I looked at other tensor2tensor models that use different datasets, I noticed the modifications they make when defining the problem. So I have two questions:

  1. I am trying to train on the Arabic dataset mentioned above. How do I set data_dir, tmp_dir, and the other directories? Any hint on how to set them up would help, as I don't seem to get it.
  2. When declaring the problem class, what modifications should we make for this? For example, I went through this and looked at the translator code, and I was wondering what the equivalent should be for ASR, in particular the generator and the other parts.
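
To make the two questions concrete, here is a minimal sketch of the layout being asked about. It relies only on what the T2T command-line tools accept: `t2t-datagen` takes `--data_dir` and `--tmp_dir`, and `t2t-trainer` takes `--data_dir` and `--output_dir`. Everything else is an assumption for illustration: the subdirectory names, the `manifest.tsv` file with one `wav_name<TAB>transcript` row per utterance, and the helper names `make_dirs` and `read_manifest` are hypothetical and not part of tensor2tensor. The sketch deliberately does not import tensor2tensor. In a real problem class, the generator would presumably consume pairs like these and turn each one into audio and target features, along the lines of the library's built-in speech problems.

```python
import csv
import os
import tempfile


def make_dirs(root):
    """Create the three directories passed to the T2T tools (names are assumptions)."""
    dirs = {
        # Raw audio files and transcripts; read by t2t-datagen via --tmp_dir.
        "tmp_dir": os.path.join(root, "tmp"),
        # Generated TFRecord files; written by t2t-datagen, read by t2t-trainer.
        "data_dir": os.path.join(root, "data"),
        # Checkpoints and event files; written by t2t-trainer via --output_dir.
        "output_dir": os.path.join(root, "train"),
    }
    for path in dirs.values():
        os.makedirs(path, exist_ok=True)
    return dirs


def read_manifest(tmp_dir, manifest_name="manifest.tsv"):
    """Yield (wav_path, transcript) pairs from a hypothetical TSV manifest."""
    manifest_path = os.path.join(tmp_dir, manifest_name)
    with open(manifest_path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for wav_name, transcript in reader:
            yield os.path.join(tmp_dir, wav_name), transcript


if __name__ == "__main__":
    dirs = make_dirs(tempfile.mkdtemp())
    for name, path in dirs.items():
        print(name, "->", path)
```

With a layout like this, the same `data_dir` would be passed to both `t2t-datagen` and `t2t-trainer`, while `tmp_dir` is only needed for data generation.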

...

Environment information

OS: 

$ pip freeze | grep tensor
# your output here

$ python -V
# your output here

For bugs: reproduction and error logs

# Steps to reproduce:
...
# Error logs:
...

snpushpi, Jul 21 '20 21:07