openspeech Support for partial data usage for LibriSpeech

Open kushal-g opened this issue 4 years ago • 2 comments

There should be an option to download only part of the dataset and train on that, instead of having to download the entire dataset. If that is not possible, the documentation should clearly describe what the dataset directory structure should look like, so that it's easier for us to use our own partial dataset. I'm currently trying to train an RNN-T model and I keep running into issues with the directory structure.

Command that I'm using:

python ./openspeech_cli/hydra_train.py dataset=librispeech dataset.dataset_download=False dataset.dataset_path=/home/guest/flsp/SpeechToText/RNN-T/openspeech/LIBRISPEECH_AUTO_DOWNLOAD/LibriSpeech dataset.manifest_file_path=/home/guest/flsp/SpeechToText/RNN-T/openspeech/LIBRISPEECH_AUTO_MANIFEST tokenizer=libri_subword model=rnn_transducer audio=melspectrogram lr_scheduler=warmup_reduce_lr_on_plateau trainer=gpu
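To make the request concrete, here is a minimal sketch of how a partial copy of one LibriSpeech split could be built. It assumes the stock LibriSpeech layout (`<split>/<speaker>/<chapter>/*.flac`, with one `<speaker>-<chapter>.trans.txt` per chapter). The function name `copy_partial_split` and the parameter `max_speakers` are hypothetical and not part of openspeech. Whether openspeech accepts such a subset unchanged is also an assumption, not something confirmed in this thread.

```python
import shutil
from pathlib import Path


def copy_partial_split(src_root, dst_root, split="train-clean-100", max_speakers=2):
    """Copy the first `max_speakers` speaker folders of one LibriSpeech split.

    Assumes the stock layout: <root>/<split>/<speaker>/<chapter>/*.flac plus
    one <speaker>-<chapter>.trans.txt per chapter (hypothetical helper, not
    an openspeech API). Returns the list of copied speaker IDs.
    """
    src_split = Path(src_root) / split
    dst_split = Path(dst_root) / split
    dst_split.mkdir(parents=True, exist_ok=True)

    # Speaker folders are sorted by name so the selection is reproducible.
    speakers = sorted(p for p in src_split.iterdir() if p.is_dir())[:max_speakers]
    for speaker_dir in speakers:
        # Copies every chapter folder, including audio and transcript files.
        shutil.copytree(speaker_dir, dst_split / speaker_dir.name, dirs_exist_ok=True)
    return [p.name for p in speakers]
```

The destination root could then be passed as `dataset.dataset_path` with `dataset.dataset_download=False`. The manifest would presumably have to be regenerated to match the subset, which this sketch does not cover.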

Sep 24 '21 05:09 kushal-g

There were many questions about the directory structure, so I thought I should document it.
Please wait for a moment.

Sep 25 '21 07:09 sooftware

What is the status of this?

Sep 29 '21 06:09 kushal-g
