DeepSpeech
Provide command line parameters for sample skipping and ordering
Command-line parameters for skipping samples would allow better bisecting of faulty samples in new corpora. Changing the sample ordering helps in determining the maximum batch size.
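With skip/limit-style parameters, finding a faulty sample becomes a plain binary search over prefixes of the sample list. The sketch below assumes a single faulty sample and a hypothetical callback `run_fails` that trains on a slice of samples and reports whether the run blew up. Neither the function nor the callback is part of DeepSpeech.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def find_first_faulty(samples: Sequence[T],
                      run_fails: Callable[[Sequence[T]], bool]) -> int:
    """Return the index of the first sample whose inclusion makes run_fails() True.

    Hypothetical helper. It assumes failure is monotonic in the prefix length:
    run_fails(samples[:k]) is False for every k up to the faulty index and True
    after it. Returns -1 if the full sample list does not fail.
    """
    if not run_fails(samples):
        return -1
    # Invariant: the prefix samples[:lo] passes and samples[:hi] fails.
    lo, hi = 0, len(samples)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if run_fails(samples[:mid]):
            hi = mid
        else:
            lo = mid
    # samples[:hi] fails but samples[:hi - 1] passes, so index hi - 1 is the culprit.
    return hi - 1


if __name__ == "__main__":
    # Toy stand-in for a training run: it "blows up" whenever sample 7 is present.
    print(find_first_faulty(list(range(20)), lambda s: 7 in s))
```

Each probe corresponds to one training run with a different limit, so a corpus of N samples needs about log2(N) runs instead of N.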
Would it be an idea to have just the more general option of not sorting the CSVs and using them in the given order? Then you could manually apply any order you like with just one extra option. I just needed this to bisect an issue around (probably) cuDNN versions blowing up on certain samples.
I can make a patch for this and send a pull request, but naming such an option doesn't seem very straightforward. Something like: --[no]sample_sort, reorder train, dev and test samples by wav_filesize (default: 'true')?
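A minimal sketch of what the proposed option could do, assuming CSVs with the usual `wav_filename,wav_filesize,transcript` columns. `load_samples`, `sample_sort` and `skip` are hypothetical names mirroring the proposal. This is not DeepSpeech's actual loading code.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def load_samples(csv_files: Iterable[TextIO],
                 sample_sort: bool = True,
                 skip: int = 0) -> List[Dict[str, str]]:
    """Concatenate sample rows from CSV files, optionally sort them, then skip some.

    sample_sort=True orders samples by wav_filesize (the current behaviour the
    proposal would keep as the default); sample_sort=False keeps the rows
    exactly in CSV order so the user controls ordering by editing the files.
    """
    rows: List[Dict[str, str]] = []
    for f in csv_files:
        rows.extend(csv.DictReader(f))
    if sample_sort:
        # list.sort is stable, so equal-sized files keep their CSV order.
        rows.sort(key=lambda r: int(r["wav_filesize"]))
    return rows[skip:]


if __name__ == "__main__":
    data = ("wav_filename,wav_filesize,transcript\n"
            "a.wav,300,x\nb.wav,100,y\nc.wav,200,z\n")
    print([r["wav_filename"] for r in load_samples([io.StringIO(data)])])
    print([r["wav_filename"] for r in load_samples([io.StringIO(data)], sample_sort=False)])
```

With sorting disabled, putting the largest files first in the CSV makes an out-of-memory error show up in the first batches, which is one way the ordering helps when probing for the maximum batch size.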