persephone

If WAVs are empty, present a warning on feature extraction, skipping the file instead of crashing

Open oadams opened this issue 7 years ago • 4 comments

oadams avatar Feb 07 '18 04:02 oadams

This is probably a good spot for using the warnings module and logging any occurrences where this happens.
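A minimal sketch of that idea, using only the standard library. The logger name and the `warn_and_log` helper are hypothetical and are not part of persephone's existing code:

```python
import logging
import warnings

# Hypothetical logger name, chosen only for this illustration.
logger = logging.getLogger("feature_extraction")


def warn_and_log(message: str) -> None:
    """Emit a UserWarning and record the same message in the log."""
    # stacklevel=2 attributes the warning to the caller of this helper.
    warnings.warn(message, UserWarning, stacklevel=2)
    logger.warning(message)
```

An alternative would be calling `logging.captureWarnings(True)` once at startup, which routes everything issued through `warnings` into the logging system, so only `warnings.warn` is needed at the call site.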

shuttle1987 avatar Feb 07 '18 04:02 shuttle1987

Status on this one? How is emptiness defined in this context?

shuttle1987 avatar Sep 15 '18 07:09 shuttle1987

Nothing has changed. Empty means there is no actual WAV data. It's just a header with duration 0 WAV.
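Under that definition, a check could look roughly like the sketch below. It assumes the files are plain PCM WAVs readable by the standard-library `wave` module; `is_empty_wav` and `non_empty_wavs` are hypothetical helper names, not existing persephone functions:

```python
import warnings
import wave
from pathlib import Path
from typing import Iterable, List


def is_empty_wav(path: Path) -> bool:
    """Return True if the WAV file has a header but zero audio frames."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() == 0


def non_empty_wavs(paths: Iterable[Path]) -> List[Path]:
    """Keep only WAVs that contain audio, warning about each skipped file."""
    kept = []
    for path in paths:
        if is_empty_wav(path):
            warnings.warn(f"Skipping empty WAV file: {path}")
            continue
        kept.append(path)
    return kept
```

Feature extraction would then iterate over the result of `non_empty_wavs(...)` instead of the raw file list, so an empty file produces a warning rather than a crash.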

Another related thing would be to skip utterances where the number of frames after feature extraction is less than the number of labels in the corresponding transcription, as that will break the CTC algorithm.
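A rough sketch of such a filter follows. The function names are hypothetical. The bound used here is slightly stricter than a plain length comparison: CTC needs a blank frame between adjacent identical labels, so each adjacent repeat costs one extra frame.

```python
from typing import Sequence


def min_ctc_frames(labels: Sequence[str]) -> int:
    """Smallest number of frames for which a CTC alignment of `labels` exists."""
    # Each pair of adjacent identical labels needs a blank frame between them.
    repeats = sum(1 for prev, cur in zip(labels, labels[1:]) if prev == cur)
    return len(labels) + repeats


def can_align_with_ctc(num_frames: int, labels: Sequence[str]) -> bool:
    """Return True if an utterance with `num_frames` frames can carry `labels`."""
    return num_frames >= min_ctc_frames(labels)
```

Utterances for which `can_align_with_ctc` returns `False` could be skipped with a warning, in the same way as empty WAVs. Note that `num_frames` should be the count after any frame stacking or subsampling, since that is the length the CTC loss actually sees.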

oadams avatar Sep 17 '18 13:09 oadams

> It's just a header with duration 0 WAV.

That should be fairly easy to check for. The utterances check is far more subtle, good catch.

shuttle1987 avatar Sep 17 '18 13:09 shuttle1987