returnn
Dataset random_permute_audio rnd_zoom_order should be 0 by default
E.g. for LibriSpeechCorpus, OggZipDataset, TimitDataset.
It's much faster.
Actually, I did some intensive research on this and tried other alternatives, and this implementation with rnd_zoom_order=0 (i.e. nearest neighbor) turned out to be by far the fastest (on CPU). Unfortunately, I cannot find the references for this anymore. Edit: I think I have seen this, among other things.
Anyway, rnd_zoom_order=3 is very slow, and I don't expect it to make a real difference compared to rnd_zoom_order=0 (but I don't have numbers on this).
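To get actual numbers for the speed claim, a quick micro-benchmark could look like the following. This is a minimal sketch under assumptions: it assumes the random zoom goes through scipy.ndimage.zoom (the `order` wording quoted in this thread matches its documentation), it uses synthetic random noise instead of corpus audio, and the sample count, zoom factor and helper name `time_zoom_orders` are arbitrary choices for illustration.

```python
import time

import numpy as np
from scipy.ndimage import zoom


def time_zoom_orders(num_samples=16000, factor=1.1, orders=(0, 1, 3), repeats=5, seed=0):
    """Return the best wall-clock time (seconds) of scipy.ndimage.zoom per spline order."""
    rng = np.random.default_rng(seed)
    # Synthetic stand-in for one second of 16 kHz PCM audio (assumption, not real data).
    audio = rng.uniform(-1.0, 1.0, size=num_samples).astype(np.float32)
    timings = {}
    for order in orders:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            zoom(audio, factor, order=order)
            best = min(best, time.perf_counter() - start)
        timings[order] = best
    return timings


if __name__ == "__main__":
    for order, seconds in time_zoom_orders().items():
        print(f"order={order}: {seconds * 1000:.2f} ms")
```

Taking the minimum over several repeats reduces noise from other processes; absolute numbers will of course depend on the machine and the scipy version.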
Of course, changing this now will have some (probably small) effect on the behavior. So should we introduce a new behavior version for this?
@JackTemaki @michelwi I tagged (assigned) you because I would like to get your comments on this. Do you agree?
Yes, new behavior version sounds reasonable. I never used this myself though, as far as I know.
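A behavior-version-gated default could look roughly like this. It is a hypothetical, self-contained sketch, not RETURNN code: `NEW_ZOOM_ORDER_BEHAVIOR_VERSION` is a placeholder rather than a real behavior version number, the helper names are made up, and it assumes the current default is 3, as the title of this issue implies.

```python
# Hypothetical placeholder: not an actual RETURNN behavior version number.
NEW_ZOOM_ORDER_BEHAVIOR_VERSION = 999


def default_rnd_zoom_order(behavior_version: int) -> int:
    """Pick the default spline order for the random zoom based on the behavior version."""
    if behavior_version >= NEW_ZOOM_ORDER_BEHAVIOR_VERSION:
        return 0  # new default: nearest neighbor, much faster
    return 3  # old default (assumed): cubic spline, so existing setups stay unchanged


def get_rnd_zoom_order(opts: dict, behavior_version: int) -> int:
    """An explicitly configured rnd_zoom_order always takes precedence over the default."""
    order = opts.get("rnd_zoom_order")
    if order is None:
        order = default_rnd_zoom_order(behavior_version)
    return order
```

The point of gating on the behavior version is that old configs keep producing the same augmentation, while new configs get the faster default unless they set rnd_zoom_order explicitly.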
The parameter order refers to: "The order of the spline interpolation, default is 3. The order has to be in the range 0-5."
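To make the difference between the orders concrete, here is a tiny example on a linear ramp. It assumes scipy.ndimage.zoom is the function whose `order` is meant (the quoted docs match it); the ramp input is just an illustrative choice.

```python
import numpy as np
from scipy.ndimage import zoom

# A short linear ramp makes it easy to see what each spline order does.
ramp = np.array([0.0, 1.0, 2.0, 3.0])

nearest = zoom(ramp, 2, order=0)  # order 0: "constant" / nearest neighbor, only copies input values
linear = zoom(ramp, 2, order=1)  # order 1: linear interpolation between neighboring samples
cubic = zoom(ramp, 2, order=3)  # order 3: cubic spline (scipy's default)

print("order=0:", nearest)
print("order=1:", linear)
print("order=3:", cubic)
```

With order=0 every output value is one of the input values (some repeated), while order=1 reproduces the ramp exactly, since linear interpolation is exact for linear signals.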
So 0 would be "constant" and 3 "cubic". As a gut feeling, I would go for linear interpolation (order=1); this should already be faster than 3. I am not sure if "constant" is accurate enough for us.
I think order 0 is much faster than 1. I don't know how big the difference in final WER would be, but I would assume there is none. Note that this operates on the PCM sample level, not on the feature level.
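Since this happens on the raw waveform, the random zoom is essentially a speed-perturbation-style augmentation on PCM samples. A minimal sketch of that idea, assuming scipy.ndimage.zoom does the resampling; this is not RETURNN's actual implementation, and `random_zoom_pcm` as well as its `lower`/`upper` parameters and defaults are hypothetical, only `order` corresponds to rnd_zoom_order.

```python
import numpy as np
from scipy.ndimage import zoom


def random_zoom_pcm(audio, rng, lower=0.9, upper=1.1, order=0):
    """Stretch or compress raw PCM samples by a random factor.

    Hypothetical sketch: lower/upper and their defaults are illustrative only;
    `order` plays the role of rnd_zoom_order (0 = nearest, 1 = linear, 3 = cubic).
    """
    factor = rng.uniform(lower, upper)
    # Operates directly on the 1-D waveform, i.e. on PCM sample level, not on features.
    return zoom(audio, factor, order=order)


rng = np.random.default_rng(42)
# One second of a synthetic 440 Hz tone at 16 kHz as stand-in audio.
audio = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
perturbed = random_zoom_pcm(audio, rng, order=0)
print(len(audio), "->", len(perturbed))
```

Changing the length of the waveform this way also shifts the pitch, which is what distinguishes it from a pure tempo change.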
Regarding "I think order 0 is much faster than 1": yes, it may be that order 0 only copies values, whereas order 1 does some actual computation. If 0 is precise enough for us, then by all means make it the new default.
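To illustrate the "copy vs. compute" point, here is a hand-written sketch in plain NumPy (an illustration, not how scipy implements it internally): order-0 resampling reduces to an index lookup, whereas order-1 resampling has to compute a weighted average of two neighbors per output sample. The function names are made up for this example.

```python
import numpy as np


def resample_nearest(x, out_len):
    """Order-0 style resampling: pick the nearest input sample, i.e. a pure gather/copy."""
    # First and last samples are aligned (positions i * (len(x) - 1) / (out_len - 1)).
    pos = np.linspace(0, len(x) - 1, out_len)
    return x[np.rint(pos).astype(np.int64)]


def resample_linear(x, out_len):
    """Order-1 style resampling: weighted sum of the two neighboring samples per output."""
    pos = np.linspace(0, len(x) - 1, out_len)
    return np.interp(pos, np.arange(len(x)), x)


x = np.array([0.0, 10.0, 20.0])
print(resample_nearest(x, 5))
print(resample_linear(x, 5))
```

The nearest-neighbor version never creates new values, which is also why it introduces a larger interpolation error on smooth signals.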
I have not done systematic comparison experiments, so I don't know. I am just assuming this.
OK. Based on general experience with interpolation, my assumption is that there is a large gap in accuracy between constant and linear interpolation, and then only smaller gaps as the order increases further.
But then again, for data perturbation, we can maybe afford the interpolation error at the PCM level, or even want it as additional data corruption.
Maybe we really need to do some testing before we make the change.