
Dataset random_permute_audio rnd_zoom_order should be 0 by default

Open albertz opened this issue 2 years ago • 1 comments

E.g. for LibriSpeechCorpus, OggZipDataset, TimitDataset.

It's much faster.

Actually, I did some intensive research on this and tried other alternatives, and this implementation with rnd_zoom_order=0 (i.e. nearest neighbor) turned out to be the fastest by far (on CPU). Unfortunately, I can't really find references on this anymore. Edit: I think I have seen this, among other things.

Anyway, rnd_zoom_order=3 is very slow, and I don't expect it to really make a difference compared to rnd_zoom_order=0 (but I don't have numbers on this).
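To get rough numbers for the speed claim, a minimal timing sketch could look like the following. It assumes the zoom is done with `scipy.ndimage.zoom` (the thread does not name the function) and uses random noise as a stand-in for PCM samples; `time_zoom`, the 10 s length and the factor 1.1 are illustrative choices, not taken from RETURNN.

```python
import time

import numpy as np
from scipy import ndimage


def time_zoom(audio, factor, order, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` zoom calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        ndimage.zoom(audio, factor, order=order)
        best = min(best, time.perf_counter() - start)
    return best


# Assumption: 10 s of 16 kHz audio, simulated with Gaussian noise.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000 * 10).astype("float32")

# Compare nearest neighbor (0), linear (1) and cubic (3) spline orders.
timings = {order: time_zoom(audio, 1.1, order) for order in (0, 1, 3)}
for order, seconds in timings.items():
    print(f"order={order}: {seconds * 1e3:.2f} ms")
```

The absolute numbers depend on the machine and the SciPy version, so only the ratio between the orders is meaningful here.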

Of course, changing this now will have some (probably small) effect on the behavior. Thus a new behavior version for this?

Sep 10 '22 20:09 albertz

@JackTemaki @michelwi I tagged (assigned) you because I would like to get your comments on this. Do you agree?

Sep 23 '22 09:09 albertz

Yes, new behavior version sounds reasonable. I never used this myself though, as far as I know.

Sep 26 '22 11:09 JackTemaki

The parameter order refers to:

"The order of the spline interpolation, default is 3. The order has to be in the range 0-5."

So 0 would be "constant" and 3 "cubic". As a gut feeling, I would go for linear interpolation (order=1); this should already be faster than 3. I am not sure if "constant" is accurate enough for us.

Sep 26 '22 13:09 michelwi

I think order 0 is much faster than 1. And I don't know how large the difference in final WER is, but I would assume none. Note that this operates on the PCM sample level, not the feature level.

Sep 26 '22 14:09 albertz

"I think order 0 is much faster than 1."

Yes, it could be that order 0 only copies values, whereas 1 does some actual calculations. If 0 is precise enough for us, then by all means make it the new default.
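The "copies values" intuition can be checked on a tiny array. This is only a sketch, again assuming `scipy.ndimage.zoom` is the function behind rnd_zoom; the sample values and the factor 1.6 are made up for illustration.

```python
import numpy as np
from scipy import ndimage

# Hypothetical toy "signal"; real input would be PCM samples.
samples = np.array([0.0, 4.0, -2.0, 6.0, 1.0])

nearest = ndimage.zoom(samples, 1.6, order=0)  # nearest neighbor
linear = ndimage.zoom(samples, 1.6, order=1)   # linear interpolation

# Order 0 should only ever emit values that already exist in the input.
copies_only = bool(np.isin(nearest, samples).all())
# Order 1 computes values in between neighboring samples.
new_values = bool((~np.isin(linear, samples)).any())

print("order 0 output:", nearest)
print("order 1 output:", linear)
print("order 0 only copies input values:", copies_only)
print("order 1 produces new values:", new_values)
```

Both outputs have the same length, so the two orders differ only in which values they place at each output position.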

Sep 26 '22 15:09 michelwi

I have not done systematic comparison experiments, so I don't know. I just assume this.

Sep 26 '22 15:09 albertz

OK. My assumption, based on general interpolation experience, is that there is a large performance gap between constant and linear and then only smaller gaps as the order increases.

But then in this example for data perturbation, we can maybe afford the interpolation error on PCM level or even want it as additional data corruption.
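One way to put a number on the PCM-level interpolation error is to compare the order 0 output against the order 3 output for the same zoom factor. The sketch below does this for a synthetic tone. It assumes `scipy.ndimage.zoom` and treats the cubic result as the reference, which is itself an assumption; the 440 Hz tone, the factor 1.1 and the helper `rms` are illustrative only, and this says nothing about the effect on WER.

```python
import numpy as np
from scipy import ndimage

# Hypothetical test signal: 1 s of a 440 Hz sine at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

factor = 1.1  # illustrative zoom factor
reference = ndimage.zoom(tone, factor, order=3)  # cubic, taken as reference
nearest = ndimage.zoom(tone, factor, order=0)    # nearest neighbor


def rms(x):
    """Root mean square of an array."""
    return float(np.sqrt(np.mean(np.square(x))))


# Deviation of order 0 from order 3, relative to the signal level.
relative_error = rms(nearest - reference) / rms(reference)
print(f"relative RMS deviation of order 0 vs. order 3: {relative_error:.4f}")
```

Real speech has more high-frequency content than a single tone, so the deviation on actual recordings may differ from this toy measurement.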

Sep 26 '22 15:09 michelwi

Maybe before we make the change, we really need to do some testing.

Oct 10 '22 22:10 albertz
