torch-audiomentations

cache background_noise rms data

Open fantasyRqg opened this issue 2 years ago • 3 comments

Boost background_noise performance.

  1. Reduce audio decoding and file I/O.
  2. Reduce RMS computation. Note that there may be a difference between rms(partial audio) and rms(full audio).
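
The second point can be illustrated with a minimal sketch. It assumes audio is a plain list of float samples; `rms`, `RmsCache` and the `load_fn` callback are hypothetical names for illustration and are not part of torch-audiomentations. The sketch caches the full-file RMS per path and shows that the RMS of a crop can differ from the RMS of the whole file.

```python
import math
from typing import Callable, Dict, List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root mean square of a sequence of samples."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class RmsCache:
    """Hypothetical helper: computes the full-file RMS once per path."""

    def __init__(self, load_fn: Callable[[str], List[float]]) -> None:
        self._load_fn = load_fn  # assumed to decode a file into samples
        self._cache: Dict[str, float] = {}
        self.loads = 0  # number of times a file was actually decoded

    def get(self, path: str) -> float:
        if path not in self._cache:
            self.loads += 1
            self._cache[path] = rms(self._load_fn(path))
        return self._cache[path]


# Toy "file": loud first half, silent second half.
fake_files = {"noise.wav": [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]}
cache = RmsCache(lambda path: fake_files[path])

full_rms = cache.get("noise.wav")
full_rms_again = cache.get("noise.wav")  # served from the cache, no decode
partial_rms = rms(fake_files["noise.wav"][:4])  # RMS of a crop only

print(full_rms, partial_rms, cache.loads)
```

In this toy case the crop's RMS is 0.5 while the full file's RMS is about 0.354, so a gain derived from the cached full-file value would not match one derived from the crop that actually gets mixed in.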

fantasyRqg, Jun 17 '22 09:06

Hi fantasyRqg, and thanks for your PR 😃

Just for context, so I understand the problem you're proposing to solve, I want to ask some questions:

  • How large is your background noise dataset?
  • If you are training a model, how many workers do you use for preparing the audio examples that go into the training batches?
  • How much memory (RAM) is there on the computer where you are doing the training?
  • What audio file format do your background noise files use? And do they have the same sample rate as the "clean" input audio that the noise gets added to?
  • Are you using an SSD or an HDD?

Ideally, a good solution would work well for every combination of answers to those questions.

iver56, Jun 20 '22 07:06

  • How large is your background noise dataset?

    About 2k records

  • If you are training a model, how many workers do you use for preparing the audio examples that go into the training batches?

    Only one worker. I tried multiple workers, but it was not fast enough.

  • How much memory (RAM) is there on the computer where you are doing the training?

    I cached the samples and the noises. The samples took 7 GB and the noises took 1.5 GB.

  • What audio file format do your background noise files use? And do they have the same sample rate as the "clean" input audio that the noise gets added to?

    I don't think the audio format and sample rate are a problem. The audio: Audio parameter will take care of all of that.

  • Are you using an SSD or an HDD?

    HDD

fantasyRqg, Jun 22 '22 10:06

Thanks for the insight :) Indeed, in your case it makes sense to apply caching like this.

  • [x] HDD
  • [x] Not very large dataset - fits in RAM
  • [x] Single worker

My own use case is quite different, and would actually be best without caching:

  • [x] SSD
  • [x] Very large dataset, cannot fit in RAM
  • [x] Many workers
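
One way to accommodate both profiles might be a cache with a memory budget: with a generous budget and a small dataset everything stays in RAM, and with a budget of zero nothing is cached. The following is only a sketch under stated assumptions. `BoundedAudioCache` and `load_fn` are hypothetical names, not a torch-audiomentations API, and samples are assumed to take 4 bytes each (float32).

```python
from collections import OrderedDict
from typing import Callable, List


class BoundedAudioCache:
    """Hypothetical LRU cache with a byte budget (illustration only)."""

    BYTES_PER_SAMPLE = 4  # assumption: samples are stored as float32

    def __init__(self, load_fn: Callable[[str], List[float]], max_bytes: int) -> None:
        self._load_fn = load_fn  # assumed to decode a file into samples
        self._max_bytes = max_bytes
        self._items: "OrderedDict[str, List[float]]" = OrderedDict()
        self._used_bytes = 0
        self.loads = 0  # number of times a file was actually decoded

    def get(self, path: str) -> List[float]:
        if path in self._items:
            self._items.move_to_end(path)  # mark as most recently used
            return self._items[path]
        self.loads += 1
        samples = self._load_fn(path)
        size = len(samples) * self.BYTES_PER_SAMPLE
        if size <= self._max_bytes:
            # Evict least recently used entries until the new item fits.
            while self._used_bytes + size > self._max_bytes:
                _, evicted = self._items.popitem(last=False)
                self._used_bytes -= len(evicted) * self.BYTES_PER_SAMPLE
            self._items[path] = samples
            self._used_bytes += size
        return samples


# Two toy "files" of 100 samples (400 bytes) each.
files = {"a.wav": [0.0] * 100, "b.wav": [0.0] * 100}
access_order = ["a.wav", "a.wav", "b.wav", "a.wav"]


def count_loads(max_bytes: int) -> int:
    cache = BoundedAudioCache(lambda path: files[path], max_bytes)
    for path in access_order:
        cache.get(path)
    return cache.loads


loads_everything_fits = count_loads(800)  # both files stay cached
loads_one_fits = count_loads(400)         # files evict each other
loads_no_cache = count_loads(0)           # caching effectively disabled
```

With many workers, each worker process would typically hold its own copy of such a cache, which is one reason a small or zero budget could be the better default for that setup.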

"I don't think the audio format and sample rate are a problem. The audio: Audio parameter will take care of all of that."

The reason why I asked is that resampling (in case of mismatch) may take a significant amount of CPU time, slowing down the model training.
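
To give a rough sense of where that CPU time goes, here is a naive linear-interpolation resampler in pure Python. It is a stand-in for illustration only: it is not the resampler used by torch-audiomentations, it does no anti-aliasing, and `resample_linear` is a hypothetical name. The point is that a rate mismatch costs work for every output sample, while matching rates cost nothing beyond a copy.

```python
from typing import List


def resample_linear(samples: List[float], src_rate: int, dst_rate: int) -> List[float]:
    """Naive linear-interpolation resampler (illustration only, no anti-aliasing)."""
    if src_rate == dst_rate or not samples:
        return list(samples)  # no mismatch: nothing to compute beyond a copy
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional position in the source
        left = min(int(pos), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


noise = [0.0, 1.0, 2.0, 3.0]
same_rate = resample_linear(noise, 16000, 16000)  # rates match: plain copy
upsampled = resample_linear(noise, 2, 4)          # mismatch: per-sample work
```

If the noise files are stored at the target sample rate ahead of time, this per-example work can be avoided entirely.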

I'm currently wrapping up the 0.11 release, and after that I'll be working on a few new transforms. Once those are done, I'll hopefully have more time to consider this caching feature. In the meantime, thanks for your patience, and I hope you're okay with using your own fork for now.

iver56, Jun 29 '22 07:06