Optimized Background Noise Augmentation for Large Background Files
Proposed Algorithm:
1. Get the duration of the background file (bg_file) in seconds.
2. Sample a random value from the range [0, bg_file_seconds - event_file_seconds).
3. Read the background file from sampled_value to sampled_value + event_file_seconds.
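A minimal sketch of the idea, assuming the soundfile library is used for reading audio (the function and variable names below are illustrative, not the exact code in this PR):

```python
import random

import soundfile as sf


def sample_background_snippet(bg_path, event_duration_s):
    """Read only a random, event-length slice of a (possibly huge) background file."""
    info = sf.info(bg_path)
    bg_duration_s = info.frames / info.samplerate
    assert bg_duration_s >= event_duration_s, "background must be at least as long as the event"

    # Step 2: random offset in [0, bg_file_seconds - event_file_seconds)
    start_s = random.uniform(0.0, bg_duration_s - event_duration_s)
    start_frame = int(start_s * info.samplerate)
    num_frames = int(event_duration_s * info.samplerate)

    # Step 3: only the requested slice is read into memory
    snippet, sample_rate = sf.read(
        bg_path, start=start_frame, frames=num_frames, dtype="float32"
    )
    return snippet, sample_rate
```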
This approach ensures that:
- We only load a portion of the background file required for the augmentation.
- It maintains randomness in background selection while reducing memory overhead.
- It is adaptable to cases with varied sample rates and event/background file durations.
Experiments and Results:
I’ve tested this algorithm using:
- Event durations ranging from 1 to 9 seconds.
- Background durations ranging from 81 to 10,000 seconds.
- Sample rates: 16,000 Hz, 22,500 Hz, and 44,100 Hz.
This optimized approach significantly reduces memory usage while maintaining augmentation quality. I’ve attached the comparison plot showcasing the performance difference for your reference.

It also:
- Improves scalability by avoiding unnecessary memory consumption for large files.
- Enhances performance in real-time audio augmentation workflows.
- Can be integrated as a feature or an option in AddBackgroundNoise to provide more flexibility to users.
Please let me know your thoughts on this proposal and if any further details or clarifications are needed.
In the figure below, the first plot shows the difference in memory usage across the test cases (normalized by 1e6), and the second plot compares the time taken by the old and the proposed approach.
Thanks for the PR. I will have a closer look when I have time
In this case I would prefer lazy caching over eager caching. The difference becomes quite noticeable when there is a large number of files. Hypothetically, if you have half a million files, and it takes 1 ms to check the duration of each file, initializing the class would take 500 seconds. On the other hand, with lazy caching, initializing the class would be almost instant.
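To illustrate what I mean, a minimal sketch of lazy caching could look like this (the names here are just for illustration, not the actual audiomentations internals):

```python
import soundfile as sf


class LazyDurationCache:
    """Durations are looked up on first use instead of at init time."""

    def __init__(self, sound_file_paths):
        # No file I/O here, so initialization stays near-instant even for huge lists
        self.sound_file_paths = list(sound_file_paths)
        self._durations = {}  # file index -> duration in seconds

    def get_duration(self, index):
        if index not in self._durations:
            info = sf.info(self.sound_file_paths[index])
            self._durations[index] = info.frames / info.samplerate
        return self._durations[index]
```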
Hello @iver56,
Thank you for your valuable feedback. I have implemented the suggested changes and replaced eager caching with lazy caching. The system now caches file-related time information on demand, significantly improving the initialization speed for large datasets.
I do have a question regarding the lookup mechanism for file time information. Currently, I am using a dictionary for this purpose, but its average-case time complexity for lookups is not guaranteed to be constant. I am exploring an alternative approach using an array of size len(sound_file_paths). With this method:
- Each file would be assigned an index (e.g., from 0 to len(sound_file_paths) - 1).
- File paths and corresponding time information could then be accessed directly using the index, enabling efficient retrieval.
Additionally, I was wondering if there’s any provision in the current system to prioritize sampling certain files more frequently than others—for instance, based on importance, weight, or any custom-defined priority. If such functionality does not currently exist, is there a plan to introduce it in the future?
Thank you for your time and guidance. I appreciate your input and look forward to your feedback!
Best regards, Pratik Kulkar
Thanks for implementing that change
> I do have a question regarding the lookup mechanism for file time information. Currently, I am using a dictionary for this purpose, but its average-case time complexity for lookups is not guaranteed to be constant. I am exploring an alternative approach using an array of size len(sound_file_paths). With this method:
> - Each file would be assigned an index (e.g., from 0 to len(sound_file_paths) - 1).
> - File paths and corresponding time information could then be accessed directly using the index, enabling efficient retrieval.
- dict lookups are O(1) on average for both string and integer keys
- Having integers as keys is faster than having strings as keys, due to faster hashing and comparison. And it uses less memory.
- Accessing a value in an array/list is also O(1), but in practice it is faster than a dict lookup
- a numpy array requires less memory than a python-native list of floats
Here's a rough comparison of the memory usage of the three alternatives, given that there are half a million items:
- List of floats: ~16 MB
- Dictionary (int keys, float values): ~47 MB
- NumPy array (float32): ~2 MB
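For the curious, a rough script to reproduce these ballpark figures (sys.getsizeof slightly overcounts shared small-int objects, so treat the output as an estimate):

```python
import sys

import numpy as np

n = 500_000

# Python list of floats: list buffer + one ~24-byte float object per item
float_list = [float(i) for i in range(n)]
list_mb = (sys.getsizeof(float_list) + sum(map(sys.getsizeof, float_list))) / 1e6

# Dict with int keys and float values: hash table + key objects + value objects
float_dict = {i: float(i) for i in range(n)}
dict_mb = (
    sys.getsizeof(float_dict)
    + sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in float_dict.items())
) / 1e6

# NumPy float32 array: one contiguous buffer, 4 bytes per item
arr = np.zeros(n, dtype=np.float32)
numpy_mb = arr.nbytes / 1e6

print(f"list: ~{list_mb:.0f} MB, dict: ~{dict_mb:.0f} MB, numpy: ~{numpy_mb:.0f} MB")
```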
If you feel like optimizing it with your array idea, here's my green light: 🟢
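For illustration, a lazily filled, index-based cache could look roughly like this (hypothetical names, not the actual implementation in this PR):

```python
import numpy as np
import soundfile as sf


class IndexedDurationCache:
    """NaN marks entries whose duration has not been looked up yet."""

    def __init__(self, sound_file_paths):
        self.sound_file_paths = list(sound_file_paths)
        # float32 keeps memory at ~4 bytes per file, even for millions of files
        self._durations = np.full(len(self.sound_file_paths), np.nan, dtype=np.float32)

    def get_duration(self, index):
        if np.isnan(self._durations[index]):
            info = sf.info(self.sound_file_paths[index])
            self._durations[index] = info.frames / info.samplerate
        return float(self._durations[index])
```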
> Additionally, I was wondering if there’s any provision in the current system to prioritize sampling certain files more frequently than others—for instance, based on importance, weight, or any custom-defined priority. If such functionality does not currently exist, is there a plan to introduce it in the future?
I don't have any immediate plans for adding that feature, but you're welcome to add an issue for it
Hello @iver56,
Thank you for your detailed response and insights into the performance and memory usage of different data structures. The comparison between list, dictionary, and NumPy array was particularly helpful.
Based on your feedback:
- I have implemented the idea of saving time information as discussed, with a focus on efficient storage and retrieval.
- I have also fixed a logical bug in the previous commit titled "Avoiding Preloading" to ensure the updated implementation aligns with the lazy caching approach.
Regarding the feature to prioritize file sampling based on weights or importance, I will create an issue for it in the repository to track any discussions or future plans around it.
Thank you once again for your guidance and support. I look forward to your feedback on the latest changes.
Best regards, Pratik Kulkar
@PratikKulkar see my PR here for this feature!
@PratikKulkar Could you please resolve the merge conflict? Thanks in advance! It's time to get this merged
Hello @iver56,
The conflict has been resolved. Kindly review and let me know if everything looks good.
Best regards, Pratik Kulkar
Thanks! I will have a look
Hey @PratikKulkar, thanks for the work you've done! I missed your PR and ended up implementing the same thing 😅.
Maybe we can use this PR to remove the lru_cache from this augmentation, as it won't be working properly in this setup anyway. Not sure how @iver56 sees this, as it will be a breaking change.
Also we can apply the same idea to ApplyImpulseResponse but I guess it's better to open another PR for the sake of clarity.
Hey @JorisCos, no worries at all — these things happen 😅 Glad we were thinking along the same lines! I agree, we can definitely repurpose this PR to remove the lru_cache from the augmentation — especially since it doesn’t behave well with this partial loading setup and could lead to unexpected memory issues.
As for ApplyImpulseResponse, yeah, makes sense to handle that in a separate PR for clarity.
Let me know what @iver56 thinks about the breaking change here, and I can update the PR accordingly.
Here's my recommendation: Set the default value of the lru_cache_size to None. If a value other than None gets passed, raise a TypeError exception explaining that the LRU cache functionality has been removed and that it is no longer valid to set the lru_cache_size param.
You can remove the functools.lru_cache stuff accordingly. As audiomentations is still in alpha, it's technically okay to make breaking changes, but it's still nice to have them well explained (so it's clear how to make existing code compatible with the new version) and to have a good reason for the breaking change.
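Roughly along these lines (the rest of the signature and the initialization are abbreviated here, just to show the check):

```python
class AddBackgroundNoise:
    def __init__(self, sounds_path, lru_cache_size=None, **kwargs):
        if lru_cache_size is not None:
            raise TypeError(
                "The LRU cache functionality has been removed, so lru_cache_size is "
                "no longer a valid parameter. Remove the argument; file metadata is "
                "now cached lazily instead."
            )
        self.sounds_path = sounds_path
        # ... the rest of the existing initialization goes here ...
```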
Thanks for the help, @PratikKulkar