
[WIP] Add implementation of `spectrogram_batch`

Open ravenouse opened this issue 9 months ago • 15 comments

What does this PR do?

This pull request adds an implementation of spectrogram_batch, optimized for batch processing via broadcasting. The primary goal is to reduce data processing time during feature extraction, a critical step when working with audio models such as Whisper.

Motivation

In my work and research with the Whisper model, I observed that the feature extraction step can be exceedingly time-consuming, taking up to 10 hours for certain audio datasets.
In my opinion, the bottleneck stems primarily from the lack of batch processing support in the current spectrogram and FeatureExtractor implementations, which forces iterative calls inside a for-loop, as illustrated below:

# Reference: https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/feature_extraction_whisper.py#L250
input_features = [self._np_extract_fbank_features(waveform) for waveform in input_features[0]]
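The broadcasting idea behind a batched replacement can be sketched as follows. This is a minimal illustration, not the PR's actual code: `spectrogram_batch_sketch` is a hypothetical name, and it assumes all waveforms in the batch have already been padded to equal length.

```python
import numpy as np

def spectrogram_batch_sketch(waveforms, window, frame_length=400, hop_length=160):
    """Compute magnitude spectrograms for a whole batch in one vectorized pass.

    waveforms: (batch, num_samples) array of equal-length (padded) audio.
    Returns: (batch, num_frames, frame_length // 2 + 1) magnitudes.
    """
    batch_size, num_samples = waveforms.shape
    num_frames = 1 + (num_samples - frame_length) // hop_length
    # Build the frame indices once, then gather frames for every waveform
    # in the batch via fancy indexing / broadcasting.
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(num_frames)[:, None]
    frames = waveforms[:, idx]          # (batch, num_frames, frame_length)
    frames = frames * window            # apply the analysis window (broadcasts)
    # One vectorized FFT over the last axis replaces the per-waveform Python loop.
    return np.abs(np.fft.rfft(frames, axis=-1))

# Usage: two 1-second 16 kHz waveforms processed in a single call.
batch = np.random.randn(2, 16000).astype(np.float32)
specs = spectrogram_batch_sketch(batch, np.hanning(400))
print(specs.shape)  # (2, 98, 201)
```

With Whisper-style parameters (frame_length=400, hop_length=160), the entire batch is framed and transformed without any Python-level loop over waveforms.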

Future Work

The current branch only adds a basic implementation of spectrogram_batch. To bring this implementation to production level, I believe several steps remain:

  1. Extensive Testing: Implementing a comprehensive suite of tests to evaluate the function's performance and correctness across different parameter settings. So far I have only tested the implementation with the Whisper models' settings.
  2. Integration: Modifying existing feature extractor codes to incorporate the new batch processing function.

I am fully committed to continuing the development and testing of this feature. However, given the extensive changes and the potential impact on the library, I am reaching out for collaboration and support from the community/maintainers. Any guidance, suggestions, or contributions to this effort would be immensely appreciated.

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@sanchit-gandhi @ArthurZucker

ravenouse avatar Oct 30 '23 20:10 ravenouse


Super cool PR @ravenouse, a most welcome upgrade! As mentioned by @ArthurZucker, one other optimisation we can employ is using the torch.stft backend for computing the spectrograms: https://github.com/huggingface/transformers/pull/26119#issue-1892916324. This yields approx 4x speed-up for processing with bs=1. I believe @ylacombe is considering adding support for this, which could be a nice parallel to your PR!
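For context, the torch.stft route mentioned above can look roughly like the sketch below. This is an illustration of the approach, not the linked PR's code; `stft_magnitudes` is a hypothetical helper, and the parameters mirror Whisper's defaults (n_fft=400, hop_length=160).

```python
import torch

def stft_magnitudes(waveform: torch.Tensor, n_fft: int = 400, hop_length: int = 160) -> torch.Tensor:
    """Magnitude spectrogram via torch.stft; accepts (num_samples,) or (batch, num_samples)."""
    window = torch.hann_window(n_fft)
    stft = torch.stft(
        waveform,
        n_fft=n_fft,
        hop_length=hop_length,
        window=window,
        return_complex=True,  # required since torch 1.8 for complex output
    )
    # torch.stft returns (..., n_fft // 2 + 1, num_frames)
    return stft.abs()

# Batch of three 1-second 16 kHz clips processed in one call.
wave = torch.randn(3, 16000)
mags = stft_magnitudes(wave)
print(mags.shape)  # (3, 201, 101) with center=True padding
```

Note that torch.stft batches natively over the leading dimension, so it composes well with the batched pre/post-processing in this PR.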

sanchit-gandhi avatar Nov 06 '23 13:11 sanchit-gandhi

Thanks for working on this @ravenouse, this looks super promising and will clearly be valuable for any audio-related model!

I'm indeed considering adding a torch alternative to this numpy implementation! What would be really valuable for these current and future improvements is to conduct extensive speed benchmarks, so users can make an informed choice between implementations!

ylacombe avatar Nov 06 '23 14:11 ylacombe

Thank you so much for all the feedback and information! @ArthurZucker @sanchit-gandhi @ylacombe

I am excited about the 4x speedup brought by torch.stft. Inspired by the experiment from @sanchit-gandhi, I conducted a similar experiment for spectrogram_batch, which yielded a 2x speedup at bs=200 compared to the original function at bs=1. Link to the experiment notebook: https://colab.research.google.com/drive/1aXytDfXiMy_tzvjP9A4rM7Z24jmV2Ha-?usp=sharing

For further enhancement, I believe adding GPU acceleration for feature extraction, with an option for users to select a GPU, would be a significant step forward. Prior to submitting this PR, I experimented with a CuPy version of spectrogram_batch. My initial findings indicate that the CuPy batch version is even faster than the NumPy batch version, provided the GPU memory is managed effectively. I anticipate that a torch GPU version can achieve comparable performance. An experimental notebook exploring this aspect will be shared shortly.
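The CuPy experiment described above can be sketched as an array-module swap over the same broadcasting recipe. This is a hypothetical illustration (the function name and structure are mine, not the experiment's code), using the common `xp` dispatch pattern plus an explicit memory-pool release, since the comment notes that GPU memory management is what keeps the CuPy version fast.

```python
import numpy as np

try:
    import cupy as cp  # GPU path, if CuPy is available
    xp = cp
except ImportError:
    xp = np            # fall back to NumPy on CPU

def batch_magnitudes(waveforms, window, frame_length=400, hop_length=160):
    """Same broadcasting recipe as a NumPy batch version, but array-module agnostic."""
    waveforms = xp.asarray(waveforms)
    window = xp.asarray(window)
    num_frames = 1 + (waveforms.shape[-1] - frame_length) // hop_length
    idx = xp.arange(frame_length)[None, :] + hop_length * xp.arange(num_frames)[:, None]
    frames = waveforms[:, idx] * window
    mags = xp.abs(xp.fft.rfft(frames, axis=-1))
    if xp is not np:
        mags = cp.asnumpy(mags)  # move results back to host memory
        # Free unused GPU blocks between dataset chunks; without this the
        # memory pool grows until large batches start to OOM.
        cp.get_default_memory_pool().free_all_blocks()
    return mags

specs = batch_magnitudes(np.random.randn(4, 16000), np.hanning(400))
print(specs.shape)  # (4, 98, 201)
```

On a machine without CuPy, the same code transparently runs the NumPy path, which makes side-by-side benchmarking straightforward.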

Once again, I am more than happy to contribute to the package in this direction. Please let me know if there is anything else I can do to further the effort.

ravenouse avatar Nov 07 '23 01:11 ravenouse

cc @ylacombe and @sanchit-gandhi let's try to maybe merge this now and have the rest as a todo?

ArthurZucker avatar Dec 01 '23 09:12 ArthurZucker

Hi @ylacombe , Thank you so much for taking the time to review this PR. I really appreciate the insightful inputs you have shared.

I will definitely follow the suggested directions, ensuring that the spectrogram_batch generates the same results as the original function.

Currently, I am working towards a deadline this week. Once that is completed, I will prioritize making the necessary modifications to the PR.

Once again, thank you for your valuable input and support!

ravenouse avatar Dec 10 '23 22:12 ravenouse

Hi @ylacombe,

I hope you had a wonderful break!

Following your feedback, I've updated the spectrogram_batch function to enhance its pre and post-processing capabilities. The key modifications include:

  • Utilizing original_waveform_lengths and true_num_frames to capture and truncate the results to their true lengths.
  • For now, the function relies on the pad method of SequenceFeatureExtractor, as detailed here, which introduces some redundancy in the code.
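The truncation step using original_waveform_lengths can be sketched as below. This is a hypothetical helper illustrating the idea, not the PR's actual code: each padded spectrogram is cut back to the number of frames its un-padded waveform supports.

```python
import numpy as np

def truncate_to_true_frames(batch_specs, original_waveform_lengths,
                            frame_length=400, hop_length=160):
    """Cut each padded spectrogram back to its true frame count.

    batch_specs: (batch, max_num_frames, num_bins) array computed on a padded batch.
    original_waveform_lengths: un-padded sample count of each waveform.
    """
    true_num_frames = [
        1 + (length - frame_length) // hop_length
        for length in original_waveform_lengths
    ]
    # Results become a list because items now have different frame counts.
    return [spec[:n] for spec, n in zip(batch_specs, true_num_frames)]

# Usage: a batch padded to 98 frames, where the second clip is only 8000 samples.
padded = np.zeros((2, 98, 201))
out = truncate_to_true_frames(padded, [16000, 8000])
print([o.shape for o in out])  # [(98, 201), (48, 201)]
```

This is what lets the batched function return results identical to calling the original spectrogram on each un-padded waveform.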

You can review the revised function in this Colab notebook. It produces the same results as the original spectrogram when tested on hf-internal-testing/librispeech_asr_dummy.

For the next step, I plan to:

  1. Implement a simple and efficient batch padding method, eliminating the current reliance on SequenceFeatureExtractor.
  2. Implement batch versions of amplitude_to_db and power_to_db.
  3. Add type annotations and docstrings to the functions.
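Item 2 above can be sketched as a vectorized variant of the existing scalar recipe (10 * log10 with clipping). This is a sketch under my own assumptions, not the final implementation; the key point is that the np.log10 / np.maximum calls broadcast over the whole batch, and any db_range clipping must be relative to each item's own maximum, not the batch maximum.

```python
import numpy as np

def power_to_db_batch(spectrogram, reference=1.0, min_value=1e-10, db_range=None):
    """Convert a batch of power spectrograms to decibels in one vectorized pass."""
    spectrogram = np.asarray(spectrogram)
    log_spec = 10.0 * np.log10(np.maximum(spectrogram, min_value))
    log_spec -= 10.0 * np.log10(np.maximum(reference, min_value))
    if db_range is not None:
        # Clip each spectrogram relative to its own peak, so one loud item
        # in the batch cannot distort the dynamic range of the others.
        per_item_max = log_spec.max(axis=tuple(range(1, log_spec.ndim)), keepdims=True)
        log_spec = np.maximum(log_spec, per_item_max - db_range)
    return log_spec

batch = np.array([[1.0, 0.1], [100.0, 1.0]])
print(power_to_db_batch(batch))  # 10*log10 values: [[0, -10], [20, 0]] dB
```

An amplitude_to_db batch version would follow the same shape, with 20 * log10 in place of 10 * log10.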

Please let me know your thoughts on this. Thank you very much!

ravenouse avatar Dec 28 '23 03:12 ravenouse

Hey @ravenouse, thanks for all the progress so far and happy new year! Could you actually update the code here so that it's easier to review, test and keep track of the comments? Many thanks!

ylacombe avatar Jan 01 '24 14:01 ylacombe

Hi @ylacombe, happy new year!

I have updated the code: I modified spectrogram_batch further, eliminating the previous dependency on SequenceFeatureExtractor for batch padding the waveforms.

In case you want to run the test on your own, here is the updated notebook: Link

Please let me know what you think about this!

Thank you so much for your time!

ravenouse avatar Jan 02 '24 09:01 ravenouse

Gentle ping @ylacombe

sanchit-gandhi avatar Jan 26 '24 17:01 sanchit-gandhi

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Feb 20 '24 08:02 github-actions[bot]

Hi @ylacombe,

Thank you so much for your input!

I will address the problems and directions you've highlighted, focusing specifically on:

  1. Testing,
  2. Implementing batch versions of amplitude_to_db and power_to_db.

Thank you again!

ravenouse avatar Mar 11 '24 15:03 ravenouse

Hi @ylacombe,

Good morning!

Could you please reopen this PR?

I will work actively on this PR: I am aiming to finish the above-mentioned problems/to-dos this week.

Thank you so much!

ravenouse avatar Mar 18 '24 16:03 ravenouse

Hey @ravenouse, of course!

ylacombe avatar Mar 18 '24 16:03 ylacombe

Hi @ylacombe,

Good day!

I wanted to provide a quick update: the testing phase is nearly complete. I am in the process of organizing the function names and preparing the notebook to demonstrate how the expected values used in the tests are generated.

This PR will be ready for your review by next Monday.

Thank you so much for all your support!

ravenouse avatar Mar 22 '24 11:03 ravenouse

Hi @ylacombe,

I believe this PR is now ready for your review! To facilitate the review process, I've provided a link to a Colab notebook below. This notebook demonstrates the methodology used to generate the expected values used in the test functions. Link: https://colab.research.google.com/drive/1bzYz8MydhuHUAIZauzHjrjMQ_Of89ZAj?usp=sharing

Please let me know your thoughts on this and if there is anything I should work on further!

Thank you so much!

ravenouse avatar Mar 26 '24 12:03 ravenouse

Hey @ravenouse, thanks for updating the code and for the Colab, I'll take a look ASAP!

ylacombe avatar Apr 01 '24 07:04 ylacombe

Gentle ping @ylacombe

amyeroberts avatar Apr 25 '24 09:04 amyeroberts