transformers
transformers copied to clipboard
Add Pop2Piano
What does this PR do?
Adds Pop2Piano model to HuggingFace.
Fixes #20126
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline, Pull Request section?
- [x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@ArthurZucker
Hi @ArthurZucker the implementation is almost ready(also tests) but I feel that the way I implemented this model is not a descent level, thats why I want you to take a look at the Pop2PianoModel
structure.
Just to be clear with the feature extractor, the Pop2PianoFeatureExtractor
takes raw_audio as input and generates variable length output(10, 50000
, 15, 62200
), even if I pad the raw_audio at start, it will still produce different results for different audio files, so I used lists to stack them and then wrapped them through BatchFeature
.
Please don't mind about docs I will change them afterwards
EDIT : Please ignore this
(Here is the author of pop2piano) Thank you for doing this PR. It seems that this was implemented by understanding the original code better than me! Please feel free to ask me if there is anything I can check or do.
@sweetcocoa Thanks for you comments, HF team has helped me a lot in this integration.
For solving the import issues, you have to create a require_xxx
with the name of the package. Look for example at the require_accelerate
in the testing_utils.py
! 😉
Hi @ArthurZucker thanks for you comment!
But I have already created require_xxx
in testing_utils.py
regarding essentia
and pretty_midi
and also I have used them in transformers/src/transformers/models/pop2piano/__init__.py
.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hi @ArthurZucker sorry for the delay but the tests are green now! please review it.
Hi @ArthurZucker , sorry for the huge delay, I have made most of the changes that you asked. Also there are some changes that I didn't do these are below -
- I managed to remove dependency on soundfile and torchaudio but not librosa, since raw_audio is used 2 times first time in
extract_rhythm
which takes audio with original sampling_rate and the second time insingle_preprocess
which first upscales/downscales raw_audio to sampling_rate of 22050 and then uses it. And preloading raw_audio with sampling_rate of 22050(not with native sampling_rate) was giving very bad results! I tried to use scipy.resample but since it uses fft it is relatively slow and less accurate. - As you suggested to pad the feature_extractor outputs with silence, I tried to do that but I found that different audio files with same length have input_features of different shapes! For example one was having shape of [7, 38, 512] and another one of [6, 42, 512], both were 10s audios. I could pad them and use them in a batch but then I need to keep track of their shapes which would introduce another variable, what do you suggest? Should I leave them or try to pad and keep track of shapes?
The licensing page of essentia
says - "Essentia is available under an open license, Affero GPLv3, for non-commercial applications, thus it is possible to test the library before deciding to licence it under a comercial licence." I don't know much about licensing so I will leave it upto you to decide what to change in the headings.
Also please forgive me if I missed something, I will change them in the future commit.
Hey! Will try to have a look asap
Pinging @sanchit-gandhi for a review too!
Hi @ArthurZucker In have made the changes you requested except the resample one (I have mentioned the reason here), let me know if more changes are needed or not.
Hi @sanchit-gandhi thanks for your comments! and sorry for the late reply, the batching is not working for these reasons -
- feature_extractor - The output of the feature extractor varies from music to music! For example - music1.mp3(10 seconds long) can have a feature extractor output of shape - [7, 38, 512] and music2.mp3(also 10 seconds long) can have a feature extractor output of shape - [6, 42, 512] . So if a user tries to process multiple music files in batch it will be very hard to batch them!
-
truncation
- Truncating both of them to say [5, 25, 512] gives pretty bad results! -
padding
- One way we can overcome this is, we can take the maximum dimensions on each axis and pad them, but then we must unpad them or get their original shapes back, otherwise the the tokenizer won't work! So we can make a variable just to record the shapes of the tensors before padding. We can do this approach if you want.
I tried other approaches such as torch.nested.nested_tensor
but there the user wont be able to use .to("cuda") or .to("cpu")
on feature_extractor outputs because they are not supported!
- model - The model.generate can take inputs of (dim1, dim2, dim3) but as we have (dim1, dim2, dim3) shapes for each input we may need to use a for loop if we are to support batching!
Also should I make a new PR regarding the assert for T5?
Hi @sanchit-gandhi, I pushed the modifications, tests which are failing are mostly due to internal errors(Connection to HF hub, TF installation etc). Please ignore the all checkpoints as those are temporary. I will change all of them just before the merge. Also please tell me if any more modifications are needed or not and also if I have missed any or not.
And in meantime I will make a PR regarding the change of T5 assert to except, I think maybe we should wait until that gets merged and then we will change the blocks to except here too?
EDIT : pushed the change with the T5 modification.
pushed the change with the T5
modification.
Also requesting review from @hollance!
Hi @hollance, I have made those changes you requested. And Hi @sanchit-gandhi, please review the batching part(except the checkpoint part as we discussed in slack if want to move them to a separate org or not), let me know if more code changes are required or not.
btw I was automatically removed from slack channel as it says [email protected]’s Workspace’s free trial of Slack Pro has ended. Any channels with people outside your company — such as #pop2piano-integration — have been disconnected.
If anymore work is needed such as transferring the files to organization checkpoint, updating the HF Space for Pop2Piano ... please let me know I would be happy to do that!
@susnato
btw I was automatically removed from slack channel as it says
[email protected]’s Workspace’s free trial of Slack Pro has ended. Any channels with people outside your company — such as #pop2piano-integration — have been disconnected.
I added you back as a guest to the pop2piano-integration channel. You should be able to use this using regular Slack (not Pro). Let me know if that doesn't work.
Hi @sanchit-gandhi, I have pushed the new changes.
Alright nice! And you've verified that the outputs are the same with/without padding? Requesting review from @ArthurZucker and @hollance to kick-off the last round of reviews :)
yes I did check for 3 types - 1. single audio + no_attention_mask, 2.single audio + attention_mask, 3. 2 audios + attention_mask and the outputs were same. Since you just said, I will still add a test for that in the next commit(after last round of reviews).
reviewing right now
Hi @ArthurZucker I have pushed the comments. Let me know if any more changes are needed or not.
I'll @sanchit-gandhi review now, before pinging a core maintainer
I have transferred the necessary files to sweetcocoa/pop2piano
and updated the checkpoints. @sanchit-gandhi
Thanks for updating again, @sanchit-gandhi has his hands full, I'll review this weekend! Sorry for the delay and great work! 🔥
@ArthurZucker thanks for the quick reply, please don't worry about the delay, and thanks to the HF team for launching very intuitive audio course! :hugs:
Sorry, I missed the review request the first time! Rest assured it's on my list - I haven't forgotten @susnato! Aiming to have you a review either on Sunday or Monday afternoon latest 🙌
Hi @sanchit-gandhi, I have pushed the changes that you requested. Also answered the questions in the threads.
Cool thanks for the explanations @susnato, all good with me. Could you click "resolve" on all the threads that have been completed?
Hi @sanchit-gandhi, just resolved all other threads.