What does this PR do?

Adds Pop2Piano model to HuggingFace.

Fixes #20126

Before submitting

[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[x] Did you read the contributor guideline, Pull Request section?
[x] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
[ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
[ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@ArthurZucker

Feb 24 '23 13:02 susnato

Hi @ArthurZucker the implementation is almost ready(also tests) but I feel that the way I implemented this model is not a descent level, thats why I want you to take a look at the Pop2PianoModel structure. Just to be clear with the feature extractor, the Pop2PianoFeatureExtractor takes raw_audio as input and generates variable length output(10, 50000, 15, 62200), even if I pad the raw_audio at start, it will still produce different results for different audio files, so I used lists to stack them and then wrapped them through BatchFeature.

Please don't mind about docs I will change them afterwards

EDIT : Please ignore this

Mar 13 '23 15:03 susnato

(Here is the author of pop2piano) Thank you for doing this PR. It seems that this was implemented by understanding the original code better than me! Please feel free to ask me if there is anything I can check or do.

Mar 19 '23 13:03 sweetcocoa

@sweetcocoa Thanks for you comments, HF team has helped me a lot in this integration.

Mar 19 '23 15:03 susnato

For solving the import issues, you have to create a require_xxx with the name of the package. Look for example at the require_accelerate in the testing_utils.py! 😉

Mar 20 '23 11:03 ArthurZucker

Hi @ArthurZucker thanks for you comment! But I have already created require_xxx in testing_utils.py regarding essentia and pretty_midi and also I have used them in transformers/src/transformers/models/pop2piano/__init__.py.

Mar 20 '23 12:03 susnato

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Mar 23 '23 06:03 HuggingFaceDocBuilderDev

Hi @ArthurZucker sorry for the delay but the tests are green now! please review it.

Mar 23 '23 12:03 susnato

Hi @ArthurZucker , sorry for the huge delay, I have made most of the changes that you asked. Also there are some changes that I didn't do these are below -

I managed to remove dependency on soundfile and torchaudio but not librosa, since raw_audio is used 2 times first time in extract_rhythm which takes audio with original sampling_rate and the second time in single_preprocess which first upscales/downscales raw_audio to sampling_rate of 22050 and then uses it. And preloading raw_audio with sampling_rate of 22050(not with native sampling_rate) was giving very bad results! I tried to use scipy.resample but since it uses fft it is relatively slow and less accurate.
As you suggested to pad the feature_extractor outputs with silence, I tried to do that but I found that different audio files with same length have input_features of different shapes! For example one was having shape of [7, 38, 512] and another one of [6, 42, 512], both were 10s audios. I could pad them and use them in a batch but then I need to keep track of their shapes which would introduce another variable, what do you suggest? Should I leave them or try to pad and keep track of shapes?

The licensing page of essentia says - "Essentia is available under an open license, Affero GPLv3, for non-commercial applications, thus it is possible to test the library before deciding to licence it under a comercial licence." I don't know much about licensing so I will leave it upto you to decide what to change in the headings.

Also please forgive me if I missed something, I will change them in the future commit.

Apr 10 '23 11:04 susnato

Hey! Will try to have a look asap

Apr 11 '23 17:04 ArthurZucker

Pinging @sanchit-gandhi for a review too!

Apr 24 '23 14:04 ArthurZucker

Hi @ArthurZucker In have made the changes you requested except the resample one (I have mentioned the reason here), let me know if more changes are needed or not.

Apr 28 '23 19:04 susnato

Hi @sanchit-gandhi thanks for your comments! and sorry for the late reply, the batching is not working for these reasons -

feature_extractor - The output of the feature extractor varies from music to music! For example - music1.mp3(10 seconds long) can have a feature extractor output of shape - [7, 38, 512] and music2.mp3(also 10 seconds long) can have a feature extractor output of shape - [6, 42, 512] . So if a user tries to process multiple music files in batch it will be very hard to batch them!

truncation - Truncating both of them to say [5, 25, 512] gives pretty bad results!
padding - One way we can overcome this is, we can take the maximum dimensions on each axis and pad them, but then we must unpad them or get their original shapes back, otherwise the the tokenizer won't work! So we can make a variable just to record the shapes of the tensors before padding. We can do this approach if you want.

I tried other approaches such as torch.nested.nested_tensor but there the user wont be able to use .to("cuda") or .to("cpu") on feature_extractor outputs because they are not supported!

model - The model.generate can take inputs of (dim1, dim2, dim3) but as we have (dim1, dim2, dim3) shapes for each input we may need to use a for loop if we are to support batching!

Also should I make a new PR regarding the assert for T5?

May 06 '23 05:05 susnato

Hi @sanchit-gandhi, I pushed the modifications, tests which are failing are mostly due to internal errors(Connection to HF hub, TF installation etc). Please ignore the all checkpoints as those are temporary. I will change all of them just before the merge. Also please tell me if any more modifications are needed or not and also if I have missed any or not.

And in meantime I will make a PR regarding the change of T5 assert to except, I think maybe we should wait until that gets merged and then we will change the blocks to except here too?

EDIT : pushed the change with the T5 modification.

May 08 '23 20:05 susnato

pushed the change with the T5 modification.

May 14 '23 16:05 susnato

Also requesting review from @hollance!

May 16 '23 17:05 sanchit-gandhi

Hi @hollance, I have made those changes you requested. And Hi @sanchit-gandhi, please review the batching part(except the checkpoint part as we discussed in slack if want to move them to a separate org or not), let me know if more code changes are required or not. btw I was automatically removed from slack channel as it says [email protected]’s Workspace’s free trial of Slack Pro has ended. Any channels with people outside your company — such as #pop2piano-integration — have been disconnected.
If anymore work is needed such as transferring the files to organization checkpoint, updating the HF Space for Pop2Piano ... please let me know I would be happy to do that!

May 21 '23 13:05 susnato

@susnato

btw I was automatically removed from slack channel as it says [email protected]’s Workspace’s free trial of Slack Pro has ended. Any channels with people outside your company — such as #pop2piano-integration — have been disconnected.

I added you back as a guest to the pop2piano-integration channel. You should be able to use this using regular Slack (not Pro). Let me know if that doesn't work.

May 22 '23 09:05 hollance

Hi @sanchit-gandhi, I have pushed the new changes.

Jun 06 '23 08:06 susnato

Alright nice! And you've verified that the outputs are the same with/without padding? Requesting review from @ArthurZucker and @hollance to kick-off the last round of reviews :)

Jun 06 '23 17:06 sanchit-gandhi

yes I did check for 3 types - 1. single audio + no_attention_mask, 2.single audio + attention_mask, 3. 2 audios + attention_mask and the outputs were same. Since you just said, I will still add a test for that in the next commit(after last round of reviews).

Jun 06 '23 17:06 susnato

reviewing right now

Jun 07 '23 15:06 ArthurZucker

Hi @ArthurZucker I have pushed the comments. Let me know if any more changes are needed or not.

Jun 08 '23 18:06 susnato

I'll @sanchit-gandhi review now, before pinging a core maintainer

Jun 09 '23 09:06 ArthurZucker

I have transferred the necessary files to sweetcocoa/pop2piano and updated the checkpoints. @sanchit-gandhi

Jun 23 '23 14:06 susnato

Thanks for updating again, @sanchit-gandhi has his hands full, I'll review this weekend! Sorry for the delay and great work! 🔥

Jun 23 '23 15:06 ArthurZucker

@ArthurZucker thanks for the quick reply, please don't worry about the delay, and thanks to the HF team for launching very intuitive audio course! :hugs:

Jun 23 '23 17:06 susnato

Sorry, I missed the review request the first time! Rest assured it's on my list - I haven't forgotten @susnato! Aiming to have you a review either on Sunday or Monday afternoon latest 🙌

Jun 23 '23 17:06 sanchit-gandhi

Hi @sanchit-gandhi, I have pushed the changes that you requested. Also answered the questions in the threads.

Jul 03 '23 07:07 susnato

Cool thanks for the explanations @susnato, all good with me. Could you click "resolve" on all the threads that have been completed?

Jul 03 '23 15:07 sanchit-gandhi

Hi @sanchit-gandhi, just resolved all other threads.

Jul 03 '23 16:07 susnato

transformers
transformers copied to clipboard

Add Pop2Piano

What does this PR do?

Before submitting

Who can review?

transformers transformers copied to clipboard

Add Pop2Piano

What does this PR do?

Before submitting

Who can review?

transformers
transformers copied to clipboard