torch-audiomentations

Implement time stretch transform

Open • iver56 opened this issue 4 years ago • 13 comments

iver56, Dec 10 '20

This sounds like something that wouldn't be too bad to work on, since pitch shifting is already half of a time-stretch transform anyway!

KentoNishi, Jul 10 '21
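KentoNishi's point that pitch shifting is "half" a time stretch can be made concrete. One common approach (not necessarily what torch-time-stretch or torch-audiomentations does internally) is a phase-vocoder time stretch followed by resampling back to the original length. Below is a minimal NumPy sketch under those assumptions; the function names, the fixed `N_FFT`/`HOP` values and the linear-interpolation resampler are illustrative choices, not library APIs.

```python
import numpy as np

N_FFT = 1024  # assumed STFT size
HOP = 256     # assumed hop length


def stft(x: np.ndarray) -> np.ndarray:
    """Hann-windowed STFT, shape (frames, bins); no centering or padding."""
    win = np.hanning(N_FFT)
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N_FFT] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec: np.ndarray) -> np.ndarray:
    """Weighted overlap-add inverse of stft()."""
    win = np.hanning(N_FFT)
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * win
    out = np.zeros(HOP * (len(frames) - 1) + N_FFT)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * HOP : i * HOP + N_FFT] += frame
        norm[i * HOP : i * HOP + N_FFT] += win**2
    return out / np.maximum(norm, 1e-8)


def time_stretch(x: np.ndarray, rate: float) -> np.ndarray:
    """Phase-vocoder stretch: rate > 1 shortens, rate < 1 lengthens, pitch is kept."""
    spec = stft(x)
    steps = np.arange(0, len(spec) - 1, rate)  # fractional analysis-frame positions
    expected = 2 * np.pi * HOP * np.arange(spec.shape[1]) / N_FFT  # nominal advance per bin
    phase = np.angle(spec[0])
    out = np.empty((len(steps), spec.shape[1]), dtype=complex)
    for k, t in enumerate(steps):
        i, frac = int(t), t - int(t)
        mag = (1 - frac) * np.abs(spec[i]) + frac * np.abs(spec[i + 1])
        out[k] = mag * np.exp(1j * phase)
        # deviation from the nominal phase advance, wrapped to [-pi, pi]
        dphi = np.angle(spec[i + 1]) - np.angle(spec[i]) - expected
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + expected + dphi
    return istft(out)


def pitch_shift(x: np.ndarray, semitones: float) -> np.ndarray:
    """Pitch shift = time stretch by the pitch factor, then resample back."""
    factor = 2.0 ** (semitones / 12.0)
    stretched = time_stretch(x, 1.0 / factor)  # about `factor` times longer, same pitch
    # read the stretched signal every `factor` samples (linear interpolation);
    # this shortens it by `factor` and multiplies every frequency by `factor`
    positions = np.arange(0, len(stretched) - 1, factor)
    y = np.interp(positions, np.arange(len(stretched)), stretched)
    # the STFT above drops a partial last frame, so pad/crop to the input length
    return np.pad(y, (0, max(0, len(x) - len(y))))[: len(x)]
```

For example, with `x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)`, `pitch_shift(x, 12)` returns a signal of the same length whose dominant frequency is about 880 Hz. A real implementation would use a band-limited resampler (such as `torchaudio.functional.resample`) instead of linear interpolation, which can alias when shifting upwards.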

Agreed. A contribution would be welcome :)

iver56, Jul 10 '21

https://github.com/KentoNishi/torch-time-stretch

iver56, Oct 11 '21

I might do this when I'm free! I'm a little (severe understatement) short on time at the moment, but it shouldn't be too bad to implement (famous last words).

KentoNishi, Oct 11 '21

That's cool :) Btw, here's a related idea: a transform that combines time stretching and pitch shifting in a single operation, so that it runs in roughly the same time as time stretching alone.

iver56, Oct 11 '21
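iver56's combined transform comes down to a bit of arithmetic, assuming pitch shifting is implemented as a phase-vocoder stretch followed by resampling. Instead of stretching and then pitch shifting (two vocoder passes), both factors can be folded into one vocoder rate and one resampling step. The sketch below only computes those two numbers; the function names are hypothetical, and it assumes the "rate > 1 shortens the audio" convention used by torchaudio's `phase_vocoder`.

```python
from typing import Tuple


def combined_stretch_params(duration_factor: float, semitones: float) -> Tuple[float, float]:
    """Return (vocoder_rate, resample_step) for one combined stretch + pitch shift.

    duration_factor: output duration / input duration (2.0 = twice as long).
    semitones: pitch shift in semitones (positive = higher).
    Assumed convention: a vocoder rate > 1 shortens the audio.
    """
    pitch_factor = 2.0 ** (semitones / 12.0)
    # 1) phase vocoder: lengthen by duration_factor * pitch_factor, pitch unchanged
    vocoder_rate = 1.0 / (duration_factor * pitch_factor)
    # 2) resample: read every pitch_factor-th sample, which shortens the audio
    #    by pitch_factor and raises every frequency by pitch_factor
    resample_step = pitch_factor
    return vocoder_rate, resample_step


def net_effect(vocoder_rate: float, resample_step: float) -> Tuple[float, float]:
    """(duration factor, pitch factor) produced by running the two steps in order."""
    return (1.0 / vocoder_rate) / resample_step, resample_step
```

For example, keeping the duration while shifting up an octave gives `(0.5, 2.0)`: stretch to twice the length, then read every second sample. The expensive STFT/ISTFT round trip happens once, and resampling is usually much cheaper, which is why the combined version should cost about as much as a plain time stretch.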

Wow, I would not have thought of that one myself, that's quite genius 🧠

KentoNishi, Oct 11 '21

I'm glad you liked my idea ^^ Should I create an issue for that, then?

iver56, Oct 11 '21

Yep!

roses are red
pitch-shift was merged to HEAD
time-stretch separately?
why not both instead?

KentoNishi, Oct 11 '21

The idea is ambitious, new and deep,
But we also have other promises to keep.
The issue number is #101,
Let's hope it gets picked up by someone.
Too bad that we are limited on time,
But at least we have time for a rhyme

iver56, Oct 11 '21

oh my god this is beautiful lmaooooooooooooooooooooooooooo i love it

KentoNishi, Oct 11 '21

Hi @iver56! First of all, really nice that you are maintaining this project :) I work with audio AI models a lot and use audiomentations for many of them.

Since TimeStretch doesn't exist yet, I implemented a class for it based on torch-time-stretch and audiomentations. Because this transform changes the length of the audio, the following snippet from core.transforms_interface raises an error:

```python
if self.mode == "per_example":
    if not self.are_parameters_frozen:
        self.randomize_parameters(selected_samples, sample_rate)

    cloned_samples[    # <--- fails when apply_transform changes the length
        self.transform_parameters["should_apply"]
    ] = self.apply_transform(selected_samples, sample_rate)
```

Error: RuntimeError: shape mismatch: value tensor of shape [xxx] cannot be broadcast to indexing result of shape [1, 1, yyy]

How would you address this? Thank you in advance!

akashrajkn, Dec 31 '21
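In the snippet above, the output of `apply_transform` is written back into `cloned_samples`, which keeps the original shape, so any change in length breaks the assignment. One simple workaround is to crop or zero-pad the transformed audio back to the original length before assigning it. The sketch below uses NumPy, whose boolean-mask assignment fails the same way (raising `ValueError` rather than torch's `RuntimeError`); `fit_length` is a hypothetical helper, not part of torch-audiomentations, and in PyTorch `torch.nn.functional.pad` would play the role of `np.pad`.

```python
import numpy as np


def fit_length(x: np.ndarray, target_len: int) -> np.ndarray:
    """Hypothetical helper: crop or zero-pad the last axis of x to target_len samples."""
    current = x.shape[-1]
    if current >= target_len:
        return x[..., :target_len]
    pad = [(0, 0)] * (x.ndim - 1) + [(0, target_len - current)]
    return np.pad(x, pad)


# batch of 2 mono examples, 100 samples each: (batch, channels, samples)
samples = np.zeros((2, 1, 100))
should_apply = np.array([True, False])
stretched = np.ones((1, 1, 80))  # a transform shortened the selected example

mismatch_raised = False
try:
    samples[should_apply] = stretched  # same kind of failure as in the issue
except ValueError as err:
    mismatch_raised = True
    print("shape mismatch:", err)

# works once the output is cropped/padded back to the original length
samples[should_apply] = fit_length(stretched, samples.shape[-1])
```

The catch is that cropping throws away the end of a slowed-down clip and padding appends silence to a sped-up one, so the stretch no longer changes the duration. Supporting genuinely variable output lengths would instead mean returning a new tensor rather than assigning into the existing one, which only works if every example in the batch ends up with the same length.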

Hey :) I will check this out in a few days.

Nice profile pic btw ^^

iver56, Dec 31 '21

Thanks!

I've submitted a PR (not complete yet) so you can view the code.

akashrajkn, Jan 02 '22