Reverb in frequency/STFT domain

Open pzelasko opened this issue 4 years ago • 4 comments

I found an interesting paper that describes a method of implementing reverb in the STFT domain. Maybe with some tinkering it can be adapted to work with log-Mel filter energies. It would be another type of data augmentation that we can perform on-the-fly on the pre-computed features.

If anybody is interested in that I'd welcome contributions. I don't think I'll find the time to add it myself anytime soon.

pzelasko avatar Feb 16 '21 17:02 pzelasko

I contacted Earl Vickers, the first author of the paper, and he said that it's not the best algorithm for reverb augmentation. His answer was, verbatim: "Sorry, I don't have any code available. If I recall, the sound quality wasn't great, because we didn't have good phase information." I followed up, and he agreed that it's not a good production solution. So I suggest sticking with a good old reverb algorithm in the raw waveform domain, unless you know of some other method in the frequency domain (I searched and didn't find anything).
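
For reference, time-domain reverb usually amounts to convolving the waveform with a room impulse response (RIR). Below is a minimal NumPy sketch of that idea, not Lhotse's API: `synthetic_rir` and `reverberate` are hypothetical helper names, and the RIR is assumed to be exponentially decaying white noise, which is only a crude stand-in for a measured or simulated room response.

```python
import numpy as np


def synthetic_rir(sample_rate=16000, rt60=0.4, seed=0):
    """Hypothetical helper: crude RIR made of exponentially decaying noise."""
    rng = np.random.default_rng(seed)
    num_samples = int(round(sample_rate * rt60))
    t = np.arange(num_samples) / sample_rate
    # Amplitude drops by 60 dB (a factor of 1000) after rt60 seconds.
    envelope = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(num_samples) * envelope
    rir[0] = 1.0  # direct path
    return rir / np.max(np.abs(rir))


def reverberate(audio, rir):
    """Convolve audio with the RIR via FFT and keep the original length."""
    n = len(audio) + len(rir) - 1
    wet = np.fft.irfft(np.fft.rfft(audio, n) * np.fft.rfft(rir, n), n)
    wet = wet[: len(audio)]
    # Rescale so the reverberated signal has the same peak as the input.
    peak = np.max(np.abs(wet))
    if peak > 0:
        wet = wet * (np.max(np.abs(audio)) / peak)
    return wet
```

In practice one would likely replace the synthetic RIR with recorded or simulated impulse responses; the convolution step stays the same.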

videodanchik avatar Aug 31 '21 17:08 videodanchik

Thanks! It’s good to know. I think time-domain is fine; on-the-fly audio reading and feature extraction seem efficient enough in the setups we have tried so far.

pzelasko avatar Aug 31 '21 18:08 pzelasko

I don't think the audio quality would matter if we are just using frequency-domain features.
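
To make the idea concrete, here is a rough sketch of what a phase-free approximation on pre-computed features could look like. It is not the method from the paper: it simply assumes that reverberation can be approximated by smearing each frequency band's power along time with an exponentially decaying tail, ignoring phase and cross-terms between overlapping reflections. The function names, the `frame_shift` and `rt60` parameters, and the natural-log convention for the log-Mel features are all assumptions.

```python
import numpy as np


def reverb_power_spectrogram(power_spec, frame_shift=0.01, rt60=0.4):
    """Smear linear-power features of shape (num_frames, num_bins) in time."""
    num_frames = power_spec.shape[0]
    tail = max(1, int(round(rt60 / frame_shift)))
    t = np.arange(tail) * frame_shift
    # Power drops by 60 dB (a factor of 1e6) after rt60 seconds.
    decay = 10.0 ** (-6.0 * t / rt60)
    out = np.zeros_like(power_spec, dtype=float)
    for k, gain in enumerate(decay):
        if k >= num_frames:
            break
        # Each frame leaks a decayed copy of its power into frame + k.
        out[k:] += gain * power_spec[: num_frames - k]
    return out


def reverb_log_mel(log_mel, **kwargs):
    """Assumes natural-log Mel energies: go to linear power and back."""
    return np.log(reverb_power_spectrogram(np.exp(log_mel), **kwargs))
```

Whether this kind of approximation is close enough to real reverberation to help as augmentation would have to be checked experimentally.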

danpovey avatar Sep 01 '21 04:09 danpovey

As far as I understand, the author was referring to the quality of the reverberated signal. He mentioned the problem of missing phase information, which is described in Section 5 of the paper.

videodanchik avatar Sep 01 '21 20:09 videodanchik