lhotse
Reverb in frequency/STFT domain
I found an interesting paper that describes a method of implementing reverb in the STFT domain. Maybe with some tinkering it can be adapted to work with log-Mel filter energies. It would be another type of data augmentation that we can perform on-the-fly on the pre-computed features.
If anybody is interested in that I'd welcome contributions. I don't think I'll find the time to add it myself anytime soon.
I contacted Earl Vickers, the first author of the paper, and he said that it's not the best algorithm for reverb augmentation. His answer was literally the following:
Sorry, I don't have any code available. If I recall, the sound quality wasn't great, because we didn't have good phase information.
I then asked him directly, and he agreed that it's not a good production solution. So I suggest sticking with a good old reverb algorithm in the raw waveform domain, unless you know some other method in the frequency domain (I searched and didn't find anything).
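For reference, a minimal sketch of what time-domain reverb augmentation could look like: convolve the waveform with a room impulse response (RIR). The names `synthetic_rir` and `reverberate` are hypothetical, not part of Lhotse, and the RIR here is just exponentially decaying noise, assumed as a stand-in for a measured or simulated one.

```python
import numpy as np


def synthetic_rir(sample_rate=16000, rt60=0.5, seed=0):
    """Toy RIR (an assumption, not a room model): white noise under an
    exponential envelope whose amplitude drops by 60 dB after rt60 seconds."""
    rng = np.random.default_rng(seed)
    num_samples = int(sample_rate * rt60)
    t = np.arange(num_samples) / sample_rate
    envelope = 10.0 ** (-3.0 * t / rt60)  # -60 dB in amplitude at t = rt60
    rir = rng.standard_normal(num_samples) * envelope
    rir[0] = 1.0  # direct path
    return rir / np.max(np.abs(rir))


def reverberate(audio, rir):
    """Convolve audio with the RIR, keep the original length,
    and rescale so the peak level matches the dry signal."""
    wet = np.convolve(audio, rir, mode="full")[: len(audio)]
    peak = np.max(np.abs(wet))
    if peak > 0:
        wet = wet * (np.max(np.abs(audio)) / peak)
    return wet


if __name__ == "__main__":
    sr = 16000
    dry = np.random.default_rng(1).standard_normal(sr)  # 1 s of noise
    wet = reverberate(dry, synthetic_rir(sr, rt60=0.4))
    print(dry.shape, wet.shape)
```

In practice you would probably swap the synthetic RIR for real or simulated ones; truncating to the input length keeps the supervision timings unchanged, at the cost of cutting off the reverb tail at the end.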
Thanks! It’s good to know. I think time-domain is fine, it seems on-the-fly audio reading and feature extraction is efficient enough in the setups we tried so far.
I don't think the audio quality would matter if we are just using frequency-domain features.
On Wed, Sep 1, 2021 at 2:10 AM Piotr Żelasko @.***> wrote:
Thanks! It’s good to know. I think time-domain is fine, it seems on-the-fly audio reading and feature extraction is efficient enough in the setups we tried so far.
As far as I understand, the author was referring to the quality of the reverberated signal. He mentioned the problem of missing phase information, which is described in Section 5 of the paper.
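If someone still wants to experiment on the features directly, here is a rough sketch of a phase-free approximation. To be clear, this is not the method from the paper: it simply assumes that reverberant energy in each Mel bin adds up incoherently and decays exponentially over time. The function name `reverb_log_mel`, the `wet_gain` parameter, and the assumption that the features are natural-log Mel energies of shape (frames, bins) are all mine.

```python
import numpy as np


def reverb_log_mel(log_mel, rt60=0.5, frame_shift=0.01, wet_gain=0.3):
    """Add an exponentially decaying energy tail to log-Mel features.

    log_mel: array of shape (num_frames, num_bins), natural-log energies
    (an assumption about the feature scale).
    """
    energy = np.exp(log_mel)
    # Energy drops by 60 dB (a factor of 1e-6) after rt60 seconds.
    decay = 10.0 ** (-6.0 * frame_shift / rt60)
    tail = np.zeros_like(energy[0])
    out = np.empty_like(energy)
    for t in range(len(energy)):
        # Current frame plus the decayed energy of all previous frames.
        out[t] = energy[t] + wet_gain * tail
        tail = decay * (tail + energy[t])
    return np.log(out)


if __name__ == "__main__":
    feats = np.log(np.random.default_rng(0).random((200, 80)) + 1e-3)
    print(reverb_log_mel(feats, rt60=0.4).shape)
```

Whether this helps a model as much as proper RIR convolution is an open question; it ignores early reflections and any frequency dependence of the decay time.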