qdft

Variable-Q transform and arbitrary frequency scale

Open TF3RDL opened this issue 2 years ago • 7 comments

The variable-Q transform (VQT) is very similar to the constant-Q transform (CQT), except that the Q value decreases as the frequency decreases. This is useful if you want better time resolution at low frequencies at the expense of frequency resolution there (for example, a logarithmic-frequency spectrogram with ERB frequency resolution).

Supporting an arbitrary frequency scale for bin spacing also enables a variable-Q transform. With that, a Mel spectrogram can be calculated directly using a VQT with Mel-frequency bin spacing and resolution.
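As a rough illustration of Mel-frequency bin spacing, the sketch below places bin center frequencies uniformly on the Mel scale and derives each bin's bandwidth from the distance to its neighbors. The function names, the HTK-style Mel formula, and the neighbor-distance bandwidth rule are assumptions made for this example; they are not taken from qdft.

```python
import math

def hz_to_mel(f):
    # HTK-style Mel formula (one of several common variants).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_bins(fmin, fmax, n_bins):
    """Return (center frequencies, bandwidths) in Hz for n_bins >= 2 bins
    spaced uniformly on the Mel scale between fmin and fmax."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_bins - 1)
    freqs = [mel_to_hz(lo + k * step) for k in range(n_bins)]
    bandwidths = []
    for k in range(n_bins):
        left, right = max(k - 1, 0), min(k + 1, n_bins - 1)
        # Assumption: bandwidth equals the mean spacing to the neighboring bins.
        bandwidths.append((freqs[right] - freqs[left]) / (right - left))
    return freqs, bandwidths

freqs, bandwidths = mel_bins(50.0, 8000.0, 40)
# Q = f / bandwidth is not constant here; it grows with frequency.
q_factors = [f / b for f, b in zip(freqs, bandwidths)]
```

With this spacing the Q factor rises with frequency, which is why a Mel-spaced analysis behaves like a variable-Q transform rather than a constant-Q one.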

TF3RDL avatar Apr 26 '23 00:04 TF3RDL

This is not an issue. The name of this project is "Constant-Q Sliding DFT", which is also the main purpose.

jurihock avatar Apr 26 '23 20:04 jurihock

Actually, I thought about variable bin bandwidth again. In my current project, an improved time resolution at low frequencies would be beneficial. However, I do not plan to use a frequency scale other than logarithmic in this repo.

According to [1]:

Auditory filters in the human auditory system are approximately constant-Q only for frequencies above 500 Hz and smoothly approach a constant bandwidth towards lower frequencies. Accordingly, music signals generally do not contain closely spaced pitches at low frequencies, thus the Q-factors (relative frequency resolution) can safely be reduced towards lower frequencies, which in turn improves the time resolution.

...the variable bin bandwidth can be mapped via parameter gamma like in [2].

According to [3], the windowing procedure remains the same as in the constant-Q case, where gamma equals 0. Although an additional memory access to the particular bin fiddles is required, I don't expect a huge performance drawback.

[1] A Matlab Toolbox for Efficient Perfect Reconstruction Time-Frequency Transforms with Log-Frequency Resolution
[2] librosa.vqt
[3] Sliding with a constant Q
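The effect of gamma can be sketched numerically. The snippet below assumes the bandwidth model B_k = alpha * f_k + gamma with alpha = 2^(1/b) - 2^(-1/b) for b bins per octave, which is how the mapping in [1] is usually stated; the function name and the example values are hypothetical and this is not the qdft implementation.

```python
def variable_q_bandwidths(fmin, bins_per_octave, n_bins, gamma):
    """Return (center frequencies, bandwidths, Q factors) on a logarithmic scale.

    Assumed model: bandwidth_k = alpha * f_k + gamma, where gamma = 0
    gives the constant-Q case and gamma > 0 lowers Q at low frequencies.
    """
    alpha = 2.0 ** (1.0 / bins_per_octave) - 2.0 ** (-1.0 / bins_per_octave)
    freqs = [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]
    bandwidths = [alpha * f + gamma for f in freqs]
    q_factors = [f / b for f, b in zip(freqs, bandwidths)]
    return freqs, bandwidths, q_factors

# gamma = 0: every bin has the same Q (constant-Q case).
_, _, q_const = variable_q_bandwidths(50.0, 24, 96, gamma=0.0)

# gamma > 0 (illustrative value): Q drops toward low frequencies,
# so low bins get wider bandwidths and therefore shorter windows.
_, _, q_var = variable_q_bandwidths(50.0, 24, 96, gamma=10.0)
```

Since the frequency scale stays logarithmic, only the per-bin bandwidth (and with it the window length) changes when gamma is nonzero.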

jurihock avatar Aug 31 '23 13:08 jurihock

I have yet to see good use cases for arbitrary bin spacing (which is a special case of the VQT and is closely related to the long-term variable-Q transform paper). The last time I played around with it, I used my VQ-sDFT implementation in my spectrogram sketch to generate a Mel spectrogram with this algorithm: mel spectrogram using sdft

TF3RDL avatar Sep 05 '23 05:09 TF3RDL

I suppose it will not be too hard to perform a sliding DFT with arbitrary frequency bin spacing and arbitrary bin bandwidth as well based on the current QDFT implementation. However, a dedicated repository would be appropriate for this purpose. The explicit logarithmic frequency scale also has its place.

jurihock avatar Sep 06 '23 21:09 jurihock

Exactly, it is not hard to do something like this. The relevant section is the CQT part of this CodePen audio visualization, which uses the Goertzel algorithm instead of sDFT, but that doesn't matter since any CQT/VQT implementation can easily be adapted to use arbitrary bin spacing and bandwidth. It is just that logarithmic frequency scaling is more convenient, since it follows musical scales, whereas perceptual and other frequency scales do not.
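To make the point about arbitrary bin spacing and bandwidth concrete, here is a minimal Goertzel-style sketch in Python. It is not the CodePen code: the function name is hypothetical, no window function is applied, and the window length is assumed to be the sample rate divided by the bin bandwidth.

```python
import math

def goertzel_magnitude(samples, freq_hz, bandwidth_hz, sample_rate):
    """Estimate the amplitude at freq_hz from the most recent samples.

    Assumption: the analysis length in samples is sample_rate / bandwidth_hz,
    so a wider bandwidth means a shorter window (better time resolution).
    """
    n = max(1, min(len(samples), round(sample_rate / bandwidth_hz)))
    window = samples[-n:]
    omega = 2.0 * math.pi * freq_hz / sample_rate  # any frequency, not only DFT bins
    coeff = 2.0 * math.cos(omega)
    s1 = s2 = 0.0
    for x in window:
        s0 = x + coeff * s1 - s2  # second-order Goertzel recurrence
        s2 = s1
        s1 = s0
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    # Scale so that a full-scale sine at freq_hz yields roughly 1.0.
    return 2.0 * math.sqrt(max(power, 0.0)) / n

sample_rate = 8000
signal = [math.sin(2.0 * math.pi * 440.0 * i / sample_rate) for i in range(sample_rate)]

# Each bin is just a (center frequency, bandwidth) pair, so any scale works.
bins = [(110.0, 20.0), (440.0, 10.0), (1000.0, 10.0)]
spectrum = [goertzel_magnitude(signal, f, bw, sample_rate) for f, bw in bins]
```

Because each bin only needs a center frequency and a bandwidth, the same loop serves logarithmic, Mel, or any other frequency scale.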

TF3RDL avatar Sep 06 '23 23:09 TF3RDL

BTW, my VQ-sDFT implementation (alongside SWIFT) with arbitrary frequency scale support is included in my first AudioWorklet-based audio visualization project on CodePen, for those who are interested in a variable-Q transform and arbitrary-frequency-scale audio spectrum analyzer.

TF3RDL avatar Aug 14 '24 07:08 TF3RDL