qdft
Variable-Q transform and arbitrary frequency scale
The variable-Q transform (VQT) is very similar to the constant-Q transform (CQT), except that the Q value gets lower as the frequency decreases. This is useful if you want better time resolution at low frequencies at the expense of frequency resolution at bass frequencies (like a logarithmic-frequency spectrogram with ERB frequency resolution).
Supporting an arbitrary frequency scale for bin spacing also enables the variable-Q transform, and with that, a Mel spectrogram can be calculated directly using a VQT with Mel-frequency bin spacing and resolution.
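To make the Mel case concrete, here is a minimal sketch of how Mel-spaced bin frequencies and per-bin bandwidths could be derived. It assumes the common HTK-style Mel formula and takes the bandwidth of each bin from the spacing to its neighbours; the helper names (`hz_to_mel`, `mel_to_hz`, `mel_bins`) are hypothetical and not part of qdft.

```python
import numpy as np

def hz_to_mel(hz):
    # HTK-style Mel mapping (an assumption, other Mel variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_bins(fmin, fmax, count):
    # Bin centers equally spaced on the Mel scale
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), count)
    freqs = mel_to_hz(mels)
    # Bandwidth taken from the local bin spacing in Hz
    # (central differences inside, one-sided differences at the edges)
    bandwidths = np.gradient(freqs)
    return freqs, bandwidths

freqs, bandwidths = mel_bins(50.0, 8000.0, 40)
quality = freqs / bandwidths  # per-bin Q, no longer constant
```

With this spacing the Q value grows with frequency, which is exactly the variable-Q behaviour described above.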
This is not an issue. The name of this project is "Constant-Q Sliding DFT", which is also the main purpose.
Further reading:
- Improvements to the Sliding Discrete Fourier Transform Algorithm (section sliding spectrum analysis for a noninteger k analysis frequency)
- A Matlab Toolbox for Efficient Perfect Reconstruction Time-Frequency Transforms with Log-Frequency Resolution (section Variable-Q)
- Learnable Harmonic Variable-Q Transform (LHVQT) (see references)
Actually, I thought about variable bin bandwidth again. In my current project, an improved time resolution at low frequencies would be beneficial. However, I do not plan to use a frequency scale other than logarithmic in this repo.
According to [1]:
Auditory filters in the human auditory system are approximately constant-Q only for frequencies above 500 Hz and smoothly approach a constant bandwidth towards lower frequencies. Accordingly, music signals generally do not contain closely spaced pitches at low frequencies, thus the Q-factors (relative frequency resolution) can safely be reduced towards lower frequencies, which in turn improves the time resolution.
...the variable bin bandwidth can be mapped via parameter gamma like in [2].
According to [3], the windowing procedure remains the same as in the constant-Q case, where gamma equals 0. Although an additional memory access to the particular bin fiddles is required, I don't expect a huge performance drawback.
[1] A Matlab Toolbox for Efficient Perfect Reconstruction Time-Frequency Transforms with Log-Frequency Resolution
[2] librosa.vqt
[3] Sliding with a constant Q
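The gamma mapping mentioned above can be sketched roughly as follows. This assumes the bandwidth model B_k = alpha * f_k + gamma with alpha = 2^(1/b) - 2^(-1/b), as I understand it from [1], and a window length inversely proportional to the bandwidth. The function name `vqt_bins` and the exact window length rule are assumptions for illustration, not the qdft or librosa implementation.

```python
import numpy as np

def vqt_bins(fmin, fmax, bins_per_octave, gamma, sr):
    # Logarithmically spaced bin centers, same as in the constant-Q case
    octaves = np.log2(fmax / fmin)
    count = int(np.floor(octaves * bins_per_octave)) + 1
    freqs = fmin * 2.0 ** (np.arange(count) / bins_per_octave)
    # Bandwidth model (assumed from [1]): B_k = alpha * f_k + gamma
    alpha = 2.0 ** (1.0 / bins_per_octave) - 2.0 ** (-1.0 / bins_per_octave)
    bandwidths = alpha * freqs + gamma
    # Per-bin Q and window length in samples (assumed rule: sr / B_k)
    q = freqs / bandwidths
    periods = np.ceil(sr / bandwidths).astype(int)
    return freqs, q, periods

# gamma = 0 reproduces the constant-Q case
f_cq, q_cq, n_cq = vqt_bins(50.0, 8000.0, 24, 0.0, 44100)
# gamma > 0 lowers Q and shortens the windows towards low frequencies
f_vq, q_vq, n_vq = vqt_bins(50.0, 8000.0, 24, 20.0, 44100)
```

The bin frequencies stay logarithmic in both cases; only the per-bin window length changes, which matches the statement that the windowing procedure itself remains the same.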
I have yet to see good use cases for arbitrary bin spacing (which is a special case of the VQT and is closely related to the long-term variable-Q transform paper). Last time, I played around with my VQ-sDFT implementation in my spectrogram sketch to generate a Mel spectrogram with this algorithm:
I suppose it will not be too hard to perform a sliding DFT with arbitrary frequency bin spacing and arbitrary bin bandwidth as well based on the current QDFT implementation. However, a dedicated repository would be appropriate for this purpose. The explicit logarithmic frequency scale also has its place.
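As a rough illustration of that idea, here is a sketch of a single sliding DFT bin at an arbitrary (non-integer) analysis frequency with its own window length. It is a plain rectangular-window recurrence derived from the DFT sum, not the windowed QDFT algorithm of this repo, and the function name `sliding_bin` is hypothetical.

```python
import numpy as np

def sliding_bin(x, freq, sr, period):
    # Tracks S(n) = sum_{m=0}^{N-1} x[n-N+1+m] * exp(-1j*omega*m), N = period,
    # with samples before the start of x treated as zero.
    omega = 2.0 * np.pi * freq / sr
    twiddle = np.exp(1j * omega)
    # Phase correction for the newest sample; equals 1 only if
    # freq * period / sr is an integer (the classic integer-k sDFT)
    phase = np.exp(-1j * omega * period)
    delay = np.zeros(period)  # circular buffer holding the last N samples
    state = 0j
    out = np.empty(len(x), dtype=complex)
    for n, sample in enumerate(x):
        oldest = delay[n % period]  # x[n - N], or 0 during warm-up
        delay[n % period] = sample
        state = twiddle * (state - oldest + phase * sample)
        out[n] = state
    return out

# Example: one bin at 123.4 Hz with a 200-sample window
rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)
result = sliding_bin(signal, 123.4, 8000.0, 200)
```

Since `freq` and `period` are free parameters per bin, any bin spacing and bandwidth can be combined by running one such recurrence per bin. Note that this simple form accumulates rounding errors over long runs.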
Exactly, it is not hard to do something like this (the relevant section is the "when using CQT" part of this CodePen audio visualization, which uses the Goertzel algorithm instead of sDFT, but that doesn't matter since any CQT/VQT implementation can easily be adapted to use arbitrary bin spacing and bandwidth). It is just that logarithmic frequency scaling is more convenient since it follows musical scales, whereas perceptual and other frequency scales do not.
BTW, my VQ-sDFT implementation (alongside SWIFT) with arbitrary frequency scale support is included in my first AudioWorklet-based audio visualization project on CodePen, for those who are interested in a variable-Q transform and arbitrary frequency scale audio spectrum analyzer.