
[ENH]: cqt/vqt infer n_bins

Open bmcfee opened this issue 7 months ago • 4 comments

Problem

A minor annoyance when working with cqt and related functions is that the n_bins and bins_per_octave parameters are not directly tied. This means that if we want to change the frequency resolution but keep the same range, we need to change two parameters at once.
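To make the coupling concrete, here is a minimal sketch in plain Python. The helper `span_in_octaves` is hypothetical and used only for illustration; the parameter names `n_bins` and `bins_per_octave` are the ones discussed above.

```python
def span_in_octaves(n_bins, bins_per_octave):
    """Frequency range covered by a geometric bin grid, in octaves."""
    return n_bins / bins_per_octave


# Starting point: 84 bins at 12 bins per octave.
coarse = dict(n_bins=84, bins_per_octave=12)

# Tripling the resolution by changing bins_per_octave alone shrinks the range.
naive = dict(n_bins=84, bins_per_octave=36)

# Keeping the range requires scaling n_bins by the same factor.
matched = dict(n_bins=252, bins_per_octave=36)

for name, params in [("coarse", coarse), ("naive", naive), ("matched", matched)]:
    print(name, params, span_in_octaves(**params))
```

Each dictionary could be passed on as keyword arguments, e.g. `librosa.cqt(y, sr=sr, **matched)`; only the `matched` setting preserves the original 7-octave range.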

Proposed solution

We could allow n_bins=None as an indicator to calculate the maximum number of bins that will still fit (comfortably, including passband limits) under Nyquist. "Comfortably" here is doing a bit of work, as this calculation will depend on various other parameters of the filter bank (window function, filter scale), but it's entirely doable.
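A rough sketch of what such a calculation might look like, in plain Python. Everything here is an assumption made for illustration, not librosa's implementation: the function names `et_relative_bandwidth` and `max_n_bins` are hypothetical, the passband edge is modeled as `f * (1 + 0.5 * window_bandwidth / Q)`, and `window_bandwidth=1.5` is only an approximate value for a Hann window.

```python
def et_relative_bandwidth(bins_per_octave):
    """Assumed relative bandwidth of one bin on an equal-tempered grid."""
    r = 2.0 ** (1.0 / bins_per_octave)
    return (r**2 - 1.0) / (r**2 + 1.0)


def max_n_bins(sr, fmin, bins_per_octave=12, filter_scale=1.0,
               window_bandwidth=1.5):
    """Count the bins whose assumed upper passband edge stays at or below Nyquist.

    Hypothetical helper: the edge model is an assumption, not the
    library's actual rule.
    """
    if sr <= 0 or fmin <= 0:
        raise ValueError("sr and fmin must be positive")

    alpha = et_relative_bandwidth(bins_per_octave)
    q = filter_scale / alpha
    # Multiplicative margin between a center frequency and its passband edge.
    edge = 1.0 + 0.5 * window_bandwidth / q
    nyquist = sr / 2.0

    n = 0
    # Bin n has center frequency fmin * 2**(n / bins_per_octave).
    while fmin * 2.0 ** (n / bins_per_octave) * edge <= nyquist:
        n += 1
    return n


n_default = max_n_bins(sr=22050, fmin=32.70)
n_fine = max_n_bins(sr=22050, fmin=32.70, bins_per_octave=24)
print(n_default, n_fine)
```

Counting with a loop rather than a closed-form logarithm avoids off-by-one errors when a passband edge lands very close to Nyquist. Under this model the result is fully determined by `sr`, `fmin`, `bins_per_octave`, `filter_scale`, and the window, so it could be computed without touching a signal.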

I see no reason not to add this functionality.

I could also be persuaded to make it the default behavior in a future release (1.1?).

bmcfee, Jul 29 '25 14:07

Interesting discussion. At the risk of opening a debate, what about introducing n_octaves? To your remark:

> This means that if we want to change the frequency resolution but keep the same range, we need to change two parameters at once.

This could be achieved by changing bins_per_octave while leaving n_octaves unchanged. And n_bins would predictably be equal to bins_per_octave * n_octaves. I feel like knowing the number of bins is important, particularly in machine learning applications where tensor shapes must be knowable in advance.

Alternatively, the n_octaves parameter could be expressed as an fmax, as in melspectrogram: https://librosa.org/doc/0.11.0/generated/librosa.filters.mel.html#librosa.filters.mel

The advantage is that fmax could default to sr//2. The disadvantage is that the output shape would not be easy for the user to predict from the input arguments (unless we added a utility function to get n_bins from those arguments).

lostanlen, Aug 08 '25 21:08

> At the risk of opening a debate, what about introducing n_octaves?

I think this would be harder to pull off while preserving a backward-compatible API, since it would then become possible to have contradictory settings of fmin, n_bins, bins_per_octave, and n_octaves.

A slightly subtler problem is that it invites errors due to rounding. I think it's reasonable for users to expect a whole number of octaves, but fractional octaves are totally valid as well, and I don't much like the idea of having to round and guess a fractional number of octaves to figure out how many filters to construct.
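The rounding concern can be illustrated with a small sketch. The helper `bins_from_octaves` is hypothetical (no such function is proposed in this thread as written); it uses exact rational arithmetic to show when a fractional `n_octaves` does or does not map to a whole number of filters.

```python
from fractions import Fraction


def bins_from_octaves(n_octaves, bins_per_octave):
    """Return n_octaves * bins_per_octave, refusing to round.

    Hypothetical helper for illustration only.
    """
    total = Fraction(n_octaves) * bins_per_octave
    if total.denominator != 1:
        raise ValueError(
            f"{n_octaves} octaves at {bins_per_octave} bins per octave "
            f"gives {float(total)} bins, which is not a whole number"
        )
    return int(total)


# 7.25 octaves works at 12 bins per octave...
print(bins_from_octaves(Fraction(29, 4), 12))

# ...but not at 10 bins per octave, where a guess would be needed.
try:
    bins_from_octaves(Fraction(29, 4), 10)
except ValueError as exc:
    print(exc)
```

Whether to raise, round down, or round to nearest in the second case is exactly the kind of guess that an explicit `n_bins` avoids.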

> And n_bins would predictably be equal to bins_per_octave * n_octaves. I feel like knowing the number of bins is important, particularly in machine learning applications where tensor shapes must be knowable in advance.

I 100% agree. However, I think it's still basically fine in the API I proposed above, as n_bins=None behavior would be fully specified by the other CQT parameters. (Probably there could be a helper function to let a user derive this without having to compute a CQT on a real signal, as you suggest.)

> The advantage is that fmax could default to sr//2.

Ah, but we can't really set fmax to sr/2 because we also need to account for the filter bandwidth, and that depends on several other parameters (window, filter_scale, as well as the frequency grid parameters).
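As a hedged illustration of that margin: if the upper passband edge of a filter centered at `f` is modeled as `f * (1 + 0.5 * window_bandwidth / Q)`, the highest usable center frequency sits strictly below `sr / 2` and moves with the grid and filter settings. The function name, the edge model, and the value `window_bandwidth=1.5` (roughly a Hann window) are all assumptions for this sketch, not librosa's actual rule.

```python
def highest_center_frequency(sr, bins_per_octave, filter_scale=1.0,
                             window_bandwidth=1.5):
    """Largest center frequency whose assumed passband edge reaches Nyquist.

    Hypothetical helper built on an assumed bandwidth model.
    """
    r = 2.0 ** (1.0 / bins_per_octave)
    alpha = (r**2 - 1.0) / (r**2 + 1.0)  # assumed relative bandwidth
    q = filter_scale / alpha
    return (sr / 2.0) / (1.0 + 0.5 * window_bandwidth / q)


sr = 22050
for bpo in (12, 24, 36):
    for scale in (0.5, 1.0):
        fc = highest_center_frequency(sr, bpo, filter_scale=scale)
        print(f"bins_per_octave={bpo:2d} filter_scale={scale}: {fc:.1f} Hz")
```

Under this model, narrower filters (more bins per octave, or a larger `filter_scale`) allow the top bin to sit closer to Nyquist, which is why a single fixed default for fmax does not work.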

bmcfee, Aug 09 '25 01:08

OK, thanks for replying. Indeed, having a fractional number of octaves is a common thing to want (examples: 7¼ for some pianos, 3¾ for some guitars, ...).

I think the take-home message is

> there could be a helper function to let a user derive this without having to compute a CQT on a real signal

via some kind of "dry run" of the CQT/VQT filterbank construction

lostanlen, Aug 09 '25 12:08

Exactly. We in fact already do something like this for calculating the filter lengths before constructing the filters.

bmcfee, Aug 09 '25 12:08