mir_eval chord.eval: expose extension-reduction parameters

(tagging @tomxi @ejhumphrey)

The first step taken in the chord metrics is to map chord strings onto pitch/root/bass encodings, for example, here. Currently, this hard-codes the reduce_extended_chords parameter to False (second argument). This maintains compatibility with mirex by default, but there is no way for a user to override this behavior. This is problematic for examples like the following (due to @tomxi):

In [1]: import mir_eval

In [2]: c1 = 'C#:7(b9,#9)'

In [3]: c2 = 'C#:min7(b9,b11)'

In [4]: mir_eval.chord.encode(c1)
Out[4]: (1, array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]), 0)

In [5]: mir_eval.chord.encode(c2)
Out[5]: (1, array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0]), 0)

In [6]: mir_eval.chord.encode(c1, reduce_extended_chords=True)
Out[6]: (1, array([1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0]), 0)

In [7]: mir_eval.chord.encode(c2, reduce_extended_chords=True)
Out[7]: (1, array([1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0]), 0)

where the chords c1 and c2 have identical pitch class content if extensions are included (last two encodings) but differ within the octave (first two encodings).

What do folks think about exposing the reduction parameter at the API level, so that users can decide whether or not to include extensions? @ejhumphrey do you anticipate any backfiring here?

May 04 '18 13:05 bmcfee

Assuming there is a use-case for evaluation using reduce_extended_chords=True, I am in support of this. I don't know enough about chords to know whether this is a common or possible use-case.

May 04 '18 15:05 craffel

It's not common (it's not even possible right now :grin:) but I think there's no harm in exposing this parameter and preserving the default behavior.

Down the road, I could imagine expanding the eval results to include _ext metrics similar to how we include _inv metrics (for inversion-sensitive evaluation).

May 04 '18 15:05 bmcfee

Just to add: This chord actually happened in our dataset. Here's the played chord as represented by tablature.

I feel like similar situations can arise pretty easily in Jazz, with extended chords. A lot of times they have deliberately confusing chord types.

Another classic example is this: German augmented 6th chord is enharmonically the same as a dominant 7th chord. The only way to tell them apart is to look at the context (ie how they resolve.)

~~Chopin in his Ab major prelude deliberately used this ambiguity.~~ Wrong example... Too much classical pieces mixed up in my head

May 04 '18 17:05 tomxi

Almost done implementing this, and ran into a couple of interesting points. I expected, since the PCPs are equivalent, that they should get perfect scores on all metrics. This turns out to not be correct:

The majmin(_inv) metrics give 0 score here, even with extension reduction. It turns out that majmin gets this right, but in the wrong way. Specifically, it checks for exact quality agreement in the first 8 bins against the maj or min templates. Both chords fail on this, so it gets flagged as a "no-chord".

You could make the case that this chord is neither major nor minor, but the fact that one is notated as a dom7 (major-like) and the other as min7 (minor) seems like some evidence that they should be interpreted differently. However, that would require rewriting the majmin metrics to be more quality-aware, and I'm not exactly keen to do that. I'm fine with leaving it as is.
The sevenths(_inv) metrics also give 0 score. I was looking into the code, and this part (7th qualities)[https://github.com/craffel/mir_eval/blob/master/mir_eval/chord.py#L1275] seems weird. It's missing dim7 and hdim7 qualities... i wonder if this is a bug? (Note, that's independent of the current issue)

Anyway, the problem here is that it's looking for exact semitone agreement with one of the known qualities, which of course we don't have in either of these chords, but the fact that they're weird in the same way suggests that this behavior is wrong. (Maybe @ejhumphrey or @mattmcvicar will fight me on this? :grin:)

I think the above can be resolved generally by relaxing the bin comparison strategy. Rather than seeing if the chord(s) in question match any of the known templates, we can instead specify a mask vector m that indicates which semitones matter for the evaluation in question (in this case, root + m3, M3, TT, P5, m7, M7), and then check if ref_semitones * m == est_semitones * m.

I've implemented that kind of logic before in pumpp's chord vocabulary reduction algorithm, and afaik it's correct. It would probably change the behavior of the metrics, even when extension reduction is disabled, so it's worth having a conversation about.

IMO, the current behavior is incorrect/not expected/probably a bug, but I'm open to other interpretations.

May 04 '18 21:05 bmcfee

mir_eval mir_eval copied to clipboard

chord.eval: expose extension-reduction parameters

mir_eval
mir_eval copied to clipboard