mir_eval
mir_eval copied to clipboard
chord.eval: expose extension-reduction parameters
(tagging @tomxi @ejhumphrey)
The first step taken in the chord metrics is to map chord strings onto pitch/root/bass encodings, for example, here. Currently, this hard-codes the reduce_extended_chords parameter to False (second argument). This maintains compatibility with mirex by default, but there is no way for a user to override this behavior. This is problematic for examples like the following (due to @tomxi):
In [1]: import mir_eval
In [2]: c1 = 'C#:7(b9,#9)'
In [3]: c2 = 'C#:min7(b9,b11)'
In [4]: mir_eval.chord.encode(c1)
Out[4]: (1, array([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]), 0)
In [5]: mir_eval.chord.encode(c2)
Out[5]: (1, array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0]), 0)
In [6]: mir_eval.chord.encode(c1, reduce_extended_chords=True)
Out[6]: (1, array([1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0]), 0)
In [7]: mir_eval.chord.encode(c2, reduce_extended_chords=True)
Out[7]: (1, array([1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0]), 0)
where the chords c1 and c2 have identical pitch class content if extensions are included (last two encodings) but differ within the octave (first two encodings).
What do folks think about exposing the reduction parameter at the API level, so that users can decide whether or not to include extensions? @ejhumphrey do you anticipate any backfiring here?
Assuming there is a use-case for evaluation using reduce_extended_chords=True, I am in support of this. I don't know enough about chords to know whether this is a common or possible use-case.
It's not common (it's not even possible right now :grin:) but I think there's no harm in exposing this parameter and preserving the default behavior.
Down the road, I could imagine expanding the eval results to include _ext metrics similar to how we include _inv metrics (for inversion-sensitive evaluation).
Just to add: This chord actually happened in our dataset. Here's the played chord as represented by tablature.

I feel like similar situations can arise pretty easily in Jazz, with extended chords. A lot of times they have deliberately confusing chord types.
Another classic example is this: German augmented 6th chord is enharmonically the same as a dominant 7th chord. The only way to tell them apart is to look at the context (ie how they resolve.)
~~Chopin in his Ab major prelude deliberately used this ambiguity.~~ Wrong example... Too much classical pieces mixed up in my head
Almost done implementing this, and ran into a couple of interesting points. I expected, since the PCPs are equivalent, that they should get perfect scores on all metrics. This turns out to not be correct:
-
The
majmin(_inv)metrics give 0 score here, even with extension reduction. It turns out that majmin gets this right, but in the wrong way. Specifically, it checks for exact quality agreement in the first 8 bins against themajormintemplates. Both chords fail on this, so it gets flagged as a "no-chord".You could make the case that this chord is neither major nor minor, but the fact that one is notated as a dom7 (major-like) and the other as min7 (minor) seems like some evidence that they should be interpreted differently. However, that would require rewriting the
majminmetrics to be more quality-aware, and I'm not exactly keen to do that. I'm fine with leaving it as is. -
The
sevenths(_inv)metrics also give 0 score. I was looking into the code, and this part (7th qualities)[https://github.com/craffel/mir_eval/blob/master/mir_eval/chord.py#L1275] seems weird. It's missingdim7andhdim7qualities... i wonder if this is a bug? (Note, that's independent of the current issue)Anyway, the problem here is that it's looking for exact semitone agreement with one of the known qualities, which of course we don't have in either of these chords, but the fact that they're weird in the same way suggests that this behavior is wrong. (Maybe @ejhumphrey or @mattmcvicar will fight me on this? :grin:)
I think the above can be resolved generally by relaxing the bin comparison strategy. Rather than seeing if the chord(s) in question match any of the known templates, we can instead specify a mask vector m that indicates which semitones matter for the evaluation in question (in this case, root + m3, M3, TT, P5, m7, M7), and then check if ref_semitones * m == est_semitones * m.
I've implemented that kind of logic before in pumpp's chord vocabulary reduction algorithm, and afaik it's correct. It would probably change the behavior of the metrics, even when extension reduction is disabled, so it's worth having a conversation about.
IMO, the current behavior is incorrect/not expected/probably a bug, but I'm open to other interpretations.