dejavu
Possible bug in align_matches function
Hi.
I found a possible bug in the align_matches function of the Dejavu class (dejavu/dejavu/__init__.py).
On line 208, the function assigns dedup_hashes[song_id] to hashes_matched.
dedup_hashes[song_id] is the number of matched hashes for a song, and it does not take the offset difference into account.
As I remember, the Python 2 version of this project considered both the song ID and the offset difference, so hashes_matched contained only hashes with the same song ID and the same offset difference.
I thought this could be an intended change, but if the offset difference is not considered, hashes_matched can exceed queried_hashes, and therefore INPUT_CONFIDENCE can exceed 1.
Since rows in the database (I only checked MySQL) are only required to be unique on the triple (hash, song_id, offset), a single queried hash can match multiple rows in the database.
For example, consider the case where (hash1, song_id1, offset1) and (hash1, song_id1, offset2) are both in the database and you query (hash1). Both rows match, so one queried hash contributes two matches for song_id1.
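To make this case concrete, here is a minimal sketch of the counting problem. It does not use Dejavu's code or database layer: db_rows, query, and match are hypothetical names, and the rows are made-up values chosen only to show how a per-song count can exceed the number of queried hashes while a count grouped by (song ID, offset difference) does not in this example.

```python
from collections import Counter

# Hypothetical fingerprint rows (hash, song_id, offset); values are illustrative only.
db_rows = [
    ("hash1", 1, 10),
    ("hash1", 1, 50),  # same hash and song, different offset
    ("hash2", 1, 11),
]

# Hypothetical query fingerprints (hash, offset within the query clip).
query = [("hash1", 0), ("hash2", 1)]


def match(query, db_rows):
    """Return (song_id, offset_difference) for every DB row whose hash appears in the query."""
    matches = []
    for q_hash, q_offset in query:
        for db_hash, song_id, db_offset in db_rows:
            if db_hash == q_hash:
                matches.append((song_id, db_offset - q_offset))
    return matches


matches = match(query, db_rows)
queried_hashes = len(query)

# Counting per song only: every matching row counts, whatever its offset difference.
per_song = Counter(song_id for song_id, _ in matches)
confidence_song_only = per_song[1] / queried_hashes

# Counting per (song, offset difference): only time-aligned matches are grouped together.
per_song_offset = Counter(matches)
best_aligned = max(n for (sid, _), n in per_song_offset.items() if sid == 1)
confidence_aligned = best_aligned / queried_hashes

print(confidence_song_only)  # 1.5
print(confidence_aligned)    # 1.0
```

With two queried hashes, the per-song count is 3 because hash1 matches two rows, giving a ratio of 1.5. Grouping by offset difference keeps only the two aligned matches, so the ratio is 1.0.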
Identical hashes at different offsets do exist: when I changed the default overlap_ratio to 0.9, I saw hashes_matched exceed queried_hashes.
I can confirm that I have seen this bug, with input confidence > 1.