
Possible bug in align_matches function

Open nykim11 opened this issue 4 years ago • 1 comments

Hi.

I found a possible bug in the align_matches function of the Dejavu class (dejavu/dejavu/__init__.py).

On line 208, the function assigns dedup_hashes[song_id] to hashes_matched.

dedup_hashes[song_id] is the number of matched hashes for a song, and it does not take the offset difference into account.

As I remember, the Python 2 version of this project considered both the song id and the offset difference, so hashes_matched contained only hashes with the same song id and the same offset difference.
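To make the offset-aware counting concrete, here is a minimal sketch of what I mean. It is not the project's actual code: the function name `count_aligned_matches` and the sample data are made up for illustration, and it assumes matches arrive as (song_id, offset_difference) pairs.

```python
from collections import Counter


def count_aligned_matches(matches):
    """Return {song_id: (best_offset_difference, count)}.

    `matches` is an iterable of (song_id, offset_difference) pairs.
    Only matches that share both song id and offset difference are
    counted together (hypothetical helper, for illustration only).
    """
    counts = Counter(matches)
    best = {}
    for (song_id, offset_diff), n in counts.items():
        # Keep the offset difference with the most matches per song.
        if song_id not in best or n > best[song_id][1]:
            best[song_id] = (offset_diff, n)
    return best


# Made-up sample: song 1 has three matches aligned at difference 10
# and one stray match at difference 37.
matches = [(1, 10), (1, 10), (1, 10), (1, 37), (2, 5)]
best = count_aligned_matches(matches)
```

With this kind of counting, the stray match at a different offset difference does not add to the count for song 1.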

I thought this could be an intended change, but if you don't consider the offset difference, hashes_matched can exceed queried_hashes, and therefore INPUT_CONFIDENCE can exceed 1.

Since rows in the database (I only checked MySQL) are only required to be unique on the triple (hash, song_id, offset), a single queried hash can match multiple rows in the database.

For example, consider the case when there are (hash1, song_id1, offset1) and (hash1, song_id1, offset2) in the database and you query (hash1).
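The case above can be reproduced with a small in-memory model. This is only a sketch under my own assumptions: the rows, offsets, and variable names are hypothetical, and the ratio simply mirrors matched hashes divided by queried hashes as described in this issue, not the library's exact computation.

```python
from collections import defaultdict

# Hypothetical database rows: (hash, song_id, offset).
# The same hash appears twice for the same song at different offsets.
rows = [
    ("hash1", 1, 100),
    ("hash1", 1, 250),
]

# Hypothetical query: one hash at offset 40 in the sample.
queried = [("hash1", 40)]

index = defaultdict(list)
for h, song_id, offset in rows:
    index[h].append((song_id, offset))

per_song = defaultdict(int)       # counts every matching row per song
per_alignment = defaultdict(int)  # counts per (song, offset difference)
for h, query_offset in queried:
    for song_id, db_offset in index[h]:
        per_song[song_id] += 1
        per_alignment[(song_id, db_offset - query_offset)] += 1

queried_hashes = len({h for h, _ in queried})

# Counting per song only: 2 matched rows for 1 queried hash.
confidence_by_song = per_song[1] / queried_hashes

# Counting per (song, offset difference): at most 1 per alignment here.
best_aligned = max(n for (sid, _), n in per_alignment.items() if sid == 1)
confidence_by_alignment = best_aligned / queried_hashes
```

In this model the per-song ratio comes out above 1, while the ratio based on the best single alignment stays at 1.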

The same hash does occur at different offsets, and when I changed the default overlap_ratio to 0.9, I could see hashes_matched exceed queried_hashes.

nykim11, Jul 28 '20 14:07

I can confirm that I have seen this bug, with input confidence > 1.

raedatoui, Aug 13 '20 03:08