
Ranking algorithm not working properly for files of different sizes

Open rstrobl opened this issue 3 years ago • 0 comments

Let's say I have the following original tracks:

Track A: 5 hours
Track B: 3 minutes

When I try to match 10 seconds of audio from Track B, the current ranking algorithm favours Track A:

{'song_id': 144, 'song_name': 'TrackA.wav', 'input_total_hashes': 406, 'fingerprinted_hashes_in_db': 1, 'hashes_matched_in_input': 1621, 'input_confidence': 3.99, 'fingerprinted_confidence': 1621.0, 'offset': 719479, 'offset_seconds': 33412.5395, 'file_sha1': 'A64696103620CAD306B320F64CED8749033B84F9', 'length': 11543}

As you can see, the input confidence is about 4, which means that on average each input hash matched about 4 times here. Because the file is huge, it is very likely that any given fingerprint matches somewhere in it at least once, which distorts the ranking.
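To make the arithmetic explicit, here is a small check using the numbers from the result above. It assumes the two confidence fields are plain ratios of matched hashes to input hashes and to fingerprinted hashes; that is an assumption, but the reported values are consistent with it.

```python
# Values copied from the result dict in this issue.
result = {
    "input_total_hashes": 406,
    "fingerprinted_hashes_in_db": 1,
    "hashes_matched_in_input": 1621,
}

# Assumption: input_confidence = matched hashes / hashes in the query clip.
input_confidence = round(
    result["hashes_matched_in_input"] / result["input_total_hashes"], 2
)

# Assumption: fingerprinted_confidence = matched hashes / hashes stored for the song.
fingerprinted_confidence = round(
    result["hashes_matched_in_input"] / result["fingerprinted_hashes_in_db"], 2
)

print(input_confidence, fingerprinted_confidence)
```

A ratio above 1 only makes sense if the same input hash is counted once for every place it occurs in the long track, regardless of whether those places line up in time.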

Suggestion:

https://github.com/worldveil/dejavu/blob/e56a4a221ad204654a191d217f92aebf3f058b62/dejavu/init.py#L197

At this line there is an argument, aligned_matches, that is completely ignored. I think aligned_matches should play a major role in the ranking.
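A rough sketch of what I mean, not dejavu's actual code: the function name `rank_by_aligned_matches`, the `(song_id, offset_difference)` input format, and the `aligned_confidence` field are all hypothetical. The idea is to count, per song, only the largest group of matches that share one offset, so scattered chance matches in a very long file no longer add up.

```python
from collections import Counter, defaultdict


def rank_by_aligned_matches(matches, input_total_hashes):
    """Rank songs by their largest group of matches sharing a single offset.

    matches: iterable of (song_id, offset_difference) pairs (hypothetical format).
    input_total_hashes: number of hashes extracted from the query clip.
    """
    offsets_per_song = defaultdict(Counter)
    for song_id, offset_diff in matches:
        offsets_per_song[song_id][offset_diff] += 1

    ranking = []
    for song_id, counter in offsets_per_song.items():
        # Only the most common offset counts; stray matches are ignored.
        best_offset, aligned = counter.most_common(1)[0]
        ranking.append({
            "song_id": song_id,
            "offset": best_offset,
            "aligned_matches": aligned,
            "aligned_confidence": aligned / input_total_hashes,
        })

    ranking.sort(key=lambda r: r["aligned_matches"], reverse=True)
    return ranking


# Toy data: song 144 (long track) has 12 matches at 12 different offsets,
# song 7 (short track) has 5 matches that all agree on offset 40.
matches = [(144, off) for off in range(0, 1200, 100)] + [(7, 40)] * 5
ranked = rank_by_aligned_matches(matches, input_total_hashes=10)
print(ranked[0]["song_id"])
```

In the toy data the long track has more raw matches, but the short track ranks first because its matches agree on one offset.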

rstrobl, Jun 28 '21 23:06