
PanakoStrategy Query Logic - allow duplicate fingerprint hash?

Open lucaslawes opened this issue 1 year ago • 1 comments

Possible minor refactoring to improve the recognition rate.

Testing result: While experimenting with whether duplicate fingerprint hashes are processed, I saw an unexpected improvement in the recognition rate when duplicate fingerprints were not considered. However, this might not suit every use case for the query algorithm.

Suggestion: Add a boolean flag that controls whether duplicate fingerprints are allowed. See the pseudocode below:

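// query
for (PanakoFingerprint print : prints) {
    long hash = print.hash();
    // Placeholder duplicate check (an assumption; the actual logic is still open):
    // a hash counts as a duplicate if it was already seen earlier in this loop.
    boolean hashNotADuplicate = !printMap.containsKey(hash);
    if (allowDuplicates || hashNotADuplicate) {
        db.addToQueryQueue(hash);
    }
    printMap.put(hash, print);
}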

lucaslawes avatar Aug 30 '22 09:08 lucaslawes

Hi, thanks for the suggestion.

The reason for not allowing duplicate hashes is twofold:

If a hash is common, it has (almost by definition) little discriminative power. The idea implemented here is that such hashes can be safely ignored.

Another reason is performance: no storage space or computation is wasted on hashes with little discriminative power. While some hash collisions are allowed, having too many could hurt query performance.

However, letting users choose would indeed be a good improvement. For small collections or powerful servers, the collisions may not be much of a problem. One idea would be to store the temporary prints in either a Set (to avoid duplicates) or an Array (to allow them).
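To make the Set-versus-Array idea concrete, here is a minimal sketch in plain Java. It is not Panako code: the class `QueryHashCollector`, the method `collectHashes`, and the `allowDuplicates` parameter are hypothetical names used only for illustration, and the fingerprints are reduced to their `long` hashes. Note that this variant keeps one copy of each repeated hash when duplicates are disallowed; dropping repeated hashes entirely would need a different check.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;

public class QueryHashCollector {

    // Hypothetical helper: gathers the hashes that would be sent to the query queue.
    // With allowDuplicates == true a list keeps every occurrence; otherwise a
    // LinkedHashSet keeps only the first occurrence of each hash, in insertion order.
    static Collection<Long> collectHashes(List<Long> printHashes, boolean allowDuplicates) {
        Collection<Long> queue;
        if (allowDuplicates) {
            queue = new ArrayList<>();
        } else {
            queue = new LinkedHashSet<>();
        }
        queue.addAll(printHashes);
        return queue;
    }

    public static void main(String[] args) {
        List<Long> hashes = List.of(42L, 7L, 42L, 13L, 7L);
        System.out.println(collectHashes(hashes, true));   // [42, 7, 42, 13, 7]
        System.out.println(collectHashes(hashes, false));  // [42, 7, 13]
    }
}
```

Choosing the collection type once, up front, keeps the query loop itself free of any per-hash duplicate check.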

JorenSix avatar Sep 06 '22 09:09 JorenSix