
Question: Java and C FP extraction difference

Open • konstantin-sancom opened this issue 1 year ago • 1 comment

Hi @JorenSix,

I tried to compare the Java and C fingerprint (FP) extraction algorithms and found this part of the code in the C version:

    uint64_t m1LargerThanm2 = m1 > m2 ? 1 : 0;
    uint64_t m2LargerThanm3 = m2 > m3 ? 1 : 0;
    uint64_t m3LargerThanm1 = m3 > m1 ? 1 : 0;

    m1LargerThanm2 = 0;
    m2LargerThanm3 = 0;
    m3LargerThanm1 = 0;

So in the C version the mXLargerThanmY flags are zeroed, but in the Java version they are not. Why? Is it a bug or a feature?

konstantin-sancom, Feb 09 '23 11:02

To be honest, I do not remember why I zeroed these values. Did they impede retrieval? I do not know.

It is difficult to balance the discriminability and retrievability of hashes. If you include too much information in a hash, query fingerprints might not match. If you include too little, too many hashes match and performance suffers, although it might help retrieval rates. So it is a trade-off that depends on your application, the amount of noise you expect (digital near-duplicates or over-the-air queries), the query speed you are after, the size of the index you plan to use, and so on.

The proper way to handle this is to make it configurable, for example with a config->useMagnitudesInFPHash = true/false option. I will mark this as an enhancement.

It would be best to test this for your application by comparing a database built with magnitudes in the hash against one built without them. If you could share results or findings, that would be great!

JorenSix, Feb 09 '23 12:02