Add random forest image matcher to utilize different image features

Open KilianB opened this issue 6 years ago • 2 comments

If we have labeled test data we can do better than directly comparing distances to guess if the images are duplicates or not.

With different hashing algorithms focusing on different criteria such as color, gradient, and frequency, we might get better results by combining them with a simple technique like a random forest.

https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
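A minimal sketch of how per-algorithm distances could be turned into one feature vector for such a classifier. It does not use the JImageHash API: hashes are assumed to be plain 64-bit values, and `HashFeatureSketch`, `normalizedHamming` and `features` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

final class HashFeatureSketch {

    /** Normalized Hamming distance in [0, 1] between two 64-bit hashes. */
    static double normalizedHamming(long a, long b) {
        return Long.bitCount(a ^ b) / 64.0;
    }

    /**
     * Builds one feature vector for an image pair. Index i holds the
     * distance reported by the i-th hashing algorithm (e.g. a color,
     * a gradient and a frequency based hash; the order is an assumption).
     */
    static double[] features(long[] hashesA, long[] hashesB) {
        if (hashesA.length != hashesB.length) {
            throw new IllegalArgumentException("hash counts differ");
        }
        double[] result = new double[hashesA.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = normalizedHamming(hashesA[i], hashesB[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up hash values, one per algorithm, for two images.
        long[] imageA = {0b1011L, 0xFFL, 0L};
        long[] imageB = {0b0001L, 0x0FL, 0L};
        System.out.println(Arrays.toString(features(imageA, imageB)));
    }
}
```

Each labeled image pair then becomes one row (feature vector plus a duplicate / not duplicate label), which is the input a tree-based learner would split on instead of a single distance threshold.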

KilianB, Dec 03 '18 12:12

A quick implementation will be added shortly. Which metric do we want to optimize? True positives? Gini impurity does not work in its bare form because of the way test cases are generated from labeled images: we end up with highly unbalanced classes.
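To illustrate the imbalance, here is a rough sketch under one assumption about how test cases are generated: every unordered pair of images is a test case, labeled "duplicate" when both images carry the same label and "distinct" otherwise. The group sizes and the class name `PairImbalanceSketch` are made up for illustration.

```java
import java.util.Arrays;

final class PairImbalanceSketch {

    /** Number of unordered pairs that can be formed from n items. */
    static long pairs(long n) {
        return n * (n - 1) / 2;
    }

    /** Gini impurity of a binary node: 1 - p^2 - (1 - p)^2 = 2p(1 - p). */
    static double gini(long positives, long negatives) {
        long total = positives + negatives;
        if (total == 0) {
            return 0.0;
        }
        double p = (double) positives / total;
        return 2.0 * p * (1.0 - p);
    }

    public static void main(String[] args) {
        // Hypothetical labeled set: 10 groups of 5 near-duplicate images.
        int[] groupSizes = new int[10];
        Arrays.fill(groupSizes, 5);

        long images = 0;
        long duplicatePairs = 0;
        for (int size : groupSizes) {
            images += size;
            duplicatePairs += pairs(size);
        }
        long distinctPairs = pairs(images) - duplicatePairs;

        System.out.println("duplicate pairs: " + duplicatePairs);
        System.out.println("distinct pairs:  " + distinctPairs);
        System.out.printf("root Gini impurity: %.4f%n",
                gini(duplicatePairs, distinctPairs));
    }
}
```

With these numbers, 100 duplicate pairs face 1125 distinct pairs. The root node is already fairly pure, so an unweighted impurity criterion has little incentive to isolate the minority class.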

F1 looks promising at the moment.
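A small sketch of why F1 is attractive here, using the standard definition F1 = 2TP / (2TP + FP + FN). The confusion-matrix counts are invented, and `F1Sketch` is a hypothetical name, not part of the library.

```java
final class F1Sketch {

    /** F1 score from confusion-matrix counts; 0 when undefined. */
    static double f1(long tp, long fp, long fn) {
        long denominator = 2 * tp + fp + fn;
        return denominator == 0 ? 0.0 : (2.0 * tp) / denominator;
    }

    /** Plain accuracy, shown only for comparison. */
    static double accuracy(long tp, long tn, long fp, long fn) {
        long total = tp + tn + fp + fn;
        return total == 0 ? 0.0 : (double) (tp + tn) / total;
    }

    public static void main(String[] args) {
        // Made-up case: 100 duplicate pairs, 1125 distinct pairs, and a
        // classifier that always answers "not a duplicate".
        long tp = 0, fp = 0, fn = 100, tn = 1125;
        System.out.printf("accuracy: %.3f%n", accuracy(tp, tn, fp, fn));
        System.out.printf("F1:       %.3f%n", f1(tp, fp, fn));
    }
}
```

The trivial classifier scores high on accuracy but gets an F1 of zero, because F1 ignores true negatives and only rewards correctly found duplicates.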

Are there any slim random forest implementations available (preferably supporting the C4.5 algorithm)? Everything I have found so far will lead to an explosion of the dependency tree. ...

KilianB, Dec 05 '18 14:12

8097890cc7ea448baf2031225f6e31996f3c78bd & 98ce751d85d01c35a11b9280ca90832280d25ab6 & 401fdd07dc3d796271a41911358bc25bf006e950

KilianB, Dec 06 '18 23:12