JedAIToolkit
Unable to achieve high recall and high precision for the bigger datasets
Hello,
In the entity matching step, I'm trying to combine different bag models with similarity measures for the dirty dataset "movies" in the data folder.
Unfortunately, I'm unable to get both high recall and high precision. Could you suggest a good "recipe" for getting good results on that dataset?
Thank you
Hi,
You can find an example of how to optimize the performance for the Dirty ER movies dataset here: https://github.com/scify/JedAIToolkit/blob/mavenizedVersion/jedai-core/src/test/java/org/scify/jedai/configuration/OptimizeDirtyMoviesDataset.java . Just make sure you unzip the profiles in the data folder first.
Kind regards, George
Hi George, something isn't working as it should. When I launch the algorithm and filter the output for the string "F-Measure", I get the following:
gabriele@aroxy:/media/gabriele/DATA/Universita/Tesi/tool/Jedai/JedAIToolkit$ java -jar jedai-core/target/jedai-core-1.3.jar | grep F-Measure
F-Measure : 0.0
F-Measure : 0.06148590947907771
F-Measure : 0.021220159151193636
F-Measure : 0.0
F-Measure : 0.05328596802841918
F-Measure : 0.011299435028248586
F-Measure : 0.0
F-Measure : 0.01474201474201474
F-Measure : 0.0
F-Measure : 0.02127659574468085
F-Measure : 0.011834319526627219
F-Measure : 0.0350109409190372
F-Measure : 0.005602240896358543
F-Measure : 0.02952029520295203
F-Measure : 0.05687203791469195
F-Measure : 0.07124681933842239
F-Measure : 0.18016378525932666
F-Measure : 0.005586592178770949
F-Measure : 0.031413612565445025
F-Measure : 0.0
F-Measure : 0.026373626373626374
F-Measure : 0.005633802816901408
F-Measure : 0.017897091722595078
F-Measure : 0.0
F-Measure : 0.01550387596899225
F-Measure : 0.042194092827004225
F-Measure : 0.04979253112033195
F-Measure : 0.0
F-Measure : 0.24346405228758167
F-Measure : 0.013856812933025405
F-Measure : 0.1213235294117647
F-Measure : 0.00396039603960396
F-Measure : 0.09782608695652174
F-Measure : 0.0056179775280898875
F-Measure : 0.005221932114882507
F-Measure : 0.015228426395939085
F-Measure : 0.022172949002217297
F-Measure : 0.1769087523277467
F-Measure : 0.0056179775280898875
F-Measure : 0.05758157389635317
F-Measure : 0.037578288100208766
F-Measure : 0.10510948905109489
F-Measure : 0.021220159151193636
F-Measure : 0.010075566750629721
F-Measure : 0.0
F-Measure : 0.12126537785588755
F-Measure : 0.030769230769230767
F-Measure : 0.0
F-Measure : 0.09671848013816925
F-Measure : 0.026246719160104987
F-Measure : 0.005333333333333333
F-Measure : 0.026373626373626374
F-Measure : 0.02727272727272727
F-Measure : 0.005747126436781609
F-Measure : 0.04519774011299435
F-Measure : 0.07692307692307693
F-Measure : 0.029268292682926828
**Best F-Measure** : 0.24346405228758167
F-Measure : 0.06531881804043545
F-Measure : 0.33240997229916897
F-Measure : 0.7068723702664796
F-Measure : 0.23611111111111108
F-Measure : 0.16641813301521025
F-Measure : 0.07221431344635693
F-Measure : 0.5612343297974927
F-Measure : 0.10637480798771122
F-Measure : 0.7549407114624507
F-Measure : 0.051971127151582454
F-Measure : 0.5468904244817374
F-Measure : 0.060599502218374644
F-Measure : 0.12273120138288679
F-Measure : 0.1254868022501082
F-Measure : 0.07572383073496658
F-Measure : 0.23976608187134502
F-Measure : 0.29317507418397626
F-Measure : 0.3779527559055118
F-Measure : 0.8632326820603907
F-Measure : 0.21333333333333332
F-Measure : 0.5058087578194816
F-Measure : 0.21828908554572274
F-Measure : 0.5579999999999999
F-Measure : 0.6247544204322201
F-Measure : 0.1894150417827298
F-Measure : 0.045255720053835796
F-Measure : 0.02837542874961023
F-Measure : 0.6962233169129721
F-Measure : 0.5479082321187584
F-Measure : 0.04672669749330738
F-Measure : 0.026598271112377694
F-Measure : 0.8484848484848486
F-Measure : 0.016411253430924064
F-Measure : 0.0505996673378272
F-Measure : 0.7314578005115091
F-Measure : 0.6907775768535263
F-Measure : 0.21588749524895479
F-Measure : 0.7608069164265131
F-Measure : 0.028737358566135976
F-Measure : 0.5949895615866388
F-Measure : 0.8810289389067525
F-Measure : 0.7853403141361257
F-Measure : 0.6308724832214766
F-Measure : 0.473063973063973
F-Measure : 0.3970223325062035
F-Measure : 0.07230422817112833
F-Measure : 0.036293683873036234
F-Measure : 0.5448979591836733
F-Measure : 0.596949891067538
F-Measure : 0.581986143187067
F-Measure : 0.4731543624161074
F-Measure : 0.5520361990950227
F-Measure : 0.3736842105263158
F-Measure : 0.05519230769230769
F-Measure : 0.1483679525222552
F-Measure : 0.5922077922077922
F-Measure : 0.4793152639087018
F-Measure : 0.14285714285714288
F-Measure : 0.4455066921606119
F-Measure : 0.6817248459958931
F-Measure : 0.46865671641791046
**Best F-Measure** : 0.8810289389067525
F-Measure : 0.8810289389067525
F-Measure : 0.13047732956398825
F-Measure : 0.11595155898953366
F-Measure : 0.042819724404965266
F-Measure : 0.22642479058533327
F-Measure : 0.23221586263287
F-Measure : 0.10016565433462175
F-Measure : 0.23370924121038936
F-Measure : 0.05917226582349951
F-Measure : 0.07644096250699497
F-Measure : 0.0655110310670869
F-Measure : 0.114667836000877
F-Measure : 0.004642256136482331
F-Measure : 0.13254834179539807
F-Measure : 0.03248684511553421
F-Measure : 0.016043397968605724
F-Measure : 0.2066214185793482
F-Measure : 0.19158976510067113
F-Measure : 0.21924444673504098
F-Measure : 0.03669933895600638
F-Measure : 0.18115597783056211
F-Measure : 0.1830387580636702
F-Measure : 0.007420289855072463
F-Measure : 0.22951154710811364
F-Measure : 0.11181342632955536
F-Measure : 0.0701813486047948
F-Measure : 0.20964230171073095
F-Measure : 0.18857053061652088
F-Measure : 0.21968997022892928
F-Measure : 0.25012647981382174
F-Measure : 0.17085427135678394
F-Measure : 0.15634139856421164
F-Measure : 0.2163971572767535
F-Measure : 0.2221075978748646
F-Measure : 0.22456320657759507
F-Measure : 0.23606590724165988
F-Measure : 0.2014095536413469
F-Measure : 0.10430664170062783
F-Measure : 0.20771574652688118
F-Measure : 0.2076253626191463
F-Measure : 0.029182879377431907
F-Measure : 0.21359323432343233
F-Measure : 0.0025572474718121587
F-Measure : 0.08629893238434164
F-Measure : 0.11281268733990953
F-Measure : 0.17873733108108109
F-Measure : 0.17257546225570247
F-Measure : 0.19978046103183317
F-Measure : 0.20251193689018063
F-Measure : 0.012711619575894147
F-Measure : 0.02595620604882511
F-Measure : 0.010996006713351467
F-Measure : 0.026391279403327594
F-Measure : 0.028059325430911074
F-Measure : 0.10585969738651993
F-Measure : 0.23020550402295906
F-Measure : 0.20560844909213882
F-Measure : 0.2730715567071956
F-Measure : 0.21276153886091065
**Best F-Measure** : 0.2730715567071956
Exception in thread "main" java.lang.NullPointerException
at org.scify.jedai.blockprocessing.comparisoncleaning.WeightedEdgePruning.setNumberedRandomConfiguration(WeightedEdgePruning.java:150)
at org.scify.jedai.workflowbuilder.Main.main(Main.java:170)
It reports three different "Best F-Measure" values and then ends with an exception.
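To make a log like the one above easier to read, the output could be summarized instead of just grepped. The following is a minimal sketch, not part of JedAIToolkit: the class name `BestFMeasureSummary` and its helper `Group` are hypothetical, and it assumes that each "Best F-Measure" line closes the group of plain "F-Measure" lines printed before it. It only counts the trials in each group and extracts the reported best value; it does not interpret what each group optimizes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JedAIToolkit.
public class BestFMeasureSummary {

    /** One group of trials: the number of F-Measure lines before a reported best value. */
    static final class Group {
        final int trials;
        final double best;

        Group(int trials, double best) {
            this.trials = trials;
            this.best = best;
        }
    }

    /** Groups the log lines, assuming each "Best F-Measure" line ends a group. */
    static List<Group> summarize(List<String> lines) {
        List<Group> groups = new ArrayList<>();
        int trials = 0;
        for (String line : lines) {
            if (!line.contains("F-Measure")) {
                continue; // skip unrelated output such as stack trace lines
            }
            int colon = line.lastIndexOf(':');
            if (colon < 0) {
                continue; // no value on this line
            }
            if (line.contains("Best F-Measure")) {
                double best = Double.parseDouble(line.substring(colon + 1).trim());
                groups.add(new Group(trials, best));
                trials = 0;
            } else {
                trials++;
            }
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        int n = 1;
        for (Group g : summarize(lines)) {
            System.out.printf("group %d: %d trials, reported best %s%n", n++, g.trials, g.best);
        }
    }
}
```

With Java 11 or later, it can be run directly from source in place of grep, for example `java -jar jedai-core/target/jedai-core-1.3.jar | java BestFMeasureSummary.java`. The exception is written to standard error, so it still appears on the terminal.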