
Unable to achieve high recall and high precision for the bigger datasets

Open · Murray1991 opened this issue 5 years ago · 2 comments

Hello,

In the entity matching step I'm trying to combine different bag models with similarity measures for the dirty dataset "movies" in the data folder.
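To make "combining a bag model with a similarity measure" concrete, here is a minimal, self-contained Java sketch that does not use JedAI's own classes. It assumes a bag model is simply the set of lower-cased word tokens of a profile and that the similarity measure is Jaccard similarity. The class name `TokenJaccard` and the sample strings are hypothetical. JedAI configures its representation models and similarity measures through its own API.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not JedAI code: token-set "bag" plus Jaccard similarity.
final class TokenJaccard {
    // Bag (set) of tokens: lower-cased, split on runs of non-alphanumeric characters.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Jaccard similarity |A ∩ B| / |A ∪ B|, defined as 0 when both bags are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Two hypothetical descriptions of the same movie.
        Set<String> p1 = tokens("The Matrix (1999), Wachowski");
        Set<String> p2 = tokens("Matrix, The - 1999");
        System.out.println(jaccard(p1, p2));
    }
}
```

Running `main` prints `0.75` (3 shared tokens out of 4 distinct ones). In a matching step, pairs scoring above a chosen threshold would be reported as duplicates, so that threshold directly trades precision against recall.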

Unfortunately, I'm unable to get both high recall and high precision. Could you give me a good "recipe" for getting good results on that dataset?
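Recall and precision are usually summarized in a single F-Measure. Assuming it is the standard F1 score (the harmonic mean of precision and recall), this minimal Java sketch shows why it stays low whenever either component is low. The class `MatchingMetrics` and the counts in `main` are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper, not part of JedAI: standard precision, recall and F1.
final class MatchingMetrics {
    // Precision: fraction of reported duplicate pairs that are true duplicates.
    static double precision(int truePositives, int falsePositives) {
        int reported = truePositives + falsePositives;
        return reported == 0 ? 0.0 : (double) truePositives / reported;
    }

    // Recall: fraction of all true duplicate pairs that were reported.
    static double recall(int truePositives, int falseNegatives) {
        int existing = truePositives + falseNegatives;
        return existing == 0 ? 0.0 : (double) truePositives / existing;
    }

    // F-Measure (F1): harmonic mean of precision and recall.
    static double fMeasure(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Hypothetical counts: high recall, very low precision.
        int tp = 90, fp = 910, fn = 10;
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.printf(Locale.ROOT, "P=%.3f R=%.3f F=%.3f%n", p, r, fMeasure(p, r));
    }
}
```

With these counts the sketch prints `P=0.090 R=0.900 F=0.164`: a recall of 0.9 cannot compensate for a precision of 0.09.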

Thank you

Murray1991 · Sep 17 '18 13:09

Hi,

You can find an example of how to optimize the performance for the Dirty ER movies dataset here: https://github.com/scify/JedAIToolkit/blob/mavenizedVersion/jedai-core/src/test/java/org/scify/jedai/configuration/OptimizeDirtyMoviesDataset.java. Just make sure you unzip the profiles in the data folder.
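The general idea behind such an optimization (trying several configurations and keeping the one with the highest F-Measure) can be sketched in plain Java. This is not the code of `OptimizeDirtyMoviesDataset`. The single tunable parameter (a similarity threshold), the ten toy labelled pairs, the class name `RandomThresholdSearch` and the use of random sampling are all assumptions for illustration.

```java
import java.util.Locale;
import java.util.Random;

// Hypothetical sketch: random search over one parameter, keeping the best F-Measure.
final class RandomThresholdSearch {
    // Toy labelled candidate pairs: similarity score and whether the pair is a true duplicate.
    static final double[] SCORES = {0.95, 0.90, 0.80, 0.72, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10};
    static final boolean[] IS_MATCH = {true, true, true, false, true, false, false, true, false, false};

    // F-Measure obtained when every pair with score >= threshold is declared a match.
    static double evaluate(double threshold) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < SCORES.length; i++) {
            boolean predicted = SCORES[i] >= threshold;
            if (predicted && IS_MATCH[i]) {
                tp++;
            } else if (predicted) {
                fp++;
            } else if (IS_MATCH[i]) {
                fn++;
            }
        }
        if (tp == 0) {
            return 0.0;
        }
        double precision = (double) tp / (tp + fp);
        double recall = (double) tp / (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }

    // Sample `trials` thresholds uniformly in [0, 1) and return {bestThreshold, bestF}.
    static double[] search(int trials, long seed) {
        Random random = new Random(seed);
        double bestThreshold = 0.0;
        double bestF = -1.0;
        for (int t = 0; t < trials; t++) {
            double threshold = random.nextDouble();
            double f = evaluate(threshold);
            if (f > bestF) {
                bestF = f;
                bestThreshold = threshold;
            }
        }
        return new double[] {bestThreshold, bestF};
    }

    public static void main(String[] args) {
        double[] best = search(100, 42L);
        System.out.printf(Locale.ROOT, "Best threshold %.3f -> F-Measure %.3f%n", best[0], best[1]);
    }
}
```

On these toy pairs the highest achievable F-Measure is 0.8 (any threshold in (0.55, 0.65]). Random search only approximates that optimum, so more trials make it more likely to be found.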

Kind regards, George

gpapadis · Sep 19 '18 10:09

Hi George, something isn't working as it should. When I launch the program and filter the output for the "F-Measure" string, I get the following:

```
gabriele@aroxy:/media/gabriele/DATA/Universita/Tesi/tool/Jedai/JedAIToolkit$ java -jar jedai-core/target/jedai-core-1.3.jar | grep F-Measure
F-Measure	:	0.0
F-Measure	:	0.06148590947907771
F-Measure	:	0.021220159151193636
F-Measure	:	0.0
F-Measure	:	0.05328596802841918
F-Measure	:	0.011299435028248586
F-Measure	:	0.0
F-Measure	:	0.01474201474201474
F-Measure	:	0.0
F-Measure	:	0.02127659574468085
F-Measure	:	0.011834319526627219
F-Measure	:	0.0350109409190372
F-Measure	:	0.005602240896358543
F-Measure	:	0.02952029520295203
F-Measure	:	0.05687203791469195
F-Measure	:	0.07124681933842239
F-Measure	:	0.18016378525932666
F-Measure	:	0.005586592178770949
F-Measure	:	0.031413612565445025
F-Measure	:	0.0
F-Measure	:	0.026373626373626374
F-Measure	:	0.005633802816901408
F-Measure	:	0.017897091722595078
F-Measure	:	0.0
F-Measure	:	0.01550387596899225
F-Measure	:	0.042194092827004225
F-Measure	:	0.04979253112033195
F-Measure	:	0.0
F-Measure	:	0.24346405228758167
F-Measure	:	0.013856812933025405
F-Measure	:	0.1213235294117647
F-Measure	:	0.00396039603960396
F-Measure	:	0.09782608695652174
F-Measure	:	0.0056179775280898875
F-Measure	:	0.005221932114882507
F-Measure	:	0.015228426395939085
F-Measure	:	0.022172949002217297
F-Measure	:	0.1769087523277467
F-Measure	:	0.0056179775280898875
F-Measure	:	0.05758157389635317
F-Measure	:	0.037578288100208766
F-Measure	:	0.10510948905109489
F-Measure	:	0.021220159151193636
F-Measure	:	0.010075566750629721
F-Measure	:	0.0
F-Measure	:	0.12126537785588755
F-Measure	:	0.030769230769230767
F-Measure	:	0.0
F-Measure	:	0.09671848013816925
F-Measure	:	0.026246719160104987
F-Measure	:	0.005333333333333333
F-Measure	:	0.026373626373626374
F-Measure	:	0.02727272727272727
F-Measure	:	0.005747126436781609
F-Measure	:	0.04519774011299435
F-Measure	:	0.07692307692307693
F-Measure	:	0.029268292682926828
Best F-Measure	:	0.24346405228758167
F-Measure	:	0.06531881804043545
F-Measure	:	0.33240997229916897
F-Measure	:	0.7068723702664796
F-Measure	:	0.23611111111111108
F-Measure	:	0.16641813301521025
F-Measure	:	0.07221431344635693
F-Measure	:	0.5612343297974927
F-Measure	:	0.10637480798771122
F-Measure	:	0.7549407114624507
F-Measure	:	0.051971127151582454
F-Measure	:	0.5468904244817374
F-Measure	:	0.060599502218374644
F-Measure	:	0.12273120138288679
F-Measure	:	0.1254868022501082
F-Measure	:	0.07572383073496658
F-Measure	:	0.23976608187134502
F-Measure	:	0.29317507418397626
F-Measure	:	0.3779527559055118
F-Measure	:	0.8632326820603907
F-Measure	:	0.21333333333333332
F-Measure	:	0.5058087578194816
F-Measure	:	0.21828908554572274
F-Measure	:	0.5579999999999999
F-Measure	:	0.6247544204322201
F-Measure	:	0.1894150417827298
F-Measure	:	0.045255720053835796
F-Measure	:	0.02837542874961023
F-Measure	:	0.6962233169129721
F-Measure	:	0.5479082321187584
F-Measure	:	0.04672669749330738
F-Measure	:	0.026598271112377694
F-Measure	:	0.8484848484848486
F-Measure	:	0.016411253430924064
F-Measure	:	0.0505996673378272
F-Measure	:	0.7314578005115091
F-Measure	:	0.6907775768535263
F-Measure	:	0.21588749524895479
F-Measure	:	0.7608069164265131
F-Measure	:	0.028737358566135976
F-Measure	:	0.5949895615866388
F-Measure	:	0.8810289389067525
F-Measure	:	0.7853403141361257
F-Measure	:	0.6308724832214766
F-Measure	:	0.473063973063973
F-Measure	:	0.3970223325062035
F-Measure	:	0.07230422817112833
F-Measure	:	0.036293683873036234
F-Measure	:	0.5448979591836733
F-Measure	:	0.596949891067538
F-Measure	:	0.581986143187067
F-Measure	:	0.4731543624161074
F-Measure	:	0.5520361990950227
F-Measure	:	0.3736842105263158
F-Measure	:	0.05519230769230769
F-Measure	:	0.1483679525222552
F-Measure	:	0.5922077922077922
F-Measure	:	0.4793152639087018
F-Measure	:	0.14285714285714288
F-Measure	:	0.4455066921606119
F-Measure	:	0.6817248459958931
F-Measure	:	0.46865671641791046
Best F-Measure	:	0.8810289389067525
F-Measure	:	0.8810289389067525
F-Measure	:	0.13047732956398825
F-Measure	:	0.11595155898953366
F-Measure	:	0.042819724404965266
F-Measure	:	0.22642479058533327
F-Measure	:	0.23221586263287
F-Measure	:	0.10016565433462175
F-Measure	:	0.23370924121038936
F-Measure	:	0.05917226582349951
F-Measure	:	0.07644096250699497
F-Measure	:	0.0655110310670869
F-Measure	:	0.114667836000877
F-Measure	:	0.004642256136482331
F-Measure	:	0.13254834179539807
F-Measure	:	0.03248684511553421
F-Measure	:	0.016043397968605724
F-Measure	:	0.2066214185793482
F-Measure	:	0.19158976510067113
F-Measure	:	0.21924444673504098
F-Measure	:	0.03669933895600638
F-Measure	:	0.18115597783056211
F-Measure	:	0.1830387580636702
F-Measure	:	0.007420289855072463
F-Measure	:	0.22951154710811364
F-Measure	:	0.11181342632955536
F-Measure	:	0.0701813486047948
F-Measure	:	0.20964230171073095
F-Measure	:	0.18857053061652088
F-Measure	:	0.21968997022892928
F-Measure	:	0.25012647981382174
F-Measure	:	0.17085427135678394
F-Measure	:	0.15634139856421164
F-Measure	:	0.2163971572767535
F-Measure	:	0.2221075978748646
F-Measure	:	0.22456320657759507
F-Measure	:	0.23606590724165988
F-Measure	:	0.2014095536413469
F-Measure	:	0.10430664170062783
F-Measure	:	0.20771574652688118
F-Measure	:	0.2076253626191463
F-Measure	:	0.029182879377431907
F-Measure	:	0.21359323432343233
F-Measure	:	0.0025572474718121587
F-Measure	:	0.08629893238434164
F-Measure	:	0.11281268733990953
F-Measure	:	0.17873733108108109
F-Measure	:	0.17257546225570247
F-Measure	:	0.19978046103183317
F-Measure	:	0.20251193689018063
F-Measure	:	0.012711619575894147
F-Measure	:	0.02595620604882511
F-Measure	:	0.010996006713351467
F-Measure	:	0.026391279403327594
F-Measure	:	0.028059325430911074
F-Measure	:	0.10585969738651993
F-Measure	:	0.23020550402295906
F-Measure	:	0.20560844909213882
F-Measure	:	0.2730715567071956
F-Measure	:	0.21276153886091065
Best F-Measure	:	0.2730715567071956

Exception in thread "main" java.lang.NullPointerException
	at org.scify.jedai.blockprocessing.comparisoncleaning.WeightedEdgePruning.setNumberedRandomConfiguration(WeightedEdgePruning.java:150)
	at org.scify.jedai.workflowbuilder.Main.main(Main.java:170)
```

It reports three different "Best F-Measure" values and then terminates with an exception.

GabrielePisciotta · Sep 27 '18 11:09