
Confidence counting of high support rules takes very long

Open kliegr opened this issue 4 years ago • 3 comments

Confidence counting of a high-support rule (support 11694826) does not finish within five hours. The cause is possibly inefficient memory usage: after five hours, the allocated memory (according to a server-side `top`) is 98.6% of the available memory (94 GB), while CPU usage is only around 1% (with unlimited parallelism).

It is also noteworthy that the memory use reported by RDFRules does not exactly match the server-side metering (the client shows "Used memory: 74.81 GB / 90.00 GB").

This is not a bug, but a sampling strategy could possibly be used to compute an approximate confidence. Attachment: taskAndRules.zip

kliegr avatar Oct 18 '21 12:10 kliegr

High support alone does not explain the problem. Another rule in the same task has almost identical support (11605675), yet its confidence is computed in several seconds:

( ?b <interacts_with> ?a ) => ( ?a <interacts_with> ?b ) | HeadCoverage: 0.9917529917281246, HeadSize: 11702183, Support: 11605675

The problematic rules are:

( ?b <provided_by> ?c ) ^ ( ?a <provided_by> ?c ) => ( ?a <interacts_with> ?b ) | HeadCoverage: 0.9993713138822047, HeadSize: 11702183, Support: 11694826

( ?a <category> ?c ) ^ ( ?b <category> ?c ) => ( ?a <interacts_with> ?b ) | HeadCoverage: 0.9918052042084797, HeadSize: 11702183, Support: 11606286
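
A rough way to see why the two-atom rules behave so differently from the single-atom rule is to count body bindings. The Python sketch below is only an illustration, not RDFRules code: the function names are hypothetical, and it assumes that bindings with ?a = ?b are counted. For a body of the form ( ?a p ?c ) ^ ( ?b p ?c ), every object shared by n subjects contributes n * n bindings, whereas a single-atom body contributes one binding per triple.

```python
from collections import Counter


def body_size_single_atom(pairs):
    """Bindings of a one-atom body ( ?b p ?a ): one per distinct triple."""
    return len(set(pairs))


def body_size_shared_object(pairs):
    """Bindings (a, b, c) of ( ?a p ?c ) ^ ( ?b p ?c ).

    `pairs` holds the (subject, object) pairs of predicate p.
    An object with n distinct subjects yields n * n ordered (a, b) pairs
    (this sketch assumes a == b is allowed).
    """
    subjects_per_object = Counter(obj for _, obj in set(pairs))
    return sum(n * n for n in subjects_per_object.values())


# Toy data: 1000 subjects that all share a single object.
provided_by = [(f"s{i}", "db") for i in range(1000)]
single = body_size_single_atom(provided_by)
joined = body_size_shared_object(provided_by)
```

On this toy data the single-atom body has 1000 bindings while the joined body has 1000000, so a predicate such as provided_by or category with only a few distinct objects can make the body size grow quadratically even when the support stays the same.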

kliegr avatar Oct 19 '21 07:10 kliegr

This bug is possibly a duplicate of #74

kliegr avatar Oct 19 '21 11:10 kliegr

This is a combinatorial explosion. One solution is an anytime approach that uses sampling and returns approximate results. For now, I have added better debugging of stuck rules and the ability to interrupt mining or confidence computation tasks. Fortunately, during mining, the hardest rules are mined at the end of the rule refinement queue.
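
To make the sampling idea concrete, here is a minimal Python sketch of a Monte Carlo estimate for rules of the form ( ?a p ?c ) ^ ( ?b p ?c ) => ( ?a h ?b ). It is not how RDFRules computes confidence; the function name `approx_confidence` and its parameters are hypothetical. It samples body bindings (a, b, c) uniformly and reports the fraction whose head triple exists. Note the assumption: this matches the usual pair-based confidence only when each (a, b) pair shares at most one ?c, otherwise pairs with several shared objects are overweighted.

```python
import random
from collections import defaultdict


def approx_confidence(body_pairs, head_pairs, samples=10_000, seed=0):
    """Estimate the share of body bindings (a, b, c) for which (a, b) is in the head.

    body_pairs: (subject, object) pairs of the body predicate p.
    head_pairs: (subject, object) pairs of the head predicate h.
    """
    rng = random.Random(seed)

    # Group the subjects by the shared object ?c.
    groups = defaultdict(list)
    for subj, obj in set(body_pairs):
        groups[obj].append(subj)
    members = list(groups.values())
    if not members:
        return 0.0

    # A group of size n holds n * n bindings, so sample groups with that weight.
    weights = [len(m) ** 2 for m in members]
    head = set(head_pairs)

    hits = 0
    for group in rng.choices(members, weights=weights, k=samples):
        a = rng.choice(group)
        b = rng.choice(group)
        if (a, b) in head:
            hits += 1
    return hits / samples
```

The cost depends on the number of samples rather than on the body size, and the loop could be stopped at a deadline to return the running estimate, which is what would make such an approach anytime.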

propi avatar Sep 14 '22 12:09 propi