rdfrules
Confidence counting of high support rules takes very long
Confidence counting of a high support rule (support 11,694,826) does not finish within five hours. The problem is possibly inefficient memory usage: after five hours the allocated memory (according to a server-side `top`) is 98.6% of the available memory (94 GB), while CPU usage is only around 1% (with unlimited parallelism).
It is also noteworthy that the memory use reported by RDFRules does not exactly match the server-side metering (the client shows "Used memory: 74.81 GB / 90.00 GB").
This is not a bug, but a sampling strategy could possibly be used to compute an approximate confidence.
Attachment: taskAndRules.zip
There seems to be a problem other than just high support. Another rule in the same task, ( ?b <interacts_with> ?a ) => ( ?a <interacts_with> ?b ) | HeadCoverage: 0.9917529917281246, HeadSize: 11702183, Support: 11605675, has almost identical support (11605675), but its confidence is computed in several seconds.
The problematic rules are ( ?b <provided_by> ?c ) ^ ( ?a <provided_by> ?c ) => ( ?a <interacts_with> ?b ) | HeadCoverage: 0.9993713138822047, HeadSize: 11702183, Support: 11694826 and ( ?a <category> ?c ) ^ ( ?b <category> ?c ) => ( ?a <interacts_with> ?b ) | HeadCoverage: 0.9918052042084797, HeadSize: 11702183, Support: 11606286.
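One possible explanation for the difference, sketched here under assumptions rather than taken from RDFRules itself: the body of the fast rule has a single atom, so the number of body matches is bounded by the number of `<interacts_with>` triples, whereas a body of the form ( ?a p ?c ) ^ ( ?b p ?c ) joins every pair of subjects that share the same ?c. The function name and the toy data below are hypothetical; the count is an upper bound (it includes pairs with ?a = ?b and counts a pair once per shared ?c).

```python
from collections import Counter


def body_size_upper_bound(body_triples):
    """Upper bound on (?a, ?b) matches of the body (?a p ?c) ^ (?b p ?c).

    body_triples is an iterable of (subject, object) pairs for predicate p.
    Every object shared by n subjects contributes n * n ordered pairs.
    """
    group_sizes = Counter(obj for _subj, obj in body_triples)
    return sum(n * n for n in group_sizes.values())


# Hypothetical toy data: 1000 entities split evenly between 2 providers.
toy = [(f"e{i}", f"provider{i % 2}") for i in range(1000)]
print(len(toy), body_size_upper_bound(toy))
```

With only a handful of distinct ?c values, the bound grows roughly quadratically in the number of triples, which would be consistent with memory filling up while the CPU stays mostly idle.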
This bug is possibly a duplicate of #74
The cause is a combinatorial explosion. One solution is an anytime approach with sampling and approximated results. For now, I have added better debugging of stuck rules and the possibility to interrupt mining or confidence computing tasks. Fortunately, during mining, the hardest rules are mined at the end of the queue of rules to refine.
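A minimal sketch of what a sampling-based estimate could look like for a body of the form ( ?a p ?c ) ^ ( ?b p ?c ). This is an illustration only, not the RDFRules implementation: the function name, its parameters, and the in-memory data layout are assumptions. It also estimates the fraction of body instantiations (?a, ?b, ?c) whose head holds, which can differ from a confidence defined over distinct (?a, ?b) pairs when subjects share several ?c values.

```python
import random
from collections import defaultdict


def approx_confidence(body_triples, head_pairs, samples=10_000, seed=0):
    """Estimate the share of body matches (?a p ?c) ^ (?b p ?c) with head (?a, ?b).

    body_triples: iterable of (subject, object) pairs for the body predicate.
    head_pairs:   set of (a, b) pairs that satisfy the head atom.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for subj, obj in body_triples:
        groups[obj].append(subj)
    if not groups:
        return 0.0
    members = list(groups.values())
    # A group with n subjects holds n * n body matches, so weighting groups
    # by n * n makes each sampled match uniform over all body matches.
    weights = [len(m) ** 2 for m in members]
    hits = 0
    for group in rng.choices(members, weights=weights, k=samples):
        a = rng.choice(group)
        b = rng.choice(group)
        if (a, b) in head_pairs:
            hits += 1
    return hits / samples
```

The memory cost is linear in the number of body triples, because the joined pairs are never materialised. An anytime variant could keep drawing samples and report the running estimate whenever the task is interrupted.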