Error using FastKCI
When using the FastKCI method for an FCI search, I often obtain the following error:
Traceback (most recent call last):
File "/mnt/users/hdesmond/Causality/run_cl_3.py", line 163, in <module>
g, edges = fci(data, independence_test_method=indep_test_method, alpha=pval_threshold, depth=depth, max_path_length=max_path_length, verbose=verbose, background_knowledge=background_kn
owledge)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^
File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/search/ConstraintBased/FCI.py", line 1077, in fci
graph, sep_sets, test_results = fas(dataset, nodes, independence_test_method=independence_test_method, alpha=alpha,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/utils/FAS.py", line 115, in fas
p = cg.ci_test(x, y, S)
^^^^^^^^^^^^^^^^^^^
File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/graph/GraphClass.py", line 58, in ci_test
return self.test(i, j, S)
^^^^^^^^^^^^^^^^^^
File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/utils/cit.py", line 480, in __call__
self.kci_ci.compute_pvalue(self.data[:, Xs], self.data[:, Ys], self.data[:, condition_set])[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/utils/FastKCI/FastKCI.py", line 69, in compute_pvalue
self.Z_proposal = Parallel(n_jobs=-1)(delayed(self.partition_data)() for i in range(self.J))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 2007, in __call__
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1650, in _get_outputs
yield from self._retrieve()
File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1754, in _retrieve
self._raise_error_fast()
File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1789, in _raise_error_fast
error_job.get_result(self.timeout)
File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 745, in get_result
return self._return_or_raise()
^^^^^^^^^^^^^^^^^^^^^^^
File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 763, in _return_or_raise
raise self._result
ValueError: sum(pvals[:-1]) > 1.0
This is only for some datasets (others seem to work fine), and in cases where FastKCI fails like this, KCI works fine. Any idea what this means or what to do about it? I have a very large, nonlinear dataset so really need to use FastKCI...
FastKCI is ongoing work by @OliverSchacht and @Biwei-Huang, so the current implementation might not be the final version. For very large nonlinear datasets, RCIT may also be worth trying.
Thanks for this. I'm trying RCIT now. On mock data resembling my real dataset I find good performance only if I use an extremely small p-value threshold, 1e-11 to 1e-14. Is this reasonable / expected at all?
aha I see, thanks for reporting. Perhaps @OliverSchacht has more intuition on this?
Hi @harrydesmond ,
thanks for reporting this issue, and sorry for the belated response. Concerning RCIT, I cannot offer much insight.
Concerning FastKCI, as @kunwuz mentioned, it is ongoing work, so I will gladly look into this issue in more detail. It looks like an error occurs inside the parallelized step that partitions the data, so it needs a bit of debugging to find out what is going wrong. Do you have code that reproduces this?
Thanks and best,
Oliver
Strangely I cannot reproduce it reliably even when I fix the numpy random seed. I have a script that produces that error every time. However if I add one line that simply saves the data to file before running FCI, the error does not occur and the FCI runs fine. If I make another MWE script that loads the saved data with all settings the same, the error also does not occur. Very confused how this is possible.
I see. I'm not sure whether, in the current version, a seed affects the RNG inside the joblib worker instances. So what you observe may well happen by chance, which would explain why only some runs fail. I have not encountered this error in my simulations yet, but I will check later whether there is a way to reproduce it. Unfortunately this traceback does not tell us a lot about what is going wrong, and joblib is tricky to debug.
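To illustrate the seeding point, here is a minimal sketch of how parallel jobs can be made reproducible by giving each job its own child seed rather than relying on a global np.random.seed call. This is not FastKCI's actual code: draw_partition and run_jobs are hypothetical names, and a standard-library thread pool stands in for the joblib workers so that the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def draw_partition(seed_seq, n_samples=10, n_clusters=3):
    """Hypothetical stand-in for one random partitioning job."""
    # Each job builds its own generator from the seed it was handed,
    # so it does not depend on any global RNG state.
    rng = np.random.default_rng(seed_seq)
    return rng.integers(0, n_clusters, size=n_samples)


def run_jobs(master_seed, n_jobs=4):
    """Run n_jobs partitioning jobs with independent, reproducible streams."""
    # spawn() derives one statistically independent child seed per job.
    children = np.random.SeedSequence(master_seed).spawn(n_jobs)
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(draw_partition, children))


first = run_jobs(0)
second = run_jobs(0)
# Same master seed -> identical draws, regardless of scheduling order.
print(all(np.array_equal(a, b) for a, b in zip(first, second)))
```

With joblib, the same idea would amount to passing each child seed as an argument to the delayed function. Without something like this, a seed set in the parent process may not control what the workers draw, which would be consistent with failures appearing only on some runs.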
Ah it seems the FCI ran fine for much longer, but didn't actually complete. Here's some code and data that I hope will allow you to reproduce it. Sometimes it throws the error within 10 seconds of starting, sometimes it takes an hour. I've been running it on 28 cores in case that makes a difference.
from causallearn.search.ConstraintBased.FCI import fci
import numpy as np
np.random.seed(0)
indep_test_method = 'fastkci'
pval_threshold = 0.01
depth = -1
max_path_length = -1
verbose = True
background_knowledge = None
data = np.loadtxt("test_data.txt")
g, edges = fci(data, independence_test_method=indep_test_method, alpha=pval_threshold, depth=depth, max_path_length=max_path_length, verbose=verbose, background_knowledge=background_knowledge)
Found it (I think): for numerical reasons, the weights passed to the multinomial when partitioning the data sometimes did not sum to 1, and so numpy threw this error (see this related issue).
I added an explicit normalization step that should prevent this. However, in very rare cases I could imagine this step having numerical issues too, and then it might break again. I ran your test data twice successfully without any errors occurring.
The fix is here. Opened a PR #228 too.
I would be very interested in your general feedback concerning FastKCI.
Best, Oliver
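For readers who hit the same ValueError elsewhere, the failure mode described above can be sketched with numpy alone. This is an illustration under assumptions, not the code from the actual fix: normalize_weights is a hypothetical helper, and the weights are made up so that the leading entries overshoot 1 by more than numpy's tolerance.

```python
import numpy as np


def normalize_weights(weights):
    """Clip tiny negative values and rescale so the weights sum to 1."""
    w = np.asarray(weights, dtype=np.float64)
    w = np.clip(w, 0.0, None)
    total = w.sum()
    if not np.isfinite(total) or total <= 0.0:
        raise ValueError("weights need at least one positive finite value")
    return w / total


rng = np.random.default_rng(0)

# Made-up weights: the entries before the last one sum to slightly more than 1,
# mimicking accumulated floating-point error.
raw = np.array([0.5, 0.5 + 1e-9, 0.0])

try:
    rng.multinomial(1, raw)
    raw_failed = False
except ValueError:
    # numpy rejects pvals whose leading entries sum to more than 1.
    raw_failed = True

# After explicit normalization the same weights are accepted.
draw = rng.multinomial(1, normalize_weights(raw))
print(raw_failed, draw.sum())
```

numpy checks the sum of all but the last probability, which is why a small overshoot in the leading weights is enough to trigger the error even when the vector looks valid at a glance.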
Great, yes that seems to have fixed it!
I've been impressed with FastKCI. I haven't done exhaustive tests, but from what I've seen it performs roughly as well as KCI in a fraction of the time when the dataset is large. No problems with it besides this issue.