causal-learn Error using FastKCI

When using the FastKCI method for an FCI search, I often obtain the following error:

Traceback (most recent call last):
 File "/mnt/users/hdesmond/Causality/run_cl_3.py", line 163, in <module>
   g, edges = fci(data, independence_test_method=indep_test_method, alpha=pval_threshold, depth=depth, max_path_length=max_path_length, verbose=verbose, background_knowledge=background_kn
owledge)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/search/ConstraintBased/FCI.py", line 1077, in fci
   graph, sep_sets, test_results = fas(dataset, nodes, independence_test_method=independence_test_method, alpha=alpha,
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/utils/FAS.py", line 115, in fas
   p = cg.ci_test(x, y, S)
       ^^^^^^^^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/graph/GraphClass.py", line 58, in ci_test
   return self.test(i, j, S)
          ^^^^^^^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/utils/cit.py", line 480, in __call__
   self.kci_ci.compute_pvalue(self.data[:, Xs], self.data[:, Ys], self.data[:, condition_set])[0]
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/causallearn/utils/FastKCI/FastKCI.py", line 69, in compute_pvalue
   self.Z_proposal = Parallel(n_jobs=-1)(delayed(self.partition_data)() for i in range(self.J))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 2007, in __call__
   return output if self.return_generator else list(output)
                                               ^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1650, in _get_outputs
   yield from self._retrieve()
 File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1754, in _retrieve
   self._raise_error_fast()
 File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1789, in _raise_error_fast
   error_job.get_result(self.timeout)
 File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 745, in get_result
   return self._return_or_raise()
          ^^^^^^^^^^^^^^^^^^^^^^^
 File "/users/hdesmond/.local/lib/python3.11/site-packages/joblib/parallel.py", line 763, in _return_or_raise
   raise self._result
ValueError: sum(pvals[:-1]) > 1.0

This happens only for some datasets (others seem to work fine), and in the cases where FastKCI fails like this, KCI works fine. Any idea what this means or what to do about it? I have a very large, nonlinear dataset, so I really need to use FastKCI.

Feb 26 '25 09:02 harrydesmond

FastKCI is ongoing work by @OliverSchacht and @Biwei-Huang, so the current implementation might not be the final version. For very large nonlinear datasets, RCIT may also be worth trying.

Feb 28 '25 17:02 kunwuz

Thanks for this. I'm trying RCIT now. On mock data resembling my real dataset I find good performance only if I use an extremely small p-value threshold, 1e-11 to 1e-14. Is this reasonable / expected at all?

Feb 28 '25 18:02 harrydesmond

Aha, I see. Thanks for reporting. Perhaps @OliverSchacht has more intuition on this?

Feb 28 '25 18:02 kunwuz

Hi @harrydesmond,

Thanks for reporting this issue, and sorry for the belated response. Concerning RCIT, I cannot offer much insight.

Concerning FastKCI, as @kunwuz mentioned, it is ongoing work, so I would gladly look into this issue in more detail. It looks like an error occurs inside the parallelization when the data is partitioned, so it needs a bit of debugging to find out what is going wrong. Do you have code that reproduces this?

Thanks and best,

Oliver

Mar 05 '25 15:03 OliverSchacht

Strangely, I cannot reproduce it reliably even when I fix the numpy random seed. I have a script that produces that error every time. However, if I add one line that simply saves the data to file before running FCI, the error does not occur and the FCI runs fine. If I make another MWE script that loads the saved data with all settings the same, the error also does not occur. I am very confused about how this is possible.

Mar 05 '25 18:03 harrydesmond

I see. I'm not sure whether, in the current version, a seed affects the RNG inside the joblib parallelization instances. So what you observe might well happen by chance, with only some runs failing. I have not encountered this error in my simulations yet, but I will check later whether there is a way to reproduce it. Unfortunately, this traceback does not tell us a lot about what is going wrong, and joblib is tricky to debug.
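
For reference, a minimal sketch of one way to make randomness inside joblib workers reproducible: spawn one child seed per task from a single `SeedSequence` and hand it to the worker explicitly. This is not how FastKCI currently works; `draw_partition` and `run` are hypothetical stand-ins, not the library's `partition_data`.

```python
import numpy as np
from joblib import Parallel, delayed


def draw_partition(seed_seq, n_samples=10, n_groups=3):
    # Hypothetical worker: each task builds its own generator from the
    # child seed it was given, so it does not depend on global RNG state.
    rng = np.random.default_rng(seed_seq)
    return rng.integers(0, n_groups, size=n_samples).tolist()


def run(seed, n_tasks=4):
    # One independent child seed per task, all derived from a single seed.
    children = np.random.SeedSequence(seed).spawn(n_tasks)
    return Parallel(n_jobs=2)(delayed(draw_partition)(s) for s in children)


first = run(0)
second = run(0)
print(first == second)
```

With this pattern two runs with the same top-level seed give the same draws, whereas a global `np.random.seed(0)` in the parent process is not guaranteed to control what separate worker processes draw.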

Mar 05 '25 20:03 OliverSchacht

Ah, it seems the FCI ran fine for much longer, but didn't actually complete. Here's some code and data that I hope will allow you to reproduce it. Sometimes it throws the error within 10 seconds of starting, sometimes it takes an hour. I've been running it on 28 cores in case that makes a difference.

from causallearn.search.ConstraintBased.FCI import fci
import numpy as np

np.random.seed(0)

indep_test_method = 'fastkci'

pval_threshold = 0.01
depth = -1
max_path_length = -1
verbose = True
background_knowledge = None

data = np.loadtxt("test_data.txt")

g, edges = fci(data, independence_test_method=indep_test_method, alpha=pval_threshold, depth=depth, max_path_length=max_path_length, verbose=verbose, background_knowledge=background_knowledge)

test_data.txt

Mar 06 '25 18:03 harrydesmond

Found it (I think): for numerical reasons, the weights passed to the multinomial when partitioning the data sometimes did not sum to 1, and so numpy threw this error (see this related issue).

I added an explicit normalization step that should prevent this. However, in very rare cases I could imagine this step having numerical issues too, and then it might break again. I ran your test data twice successfully without any errors occurring.
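
To illustrate the idea (a rough sketch only, not the actual code from the fix): the weights are cast to float64 and divided by their sum before being passed to the multinomial. The helper name `normalize_weights`, the clipping of negative values, and the uniform fallback are assumptions made for this example.

```python
import numpy as np


def normalize_weights(weights):
    # Hypothetical helper: return float64 probabilities that sum to 1.
    w = np.asarray(weights, dtype=np.float64)
    w = np.clip(w, 0.0, None)  # assumption: tiny negative values are rounding noise
    total = w.sum()
    if not np.isfinite(total) or total <= 0.0:
        # Assumed fallback for degenerate input: uniform probabilities.
        return np.full(w.shape, 1.0 / w.size)
    return w / total


rng = np.random.default_rng(0)

# The first two weights sum to slightly more than 1, which is the kind of
# input that makes numpy's multinomial raise a ValueError.
raw = np.array([0.7, 0.3000001, 0.0])

probs = normalize_weights(raw)
draw = rng.multinomial(1, probs)  # accepted once the weights are normalized
print(probs.sum(), draw)
```

numpy tolerates only a very small excess above 1 in the sum of all but the last probability, so weights produced by a chain of floating-point operations can trip the check even when they are mathematically normalized.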

The fix is here. I also opened PR #228.

I would be very interested in your general feedback on FastKCI.

Best, Oliver

Mar 07 '25 19:03 OliverSchacht

Great, yes that seems to have fixed it!

I've been impressed with FastKCI. I haven't done exhaustive tests, but from what I've seen it performs roughly as well as KCI in a fraction of the time when the dataset is large. No problems with it besides this issue.

Mar 10 '25 09:03 harrydesmond