causal-learn

ValueError: math domain error in PC with missing data

Open priamai opened this issue 2 years ago • 3 comments

Hi there, my input data is like this:

[image: input data]

I then want to run causal discovery on it, missing values included:

from causallearn.search.ConstraintBased.PC import pc
dataset = X.to_numpy()
sub_cols = X.columns
# default parameters
cg = pc(dataset,alpha=0.05,indep_test='mv_fisherz',mvpc=True)

Full error:


ValueError                                Traceback (most recent call last)
Cell In[206], line 5
      3 sub_cols = X.columns
      4 # default parameters
----> 5 cg = pc(dataset,alpha=0.05,indep_test='mv_fisherz',mvpc=True)

File /opt/conda/lib/python3.10/site-packages/causallearn/search/ConstraintBased/PC.py:41, in pc(data, alpha, indep_test, stable, uc_rule, uc_priority, mvpc, correction_name, background_knowledge, verbose, show_progress, node_names, **kwargs)
     39     if indep_test == fisherz:
     40         indep_test = mv_fisherz
---> 41     return mvpc_alg(data=data, node_names=node_names, alpha=alpha, indep_test=indep_test, correction_name=correction_name, stable=stable,
     42                     uc_rule=uc_rule, uc_priority=uc_priority, background_knowledge=background_knowledge,
     43                     verbose=verbose,
     44                     show_progress=show_progress, **kwargs)
     45 else:
     46     return pc_alg(data=data, node_names=node_names, alpha=alpha, indep_test=indep_test, stable=stable, uc_rule=uc_rule,
     47                   uc_priority=uc_priority, background_knowledge=background_knowledge, verbose=verbose,
     48                   show_progress=show_progress, **kwargs)

File /opt/conda/lib/python3.10/site-packages/causallearn/search/ConstraintBased/PC.py:200, in mvpc_alg(data, node_names, alpha, indep_test, correction_name, stable, uc_rule, uc_priority, background_knowledge, verbose, show_progress, **kwargs)
    198 indep_test = CIT(data, indep_test, **kwargs)
    199 ## Step 1: detect the direct causes of missingness indicators
--> 200 prt_m = get_parent_missingness_pairs(data, alpha, indep_test, stable)
    201 # print('Finish detecting the parents of missingness indicators.  ')
    202 
    203 ## Step 2:
    204 ## a) Run PC algorithm with the 1st step skeleton;
    205 cg_pre = SkeletonDiscovery.skeleton_discovery(data, alpha, indep_test, stable,
    206                                               background_knowledge=background_knowledge,
    207                                               verbose=verbose, show_progress=show_progress, node_names=node_names)

File /opt/conda/lib/python3.10/site-packages/causallearn/search/ConstraintBased/PC.py:275, in get_parent_missingness_pairs(data, alpha, indep_test, stable)
    272 ## Get the index of parents of missingness indicators
    273 # If the missingness indicator has no parent, then it will not be collected in prt_m
    274 for missingness_i in missingness_index:
--> 275     parent_of_missingness_i = detect_parent(missingness_i, data, alpha, indep_test, stable)
    276     if not isempty(parent_of_missingness_i):
    277         parent_missingness_pairs['prt'].append(parent_of_missingness_i)

File /opt/conda/lib/python3.10/site-packages/causallearn/search/ConstraintBased/PC.py:363, in detect_parent(r, data_, alpha, indep_test, stable)
    361 if len(Neigh_x) >= depth:
    362     for S in combinations(Neigh_x, depth):
--> 363         p = cg.ci_test(x, y, S)
    364         if p > alpha:
    365             if not stable:  # Unstable: Remove x---y right away

File /opt/conda/lib/python3.10/site-packages/causallearn/graph/GraphClass.py:58, in CausalGraph.ci_test(self, i, j, S)
     56 # assert i != j and not i in S and not j in S
     57 if self.test.method == 'mc_fisherz': return self.test(i, j, S, self.nx_skel, self.prt_m)
---> 58 return self.test(i, j, S)

File /opt/conda/lib/python3.10/site-packages/causallearn/utils/cit.py:388, in MV_FisherZ.__call__(self, X, Y, condition_set)
    386 if abs(r) >= 1: r = (1. - np.finfo(float).eps) * np.sign(r) # may happen when samplesize is very small or relation is deterministic
    387 Z = 0.5 * log((1 + r) / (1 - r))
--> 388 X = sqrt(len(test_wise_deletion_XYcond_rows_index) - len(condition_set) - 3) * abs(Z)
    389 p = 2 * (1 - norm.cdf(abs(X)))
    390 self.pvalue_cache[cache_key] = p

ValueError: math domain error
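
For reference, the last frame fails on `sqrt(len(test_wise_deletion_XYcond_rows_index) - len(condition_set) - 3)`, and `math.sqrt` raises `ValueError` for a negative argument. One plausible reading (an assumption, not confirmed in this thread) is that for some variable pair and conditioning set, too few rows survive test-wise deletion. The sketch below illustrates that reading with the standard library only; `complete_row_count` and `fisherz_dof` are hypothetical helper names, not part of causal-learn, and the toy data is made up.

```python
import math
from itertools import combinations

NAN = float("nan")

def complete_row_count(rows, cols):
    """Count rows in which every column in `cols` is observed (not NaN)."""
    return sum(all(not math.isnan(row[c]) for c in cols) for row in rows)

def fisherz_dof(rows, x, y, cond=()):
    """Quantity under the square root in the failing line: n - |S| - 3."""
    n = complete_row_count(rows, (x, y, *cond))
    return n - len(cond) - 3

# Toy data (illustrative only): every pair of columns is jointly
# observed in just two rows.
rows = [
    [1.0, 2.0, 0.5],
    [2.0, NAN, 0.1],
    [NAN, 1.5, 0.9],
    [0.3, 0.7, NAN],
    [NAN, NAN, 0.2],
]

dof = fisherz_dof(rows, 0, 1)  # 2 complete rows - 0 - 3

# Taking the square root of a negative value is what raises the error.
try:
    math.sqrt(dof)
    raised = False
except ValueError:
    raised = True

# Diagnostic: list variable pairs whose unconditional test would already fail.
bad_pairs = [
    (x, y)
    for x, y in combinations(range(len(rows[0])), 2)
    if fisherz_dof(rows, x, y) < 0
]
print(dof, raised, bad_pairs)
```

Running the same kind of count on the real dataset, per pair and per conditioning set, would show whether this is what happens here.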

priamai • Oct 08 '23 07:10

Hi, it seems that #119 and #29 are related to this issue. Could you please try adding some random noise to the data and see whether the error remains? My conjecture is that the data violates some assumption, for example a singularity somewhere.
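
The noise suggestion above could be tried roughly as follows. This is a minimal sketch, assuming the data is a list of numeric rows with missing entries stored as NaN; `add_jitter` is a hypothetical helper, not a causal-learn function, and the noise scale is an arbitrary choice that should stay small relative to each column's spread.

```python
import math
import random

def add_jitter(rows, scale=1e-6, seed=0):
    """Return a copy of `rows` with small Gaussian noise on observed entries.

    NaN entries are left untouched, so the missingness pattern is preserved.
    """
    rng = random.Random(seed)
    return [
        [v if math.isnan(v) else v + rng.gauss(0.0, scale) for v in row]
        for row in rows
    ]

# Two rows that agree exactly on the first two columns.
rows = [[1.0, 2.0, float("nan")], [1.0, 2.0, 3.0]]
noisy = add_jitter(rows)
print(noisy)
```

Jitter of this kind breaks exact ties and deterministic relations, but it does not change how many rows are jointly observed, so it would not be expected to help if the error comes from too few complete rows.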

kunwuz • Oct 12 '23 23:10

Hi there, that sounds plausible, but why does it not raise the singularity exception that was discussed in that thread? Maybe it has not been implemented, even though the issue was closed with the suggestion that a meaningful error would be produced?

priamai • Oct 13 '23 00:10

We did update the code, but perhaps your case was not covered (#58). Would you mind providing us (perhaps via email: [email protected]) with a minimal reproducible example for your issue?

kunwuz • Oct 13 '23 01:10