dowhy p-value of 2.0 and an extra print in linear_regression

The placebo_treatment_refuter is returning a p-value of 2.0. I believe this is a special case in which estimate_index is either 0 or equal to the length of the simulations list (lines 186-199 in file dowhy/causal_refuters). My scenarios fall into the else: branch, where line 197 (p_value = estimate_index / num_simulations) gives a p-value of 1.0, and the final p-value therefore comes out as 2.0.

Separately, the linear_regression estimator prints its arguments unnecessarily because of a leftover print statement (line 26 in file dowhy/causal_estimators/linear_regression_estimator.py):

25    args_dict.update(kwargs)
26    print(args_dict)
27    super().__init__(*args, **args_dict)

Just mentioning in case it hadn't been noticed.

Aug 10 '22 05:08 Jorawar-Singh

Hi, thanks for noticing. Regarding the p-value issue: might it be that in your case the list of simulations is (almost) constant, meaning it mostly consists of the same value, and your estimate is equal to that value too? That's the only scenario in which I can imagine this happening.
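
To illustrate the constant-simulations scenario described above, here is a minimal sketch. It is not dowhy's code: `doubled_tail_p_value` and its `clamp` parameter are hypothetical names, and the sketch assumes a two-sided p-value obtained by doubling the smaller empirical tail fraction. When every simulation equals the estimate, both tail fractions are 1.0, so the doubled value reaches 2.0 unless it is capped at 1.0.

```python
def doubled_tail_p_value(estimate, simulations, clamp=False):
    """Hypothetical two-sided empirical p-value (an assumption, not dowhy's code)."""
    n = len(simulations)
    # Fraction of simulations at or below / at or above the estimate.
    lower = sum(s <= estimate for s in simulations) / n
    upper = sum(s >= estimate for s in simulations) / n
    # Doubling the smaller tail gives a two-sided value, which can exceed 1.0.
    p_value = 2 * min(lower, upper)
    # Capping at 1.0 keeps the result a valid probability.
    return min(1.0, p_value) if clamp else p_value


# Constant simulations that all equal the estimate: both tails are 1.0.
constant_sims = [0.0] * 100
unclamped = doubled_tail_p_value(0.0, constant_sims)
clamped = doubled_tail_p_value(0.0, constant_sims, clamp=True)
print(unclamped, clamped)
```

Whether the library's bootstrap test handles ties this way is an assumption; the point is only that a doubled tail fraction needs an upper bound of 1.0.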

Aug 18 '22 19:08 MichaelMarien

On the second topic, there are indeed a few locations where a print is used in the library. I wouldn't mind making one PR changing all of them to logging statements. @amit-sharma what do you think?
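
As a rough sketch of what swapping a print for a logging statement could look like, using only the standard library `logging` module. The logger name `example_estimator` and the helper `build_args` are hypothetical stand-ins for illustration, not names from the dowhy code base.

```python
import logging

# Hypothetical logger name; a library would typically use logging.getLogger(__name__).
logger = logging.getLogger("example_estimator")


def build_args(defaults, **kwargs):
    """Merge keyword overrides into defaults, logging instead of printing."""
    args_dict = dict(defaults)
    args_dict.update(kwargs)
    # Emitted only when the caller enables DEBUG logging, so normal runs stay quiet.
    logger.debug("Estimator arguments: %s", args_dict)
    return args_dict
```

With this pattern the arguments stay available for debugging (for example via `logging.basicConfig(level=logging.DEBUG)`) without cluttering standard output by default.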

Aug 18 '22 19:08 MichaelMarien

That's a good idea, @MichaelMarien. Feel free to start a PR on the print statements.

On the p-values, @Jorawar-Singh, can you share a minimal working example that reproduces the erroneous p-values you are seeing?

Aug 19 '22 04:08 amit-sharma

Here is the example for the p-value (using a subset [100 values] of the actual dataset):

import dowhy
import pandas as pd

A = [0.002587257,0.002612952,0.002636191,0.00265691,0.002675054,0.002690434,0.002702643,0.002711036,0.002714773,0.002712926,0.002704614,0.002689171,0.002666316,0.00263632,0.002600138,0.002559498,0.002516906,0.002475529,0.002438914,0.002410544,0.00239328,0.002388822,0.002397363,0.002417587,0.002446988,0.002482412,0.002520619,0.002558733,0.002594505,0.002626406,0.002653584,0.002675733,0.002692922,0.002705413,0.00271349,0.002717328,0.002716918,0.00271204,0.002702309,0.002687274,0.002666556,0.002640024,0.002607972,0.002571288,0.002531571,0.002491172,0.002453096,0.002420731,0.002397402,0.002385793,0.002387392,0.002402151,0.002428476,0.002463576,0.002504006,0.002546244,0.002587144,0.002624216,0.00265574,0.002680745,0.002698915,0.002710437,0.002715833,0.002715791,0.00271101,0.002702075,0.002689383,0.002673126,0.00265333,0.002629962,0.002603077,0.002572995,0.002540469,0.002506822,0.002473996,0.002444474,0.002421036,0.002406361,0.002402528,0.002410553,0.002430106,0.002459527,0.002496115,0.002536597,0.002577615,0.002616136,0.002649713,0.002676624,0.002695889,0.002707221,0.002710919,0.002707734,0.002698714,0.002685042,0.002667886,0.002648268,0.00262698,0.002604563,0.002581344]
B = [0.9833098,0.983301,0.983297,0.9832983,0.9833045,0.9833159,0.9833324,0.9833541,0.9833806,0.9834126,0.9834494,0.9834914,0.9835384,0.9835906,0.9836476,0.9837098,0.9837771,0.9838491,0.9839263,0.9840083,0.9840953,0.9841869,0.9842835,0.984385,0.9844911,0.9846018,0.9847172,0.9848373,0.984962,0.9850912,0.985225,0.9853632,0.9855058,0.9856527,0.9858041,0.9859597,0.9861195,0.9862837,0.9864517,0.9866241,0.9868004,0.9869809,0.9871652,0.9873533,0.9875454,0.9877412,0.9879406,0.9881439,0.9883506,0.9885609,0.9887745,0.9889913,0.9892117,0.9894352,0.9896619,0.9898914,0.990124,0.9903595,0.9905978,0.990839,0.9910827,0.9913291,0.9915781,0.9918293,0.9920831,0.9923392,0.9925975,0.9928579,0.9931204,0.993385,0.9936513,0.9939198,0.9941899,0.9944617,0.9947352,0.9950101,0.9952865,0.9955642,0.9958431,0.9961233,0.9964046,0.9966869,0.9969699,0.9972539,0.9975385,0.9978237,0.9981096,0.9983957,0.9986824,0.9989694,0.9992566,0.9995437,0.999831,1.0001183,1.0004053,1.0006922,1.0009789,1.0012652,1.0015508]
C = [5.6,5.6,5.6,5.6,5.9,5.9,5.9,5.9,5.9,5.6,5.6,5.6,5.6,5.2,5.6,5.6,5.9,6.2,6.2,6.6,6.6,6.6,6.6,6.2,6.2,5.9,5.9,5.6,5.2,5.2,5.2,5.2,5.6,5.6,5.9,5.9,5.9,5.9,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.9,6.2,6.2,6.6,6.6,6.6,6.6,6.6,6.2,5.9,5.9,5.6,5.2,5.2,5.2,5.2,5.2,5.6,5.6,5.9,5.9,5.9,5.6,5.6,5.6,5.6,5.6,5.6,5.6,5.9,6.2,6.2,6.2,6.6,6.6,6.6,6.6,6.2,5.9,5.6,5.6,5.2,4.9,4.9,5.2,5.2,5.6,5.6,5.6,5.6,5.9,5.9,5.9,5.9]
dataset = pd.DataFrame(list(zip(A,B,C)),columns=['A','B','C'])

causal_graph = """digraph {
A;
B;
C;
A->B->C;
A->C;
}"""
model = dowhy.CausalModel(data=dataset, graph=causal_graph.replace("\n", " "), treatment='B', outcome='C')

identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(identified_estimand, method_name="backdoor.linear_regression", target_units="ate")

refute2_results=model.refute_estimate(identified_estimand, estimate, method_name="placebo_treatment_refuter")
print(refute2_results)

On my system, I get a p-value of 2.0 with these library and Python versions: dowhy 0.8+24.geba2c1cc.dirty, pandas 1.3.2, python 3.9.6.

Aug 24 '22 09:08 Jorawar-Singh

I'm closing this issue as the original question/comment seems to be addressed. Please re-open if not. Or feel free to open a new issue if this should actually be an enhancement.

Oct 14 '22 13:10 petergtz
