Couple of questions on causal_identifier.py

Open Lovkush-A opened this issue 3 years ago • 1 comments

I am trying to understand the code base (and am also learning causal reasoning), and I have a couple of questions about causal_identifier.py.

  1. At around line 319, we have:
        is_identified = [ self._graph.all_observed(bset["backdoor_set"]) for bset in backdoor_sets ]
        if all(is_identified):
            self.logger.info("All common causes are observed. Causal effect can be identified.")

However, when calculating the backdoor sets, we might not compute every possible backdoor set (e.g. if we hit the 100000 limit in the exhaustive method). Is this an issue for the is_identified variable? Is the claim 'All common causes are observed' legitimate in such circumstances?

  2. I think identify_nie_effect and identify_nde_effect are identical. Is this intentional? If so, what is the reason for it?

Lovkush-A, Apr 27 '21 17:04

  1. This is a good point. Ideally we should remove the hard-coded 100000 limit and let the user specify that value. By default, it can be None, in which case the algorithm runs its full exhaustive search. Btw, if the current code outputs "All common causes are observed", then that output is always correct. The issue is that, because of the 100000 limit, the search may stop before checking a valid backdoor set, and then we may incorrectly conclude that "causal effect cannot be identified" when in fact it can be.

To your point, perhaps the message needs to change too. We could say "The identify algorithm found a config where all common causes are observed" or "The identify algorithm failed to find a config where all common causes are observed".

  2. DoWhy currently has limited support for nde and nie (only simple mediators in the graph and only a linear model for estimation). In these settings, estimation of nde and nie requires the same three sets of variables: backdoor variables, mediators, and the first-stage and second-stage mediators' confounders. So currently the two methods return the same information: whether mediators of this type can be identified in the graph. The plan is to implement a more complete identification for mediation, at which point these two functions may become different.

amit-sharma, Apr 29 '21 05:04
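
The asymmetry discussed in the first answer (a positive result from a truncated search can be trusted, a negative one cannot) can be illustrated with a small sketch. This is not DoWhy's code: `search_backdoor_sets`, the `is_valid_backdoor` callback, the `max_iterations` parameter, and the three status strings are all hypothetical names chosen for illustration, and the validity check is a toy stand-in for a real backdoor-criterion test on a graph.

```python
from itertools import combinations


def search_backdoor_sets(candidates, is_valid_backdoor, observed, max_iterations=None):
    """Exhaustively search subsets of `candidates`, smallest first.

    Hypothetical helper, not part of DoWhy. Returns (status, subset):
      "identified"     - a valid, fully observed set was found (always trustworthy)
      "not_identified" - every subset was checked and none qualified
      "unknown"        - the iteration limit stopped the search early
    `max_iterations=None` means no limit, i.e. a full exhaustive search.
    """
    observed = frozenset(observed)
    all_subsets = (
        frozenset(subset)
        for size in range(len(candidates) + 1)
        for subset in combinations(candidates, size)
    )
    checked = 0
    truncated = False
    for subset in all_subsets:
        if max_iterations is not None and checked >= max_iterations:
            truncated = True  # unchecked subsets remain
            break
        checked += 1
        if subset <= observed and is_valid_backdoor(subset):
            return "identified", subset
    return ("unknown" if truncated else "not_identified"), None


# Toy setting (assumption): a set is valid iff it contains both W2 and W3.
candidates = ["W1", "W2", "W3"]
toy_is_valid = lambda s: {"W2", "W3"} <= s

# No limit: the search finds the valid set.
full = search_backdoor_sets(candidates, toy_is_valid, observed=candidates)

# Limit of 5: the search stops before reaching {W2, W3}, so the honest
# answer is "unknown", not "cannot be identified".
limited = search_backdoor_sets(candidates, toy_is_valid, observed=candidates,
                               max_iterations=5)

# W3 unobserved, no limit: every subset was checked, so a negative is justified.
negative = search_backdoor_sets(candidates, toy_is_valid, observed=["W1", "W2"])
```

Separating "unknown" from "not_identified" is one way to make the log message match what the search actually established, along the lines of the rewording suggested above.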