dodiscover
Define an API for hard interventions
Is your feature request related to a problem? Please describe. No causal discovery library worth its salt offers only observational causal discovery.
Describe the solution you'd like Hard interventions: For data in a data frame, I imagined a list of lists, where the jth element of the outer list is a list of the column indexes that were intervened on in the jth row. That said, this already feels like there is a missing abstraction that needs defining.
Soft interventions: A list, where the jth element is the value the soft intervention had in the jth row of the data.
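To make the two proposals concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the names `hard_targets`, `soft_values`, `validate_interventions`, and `target_names` are hypothetical and not part of dodiscover, the data frame is stood in for by a list of rows, and using `None` to mean "no soft intervention in this row" is one possible convention rather than something decided in this issue.

```python
# Illustrative only: none of these names exist in dodiscover.
columns = ["x", "y", "z"]
data = [
    [0.1, 1.2, 0.3],  # row 0: purely observational
    [1.0, 0.9, 0.2],  # row 1: x was intervened on
    [0.4, 2.0, 0.0],  # row 2: y and z were intervened on
]

# Hard interventions: hard_targets[j] lists the column indexes
# that were intervened on in row j (empty list = observational).
hard_targets = [[], [0], [1, 2]]

# Soft interventions: soft_values[j] is the value the soft intervention
# had in row j. None for "no soft intervention" is an assumed convention.
soft_values = [None, None, 0.5]


def validate_interventions(data, hard_targets, soft_values):
    """Check that both specifications line up with the rows and columns."""
    n_rows = len(data)
    if len(hard_targets) != n_rows or len(soft_values) != n_rows:
        raise ValueError("need exactly one intervention entry per row")
    n_cols = len(data[0]) if data else 0
    for j, targets in enumerate(hard_targets):
        for idx in targets:
            if not 0 <= idx < n_cols:
                raise ValueError(f"row {j}: column index {idx} out of range")
    return True


def target_names(columns, hard_targets):
    """Translate column indexes into column names, row by row."""
    return [[columns[i] for i in targets] for targets in hard_targets]
```

The validation step hints at the missing abstraction mentioned above: the two lists only make sense together with the data they annotate, so they may eventually belong in one object rather than in separate arguments.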
Describe alternatives you've considered Perhaps an intervention column in the data itself.
I am wondering, shouldn't this be part of DoWhy rather than dodiscover? For instance, seeing that DoWhy already offers capabilities for conditional and soft interventions, one could use it to perform interventions if this is required in causal discovery.
This GH issue is for the API for specifying hard interventions in a learn_graph(...) function, rather than how to do the interventions themselves (if that is what you are asking?).
For example, say data is a simple 2D pandas dataframe with variables as columns and samples as rows. I'm thinking out loud here... do we do:
- discoveralgo.fit(data, intervention_list=[('x', 1.0), ('y', 2.0), ('z', 0.0)]) with a list of tuples?
- discoveralgo.fit(augmented_data) where the dataframe itself is augmented with where the interventions took place?
- discoveralgo.fit(data, intervention_list=[('x', lambda x: 1.0),...]) where we need to specify lambda functions?
- discoveralgo.fit(data, context) where the context helps us specify the intervention structure in the dataset?
- etc.
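One way to see how the tuple and lambda variants relate: a constant is just a lambda that ignores its input, so both could be normalized to a single internal form. The sketch below is hypothetical (`normalize_intervention_list` is an invented helper, not an existing dodiscover or DoWhy function) and only shows that the two call styles need not be mutually exclusive.

```python
# Hypothetical helper: not part of dodiscover or DoWhy.
def normalize_intervention_list(intervention_list):
    """Map each variable name to a callable, accepting constants or callables.

    ('x', 1.0)            -> x is set to the constant 1.0
    ('x', lambda x: 1.0)  -> x is set by the given function
    """
    normalized = {}
    for variable, spec in intervention_list:
        if callable(spec):
            normalized[variable] = spec
        else:
            # Bind the constant through a default argument so each lambda
            # keeps its own value instead of the loop's last one.
            normalized[variable] = lambda _x, value=spec: value
    return normalized


constant_style = [("x", 1.0), ("y", 2.0), ("z", 0.0)]
lambda_style = [("x", lambda x: 1.0)]

from_constants = normalize_intervention_list(constant_style)
from_lambdas = normalize_intervention_list(lambda_style)
```

With something like this, a `fit` method could accept either style. It says nothing yet about which rows each intervention applies to, which is the part the augmented-dataframe and context options try to address.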
@adam2392 is right, @bloebp: this is not about doing interventions with an intervention operator on a causal model. It is about utilizing information in the data where rows are labeled as having been subject to an intervention.
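As a rough illustration of "utilizing information in the data", the sketch below splits labeled rows into groups that share the same set of intervention targets, which is the kind of partition an interventional discovery algorithm would likely need as input. The label column name `intervened`, the helper `group_by_intervention`, and the use of dict rows in place of a dataframe are all assumptions made for this example.

```python
from collections import defaultdict

# Each row carries a label naming the variables intervened on in that row.
# The "intervened" key is an assumed convention, not an agreed-upon API.
rows = [
    {"x": 0.1, "y": 1.2, "z": 0.3, "intervened": ()},
    {"x": 1.0, "y": 0.9, "z": 0.2, "intervened": ("x",)},
    {"x": 1.0, "y": 1.1, "z": 0.4, "intervened": ("x",)},
    {"x": 0.4, "y": 2.0, "z": 0.0, "intervened": ("y", "z")},
]


def group_by_intervention(rows, label_key="intervened"):
    """Partition rows by their set of intervention targets.

    Returns a dict mapping frozenset(targets) -> list of samples,
    with the label column stripped from each sample.
    """
    groups = defaultdict(list)
    for row in rows:
        targets = frozenset(row[label_key])
        sample = {k: v for k, v in row.items() if k != label_key}
        groups[targets].append(sample)
    return dict(groups)


groups = group_by_intervention(rows)
observational = groups[frozenset()]      # rows with no intervention
x_intervened = groups[frozenset({"x"})]  # rows where only x was set
```

Note that nothing here performs an intervention. The rows were already generated under interventions, and the labels only record which ones.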