dowhy
Graph learning and R dependency on CDT
In the documentation, causal graph learning depends on an external library, CDT:
- https://github.com/py-why/dowhy/blob/main/docs/source/example_notebooks/dowhy_causal_discovery_example.ipynb
- https://www.pywhy.org/dowhy/v0.10/example_notebooks/dowhy_causal_discovery_example.html
The CDT library is quite cumbersome to install since it relies on the R environment and a bunch of R libraries. In the documentation I can see this:
https://www.pywhy.org/dowhy/v0.10/dowhy.graph_learners.html
which still seems to be a wrapper around CDT.
Are there any plans to implement a pure Python module and avoid using R?
I can see that, for example, the GES algorithm is available both via a pure Python package and in the CDT package.
Hi @priamai, thanks for the question. Causal-Learn, another of the PyWhy libraries, provides a broad set of causal discovery methods. Does that provide what you are looking for?
https://github.com/py-why/causal-learn
It would be great to update that sample notebook to use causal-learn instead of CDT. It is an old example.
Ah yes, we should update that example and use causal-learn.
@kunwuz would you like to update this notebook and add causal-learn algorithms?
That will be awesome!
Aha yes, I will update this notebook soon. In the meantime, please feel free to let me know if you have any questions about using causal-learn.
I have one question, @kunwuz: I notice causal-learn also has conditional independence tests, which we also have in DoWhy, correct? This link seems to time out for me: https://www.cmu.edu/dietrich/causality/ Can you access it? (I managed eventually, but it took ages to load.) There are some really good datasets at https://github.com/cmu-phil/example-causal-datasets and it would be great to show how to run the discovery on each one.
Can you access this link? https://www.cmu.edu/dietrich/causality/projects/causal_learn_benchmarks/ It works from my side. And yes, cmu-phil has a lot of well-maintained/processed datasets, and I will try to write up a tutorial on applying discovery methods on them. Of course, I will let you know when I finish it.
Causal-learn has a series of conditional independence tests, including Fisher-z, chi-square, kernel-based tests, and others. The kernel-based tests have also been used in dowhy-gcm. A recent package called pywhy-stats integrates a more complete collection of tests, although it is still a work in progress.
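For readers unfamiliar with what a Fisher-z test computes, here is a minimal NumPy sketch of the textbook version (partial correlation followed by Fisher's z-transform). It is an illustration only, not causal-learn's or pywhy-stats' implementation; the function name `fisher_z_pvalue` and the toy data are made up for this example, and it assumes roughly Gaussian, linearly related variables.

```python
import math
import numpy as np

def fisher_z_pvalue(data, x, y, cond=()):
    """p-value for H0: column x is independent of column y given columns cond.

    Hypothetical helper for illustration; assumes linear-Gaussian data.
    """
    idx = [x, y, *cond]
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)  # precision matrix of the selected columns
    # Partial correlation of x and y given cond, read off the precision matrix.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)  # keep the log below finite
    z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher z-transform
    n = data.shape[0]
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    # Two-sided p-value under a standard normal null distribution.
    return math.erfc(stat / math.sqrt(2))

# Toy data: x and y share the common cause z and are otherwise unrelated.
rng = np.random.default_rng(0)
n = 2000
z_col = rng.normal(size=n)
x_col = z_col + 0.5 * rng.normal(size=n)
y_col = z_col + 0.5 * rng.normal(size=n)
data = np.column_stack([x_col, y_col, z_col])

p_marginal = fisher_z_pvalue(data, 0, 1)               # x vs y, ignoring z
p_conditional = fisher_z_pvalue(data, 0, 1, cond=(2,)) # x vs y given z
```

In this setup the marginal test should reject independence strongly, while the conditional test has no real dependence left to detect once `z` is accounted for.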
Hello, yes. I am building a graph to document all the various frameworks with an initial taxonomy. It would be nice if you could contribute to it here; I can make you an author.
With causal-learn, one of the main issues when working with pandas DataFrames is that it requires NumPy arrays, so you have to keep some form of mapping between the column names and the array column indices.
Here's a practical example:
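A minimal sketch of that bookkeeping, under stated assumptions: `frame_to_array` and `named_directed_edges` are hypothetical helpers (not part of causal-learn or DoWhy), and the adjacency matrix is a hand-built stand-in for a discovery result. The encoding is assumed to follow the convention described in the causal-learn documentation, where `adj[i, j] == -1` and `adj[j, i] == 1` means `i -> j`.

```python
import numpy as np
import pandas as pd

def frame_to_array(df):
    """Return (data, names): a float NumPy array plus column names in the same order."""
    names = [str(c) for c in df.columns]
    data = df.to_numpy(dtype=float)
    return data, names

def named_directed_edges(adj, names):
    """Translate an index-based adjacency matrix into (cause, effect) name pairs.

    Assumed encoding: adj[i, j] == -1 and adj[j, i] == 1 means i -> j.
    """
    adj = np.asarray(adj)
    edges = []
    for i in range(len(names)):
        for j in range(len(names)):
            if adj[i, j] == -1 and adj[j, i] == 1:
                edges.append((names[i], names[j]))
    return edges

df = pd.DataFrame({
    "rain": [0.1, 0.4, 0.9, 0.3],
    "sprinkler": [0.8, 0.5, 0.1, 0.6],
    "wet_grass": [0.7, 0.6, 0.9, 0.5],
})
data, names = frame_to_array(df)  # `data` is what a NumPy-only API would receive

# Stand-in for a discovery result over column indices (not computed from `data`).
adj = np.zeros((3, 3), dtype=int)
adj[0, 2], adj[2, 0] = -1, 1  # column 0 -> column 2
adj[1, 2], adj[2, 1] = -1, 1  # column 1 -> column 2

edges = named_directed_edges(adj, names)
```

Keeping `names` next to `data` is the whole trick: every index the algorithm returns can be looked up in `names` afterwards.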
It would be nice if you could facilitate this kind of translation.