dodiscover
Implement GES using causal-learn
Is your feature request related to a problem? Please describe.
We need an initial score-based algorithm.

Describe the solution you'd like
Use causal-learn as a dependency to build GES.

Describe alternatives you've considered
We could build it on our own, but causal-learn has a solid implementation and there is no need to reinvent the wheel.
Here is a rendered doc that is a WIP of the PC algorithm API and its usage: https://output.circle-artifacts.com/output/job/4d057804-45d2-488b-89de-11fd8d37743c/artifacts/0/dev/index.html
This base class also defines the interface for any "constraint" causal discovery algorithm: https://github.com/py-why/dodiscover/blob/ab79e40853438ea6c5918554627b55a343c47ef2/dodiscover/constraint/_classes.py. Note this might change as Robert is updating the docs around Context vs data.
From the call, it seems the action items are:
- [ ] implement a notebook/script that calls causal-learn and gets GES working on some example
- [ ] write a function/class in dodiscover that wraps causal-learn's GES function and implements an API
- [ ] add causal-learn to the `pyproject.toml` dev dependencies (I can help you do this)
- [ ] write some unit/integration tests
There will probably need to be some iterative discussion on the GES API. Basically, my general intuition is that all score-based algorithms should either subclass a common base class or share a standard function signature.
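To make the "subclass a base class" option concrete, here is a minimal sketch of what such a base class could look like. The names `BaseScoreLearner`, `_search`, `fit(data, context)` and `graph_` are assumptions for illustration, loosely following the `fit`/`graph_` convention of the constraint-based classes linked above; none of this exists in dodiscover yet.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class BaseScoreLearner(ABC):
    """Hypothetical base class for score-based causal discovery algorithms."""

    def __init__(self, score: str = "bic", max_parents: Optional[int] = None) -> None:
        self.score = score
        self.max_parents = max_parents
        self.graph_: Any = None  # populated by fit()

    @abstractmethod
    def _search(self, data: Any, context: Any) -> Any:
        """Run the algorithm-specific search and return the learned graph."""

    def fit(self, data: Any, context: Any = None) -> "BaseScoreLearner":
        """Shared entry point, so every score-based learner is called the same way."""
        self.graph_ = self._search(data, context)
        return self


class EmptyGraphLearner(BaseScoreLearner):
    """Toy subclass that only illustrates the contract; it learns no edges."""

    def _search(self, data: Any, context: Any) -> Any:
        return []
```

A GES wrapper would then only need to implement `_search`, and users would call `learner.fit(data, context)` and read `learner.graph_` regardless of which algorithm sits underneath.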
Some thoughts on GES wrapping causal-learn
The causal-learn docs contain the parametrization of GES in causal-learn. Note the differences relative to what dodiscover does:
- data is a numpy array rather than dataframe
- parameters imo should be "snake cased", like 'max_parents' instead of 'maxP'
- score function: should it be a `Callable` object, or `str`? If callable, then what is the expected signature of any score function? For example, see sklearn's metrics, which share a common signature starting with `y_true, y_pred` (e.g. `accuracy_score` takes `y_true, y_pred, *, normalize=True, sample_weight=None`).
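The three differences above could be handled in one thin adapter. The sketch below is an assumption about how such a wrapper might look, not an agreed API: `run_ges`, `SCORE_ALIASES` and the `backend` argument are hypothetical names. It also assumes causal-learn exposes `ges(X, score_func, maxP, ...)` in `causallearn.search.ScoreBased.GES` and accepts score names such as `local_score_BIC`, which is worth double-checking against the causal-learn docs. The score is taken as a `str` alias here, the simpler of the two options in the last bullet.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical short aliases, mapped to the score names causal-learn is assumed to accept.
SCORE_ALIASES: Dict[str, str] = {
    "bic": "local_score_BIC",
    "bdeu": "local_score_BDeu",
}


def run_ges(
    data: Any,
    score: str = "bic",
    max_parents: Optional[int] = None,
    backend: Optional[Callable[..., Any]] = None,
) -> Any:
    """Call causal-learn's GES with dodiscover-style arguments.

    data: a pandas DataFrame (anything exposing ``to_numpy()``).
    backend: injectable for testing; defaults to causal-learn's ``ges``.
    """
    if score not in SCORE_ALIASES:
        raise ValueError(
            f"Unknown score {score!r}; expected one of {sorted(SCORE_ALIASES)}"
        )
    if backend is None:
        # Imported lazily so causal-learn can stay an optional dependency.
        from causallearn.search.ScoreBased.GES import ges as backend

    X = data.to_numpy()  # causal-learn works on a numpy array, not a DataFrame
    # Translate the snake_case parameter to causal-learn's camelCase name.
    return backend(X, score_func=SCORE_ALIASES[score], maxP=max_parents)
```

With causal-learn installed, `run_ges(df, max_parents=3)` would hand back whatever causal-learn's `ges` returns; converting that result into a dodiscover graph type is left open here, since it depends on the API discussion above.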
By standardizing the interface through which all score-based algorithms are called in dodiscover, we'll greatly simplify life for users and make future algorithms easier to add. Lmk if I missed anything.