
Implement GES using causal-learn

Open robertness opened this issue 3 years ago • 2 comments

Is your feature request related to a problem? Please describe. We need an initial score-based algorithm.

Describe the solution you'd like: Use causal-learn as a dependency to build GES.

Describe alternatives you've considered: We could build it on our own, but causal-learn has a solid implementation and there is no need to reinvent the wheel.

robertness avatar Aug 25 '22 21:08 robertness

Here is a rendered doc that is a WIP of the PC algorithm API and its usage: https://output.circle-artifacts.com/output/job/4d057804-45d2-488b-89de-11fd8d37743c/artifacts/0/dev/index.html

This base class also defines the interface for any "constraint" causal discovery algorithm: https://github.com/py-why/dodiscover/blob/ab79e40853438ea6c5918554627b55a343c47ef2/dodiscover/constraint/_classes.py. Note this might change as Robert is updating the docs around Context vs data.

adam2392 avatar Sep 07 '22 19:09 adam2392

From the call, it seems the action items are:

  • [ ] implement a notebook/script that calls causal-learn and gets GES working on some example
  • [ ] write a function/class in dodiscover that wraps causal-learn's GES function and implements an API
  • [ ] add causal-learn to the pyproject.toml dev dependencies (I can help you do this)
  • [ ] write some unit/integration tests

There will probably need to be some iterative discussion on the GES API. Basically, my general intuition is that all score-based algorithms should subclass a base class, or have a standard function signature.
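To make the base-class idea concrete, here is a minimal sketch of what a shared interface could look like. Everything in it is hypothetical: the class name `BaseScoreBasedDiscovery`, the `score` and `max_parents` parameters, the `graph_` attribute, and a `fit` that takes only the data are assumptions for illustration, not dodiscover's actual API (which may also need to accept a context object).

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class BaseScoreBasedDiscovery(ABC):
    """Hypothetical shared interface for score-based algorithms.

    This is an illustrative sketch, not an existing dodiscover class.
    """

    def __init__(self, score: str = "bic", max_parents: Optional[int] = None):
        # Snake-cased hyperparameters shared by all score-based learners.
        self.score = score
        self.max_parents = max_parents
        # Learned graph; populated by fit().
        self.graph_: Any = None

    @abstractmethod
    def _search(self, data: Any) -> Any:
        """Run the algorithm-specific search and return the learned graph."""

    def fit(self, data: Any) -> "BaseScoreBasedDiscovery":
        # Common entry point: each subclass only implements _search().
        self.graph_ = self._search(data)
        return self
```

With this shape, a GES wrapper would only implement `_search`, and users would call `fit` the same way for every score-based algorithm.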

Some thoughts on GES wrapping causal-learn

The causal-learn docs contain the parametrization of GES in causal-learn. Note the differences relative to what dodiscover does.

  1. data is a numpy array rather than a dataframe
  2. parameters imo should be "snake cased", like 'max_parents' instead of 'maxP'
  3. score function: should it be a Callable object, or str? If callable, then what is the expected signature of any score function? For example, see sklearn's metrics, which all have a standard function signature of y_true, y_pred, *, normalize=True, sample_weight=None.
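A rough sketch of how a thin wrapper could paper over the first two differences: it accepts a dataframe, converts it to a numpy array, and maps the snake-cased `max_parents` onto causal-learn's `maxP`. The function name `run_ges` and the injectable `backend` argument are hypothetical. The import path `causallearn.search.ScoreBased.GES.ges` and its `score_func` keyword are taken from my reading of the causal-learn docs and should be checked against the installed version. The score is passed through as a string here, which sidesteps rather than answers the Callable-vs-str question.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Callable, Optional

if TYPE_CHECKING:  # pandas is only needed for type checking in this sketch
    import pandas as pd


def _causal_learn_ges(X: Any, **kwargs: Any) -> Any:
    # Lazy import so causal-learn stays an optional dependency.
    # Assumed import path; verify against the installed causal-learn.
    from causallearn.search.ScoreBased.GES import ges

    return ges(X, **kwargs)


def run_ges(
    data: "pd.DataFrame",
    score: str = "local_score_BIC",
    max_parents: Optional[int] = None,
    backend: Callable[..., Any] = _causal_learn_ges,
) -> tuple[list, Any]:
    """Hypothetical wrapper: dataframe in, (column names, backend result) out."""
    # Translate snake_case names to the names causal-learn uses.
    kwargs: dict[str, Any] = {"score_func": score}
    if max_parents is not None:
        kwargs["maxP"] = max_parents
    # causal-learn expects a numpy array, so the column names are kept
    # separately to relabel the nodes of the returned graph later.
    result = backend(data.to_numpy(), **kwargs)
    return list(data.columns), result
```

Making `backend` injectable is a design choice for testability: unit tests can pass a stub instead of calling causal-learn.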

By standardizing the interface through which all score-based algorithms are called in dodiscover, we'll greatly simplify life for users and make future algorithms easier to add. Lmk if I missed anything.

adam2392 avatar Sep 07 '22 19:09 adam2392