dodiscover icon indicating copy to clipboard operation
dodiscover copied to clipboard

[Example Draft] Add GES wrapper

Open adam2392 opened this issue 2 years ago • 8 comments

Signed-off-by: Adam Li [email protected]

Fixes #29

Changes proposed in this pull request:

Before submitting

  • [ ] I've read and followed all steps in the Making a pull request section of the CONTRIBUTING docs.
  • [ ] I've updated or added any relevant docstrings following the syntax described in the Writing docstrings section of the CONTRIBUTING docs.
  • [ ] If this PR fixes a bug, I've added a test that will fail without my fix.
  • [ ] If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • [ ] All GitHub Actions jobs for my pull request have passed.

adam2392 avatar Jan 13 '23 22:01 adam2392

Codecov Report

Merging #97 (88c9304) into main (da36d83) will decrease coverage by 1.21%. The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main      #97      +/-   ##
==========================================
- Coverage   88.24%   87.02%   -1.22%     
==========================================
  Files          26       27       +1     
  Lines        1650     1673      +23     
  Branches      267      268       +1     
==========================================
  Hits         1456     1456              
- Misses        114      137      +23     
  Partials       80       80              
Impacted Files Coverage Δ
dodiscover/score/ges_alg.py 0.00% <0.00%> (ø)

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Jan 13 '23 22:01 codecov-commenter

@robertness no time to address this, but this is a prelim draft of GES and general score-based algo API.

However, what we need is a unit-test. I'm thinking a newcomer contributor can port the unit-test from causal-learn perhaps and rewrite it with a CPDAG graph from pywhy-graphs.

adam2392 avatar Jan 23 '23 16:01 adam2392

Doesn't look like causallearn.search.ScoreBased.GES.ges has the ability to evaluate under an oracle scoring method. Ideally I am trying to match the quality of testing that we had in fcialg.py.

jaron-lee avatar Feb 21 '23 19:02 jaron-lee

Doesn't look like causallearn.search.ScoreBased.GES.ges has the ability to evaluate under an oracle scoring method. Ideally I am trying to match the quality of testing that we had in fcialg.py.

In this case, we would need to replicate numerical tests. Idr, but does causal-learn have this? We can perhaps take those and just refashion the intput/output as the dodiscover api.

If not, okay maybe might be somewhat difficult to add this rn w/o assurances that the implementation is correct.

adam2392 avatar Feb 21 '23 19:02 adam2392

Doesn't look like causallearn.search.ScoreBased.GES.ges has the ability to evaluate under an oracle scoring method. Ideally I am trying to match the quality of testing that we had in fcialg.py.

In this case, we would need to replicate numerical tests. Idr, but does causal-learn have this? We can perhaps take those and just refashion the intput/output as the dodiscover api.

If not, okay maybe might be somewhat difficult to add this rn w/o assurances that the implementation is correct.

Hesitant to add numerical tests, causal-learn has these but in the test files there are comments where they note the test and implementation can be correct but the test fails because of essentially a random seed (and to open a PR if that happens). Sounds like a recipe for fragile tests, I much prefer the strategy of only testing with an oracle in the search algorithms, and leave the correctness of the constraint/scoring algorithm to that respective unit test.

jaron-lee avatar Feb 21 '23 19:02 jaron-lee

The heavyweight option would be to try and build a similar abstraction for scoring algorithms like you did for conditional independence tests.

A more lightweight approach would be to just write an oracle scoring function in the style of causal-learn's and include that somewhere for testing.

Bigger picture is that all of this (and indeed of efforts to integrate stuff from other libraries into dodiscover) depend on the conversation about the future of causal-learn/dodiscover.

jaron-lee avatar Feb 21 '23 19:02 jaron-lee

Is it possible to use an Oracle in a Score-based algorithm? I'm not familiar w/ the score based procedures as much. Is it just querying d-separation as well?

adam2392 avatar Feb 21 '23 19:02 adam2392

I'm speaking abstractly but I say oracle in the sense of a scoring algorithm that doesn't depend on a particular dataset but on the true model or at least of the true data generating distribution (maybe you would need an actual distribution so you can evaluate a true likelihood.

Let me think about this more however.

jaron-lee avatar Feb 21 '23 19:02 jaron-lee