CausalInference.jl icon indicating copy to clipboard operation
CausalInference.jl copied to clipboard

Give the option to use the stable version of the PC

Open mwien opened this issue 2 years ago • 5 comments

In the README.md and Docs it says that pcalg implements the stable PC algorithm.

I might be missing something but to me the implementation looks like the standard PC.

The difference in the skeleton phase between standard PC and stable PC is in the following line (screenshots from https://jmlr.org/papers/volume15/colombo14a/colombo14a.pdf).

Standard PC: image

Stable PC: image

Difference is summarized here: image

In skeleton we have: https://github.com/mschauer/CausalInference.jl/blob/2aadc813bd3e0c10186349f85fe5b01d598bebe3/src/skeleton.jl#L40

Which always extracts the latest neighbor set at each step, not only after each update of d (or l in Colombo and Maathuis notation).

It is then used here for forming set S: https://github.com/mschauer/CausalInference.jl/blob/2aadc813bd3e0c10186349f85fe5b01d598bebe3/src/skeleton.jl#L45

So as far as I understand, one would need to either copy the graph at the start of the while true loop and use that for extracting the neighbors or not remove edges directly but store the to-be-deleted edges temporarily and remove them at the d = d+1 line.

mwien avatar Jul 26 '23 16:07 mwien

You are reading the code! I don't remember anymore but I think you are right. Let's remove the reference to stable and keep the issue open if someone wants to implement it.

mschauer avatar Jul 26 '23 17:07 mschauer

Sounds good!

mwien avatar Jul 26 '23 17:07 mwien

store the to-be-deleted edges temporarily and remove them at the d = d+1 line

This might be a very good idea because removing single edges is linear in the degree and removing n edges is the complexity of setdiff

mschauer avatar Jul 27 '23 07:07 mschauer

Yep, that's probably the way to go.

We have to be a bit careful. Once you start with variants of PC, it doesn't stop 😅

E.g., this change would only give a stable version of the skeleton phase. Colombo and Maathuis also give stable versions of the other phases.

Another thing is how to handle the following case: A - B - C - D and B is not in the sepset for (A, C) and C is not in the sepset for (B, D).

Obviously, this can only happen in case of an error during CI testing.

I'm not sure what we are actually doing then. Could it happen that B - C gets fully removed? E.g. first we have A -> B <- C - D and then, the triple D - C -> B is found (let's say in this order D = u, C = v, B = w because then https://github.com/mschauer/CausalInference.jl/blob/706eef3ed66d52dcf31c2c533697259834166ed9/src/pc.jl#L168 evaluates to true and C -> B is removed?).

What people also do is just keep the first orientation of an edge (e.g. B <- C here; then you could ask whether C <- D should be oriented) or always override orientations or to mark such edges B - C as a "conflict" edge.

I don't know what's best, but full removal would probably not be good (maybe we can check whether this actually occurs, should be a simple test whether number of adjacencies goes down after orientation phase).

EDIT: As a reference point, this problem is discussed in pcalg under the option "solve.confl" of the pc function

mwien avatar Jul 27 '23 10:07 mwien

I gave a shot at the stable skeleton algorithm in #101

mschauer avatar Jul 27 '23 15:07 mschauer