CausalDiscoveryToolbox

NCC Score Direction - conflicts with paper

Open · nudro opened this issue 4 years ago • 5 comments

In the original paper, "NCC tends to zero if the classifier believes in Xi → Yi and tends to one if the classifier believes in Xi ← Yi". https://arxiv.org/pdf/1605.08179.pdf

However, in the NCC implementation, predict_proba returns a causation score (value: 1 if a -> b and -1 if b -> a).

What exactly is this causation score? I read the other threads regarding the ANN, but it still doesn't make sense to me.

nudro avatar Apr 29 '21 12:04 nudro

Hello, in order to keep the output homogeneous across the various methods in this package, the output of predict_proba is rescaled to [-1, 1]. abs(predict_proba) reflects the model's confidence that the two variables are causally related; in other words, the closer the score is to 0, the more likely the variables are independent. For more details on the causation score, please refer to: Lopez-Paz, D., Muandet, K., Schölkopf, B., & Tolstikhin, I. (2015, June). Towards a learning theory of cause-effect inference. In International Conference on Machine Learning (pp. 1452-1461). PMLR.
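To make this convention concrete, here is a minimal sketch of how such a score can be decoded; `interpret_score` is just an illustrative helper, not part of the cdt API:

```python
def interpret_score(score):
    """Decode a cdt pairwise causation score in [-1, 1].

    Following the convention above: the sign encodes the predicted
    direction (positive: a -> b, negative: b -> a) and the magnitude
    encodes confidence (values near 0 suggest independence).
    """
    direction = "a -> b" if score > 0 else "b -> a"
    confidence = abs(score)
    return direction, confidence

print(interpret_score(-0.999))  # ('b -> a', 0.999)
```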

Please note, however, that this implementation is not the original one, and it might have (and I believe does have) some errors, as many people were unable to reproduce the results from the paper.

Best, Diviyan

diviyank avatar Apr 29 '21 12:04 diviyank

Thanks for replying so quickly, and I appreciate you pointing me to the other references.

I'd be grateful for some feedback on the scores I'm getting. In experiment 1, predict_proba() is outputting scores like -0.999 for a hypothesis that A -> B. In experiment 2, predict_proba() is outputting scores like 54.790 for a hypothesis that A <- B. The two experiments use different A-B pairs.

Taking the absolute value of -0.999, does that mean the model is only 0.99% confident that A and B are causally related? In other words, because Experiment 1's score is closer to zero than Experiment 2's (54.790), are A and B in Experiment 1 independent?

Also, great repo and documentation. Thank you.

nudro avatar Apr 29 '21 13:04 nudro

There is an issue in the implementation right there. The score should be contained in [-1, 1], and its magnitude should represent the confidence as a percentage. The -0.999 indicates that the model has 99.9% confidence in the direction B -> A, and the score should never exceed 1 in absolute value. On which data have you trained NCC?
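Applying the illustrative `interpret_score` helper from earlier in the thread to your two scores makes the point concrete:

```python
print(interpret_score(-0.999))  # ('b -> a', 0.999): 99.9% confidence in B -> A
# A score of 54.790 falls far outside [-1, 1], so it cannot be read as a
# confidence at all; it points to a bug or an unscaled model output.
```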

Best, Diviyan

diviyank avatar Apr 29 '21 13:04 diviyank

Ah thanks, ok.

I've been experimenting with an unconventional dataset that may be causing these issues: pairs of images produced by GANs. The images in each pair are known to correspond, and each has been grey-scaled and flattened to shape (1, k), where k is high-dimensional.
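For concreteness, here is a minimal sketch of that preprocessing, assuming standard PIL/NumPy tooling; the file names and normalization are placeholders, not my exact pipeline:

```python
import numpy as np
from PIL import Image

def image_to_vector(path):
    """Grey-scale an image and flatten it to shape (1, k), k = height * width."""
    img = Image.open(path).convert("L")             # 8-bit grayscale
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return arr.reshape(1, -1)                       # (1, k), k is high-dimensional

# Each observation is a known pair of GAN-generated images.
a = image_to_vector("gan_output_a.png")  # illustrative file names
b = image_to_vector("gan_output_b.png")
```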

nudro avatar Apr 29 '21 13:04 nudro

Hi, I appreciate your replies from a few months ago. Very helpful.

I have a few new questions, and am hoping you can comment.

Question 1 - In predict_proba and predict_dataset, you subtract 0.5 from the output of self.model(m). Why is that? I couldn't find any mention of this in the paper. (One possible reading is sketched after the permalinks below.)

Permalinks here:

https://github.com/FenTechSolutions/CausalDiscoveryToolbox/blob/46284195d63dbb5b8807d67aacfc7f351ced38a0/cdt/causality/pairwise/NCC.py#L262

https://github.com/FenTechSolutions/CausalDiscoveryToolbox/blob/46284195d63dbb5b8807d67aacfc7f351ced38a0/cdt/causality/pairwise/NCC.py#L290
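My guess, purely an assumption from the surrounding code rather than anything stated in the paper: if the network's final sigmoid produces a probability p in (0, 1), subtracting 0.5 recenters the score at zero so that its sign can encode the predicted direction:

```python
import torch

logits = torch.tensor([1.2])   # illustrative raw output of the final layer
p = torch.sigmoid(logits)      # p in (0, 1)
score = p - 0.5                # recentered: the sign now encodes direction
# Which sign corresponds to a -> b depends on the training label convention;
# doubling the result would map it onto cdt's [-1, 1] scale.
```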

Question 2 - The NCC paper describes the embedding layer as a multilayer perceptron, but your implementation uses Conv1d layers. Is there a reason for this design choice?

Question 3 - The NCC paper uses tanh activations and trains with 100 hidden units, while your implementation uses Adam as the optimizer and 20 hidden units.

Not a criticism; I'm just looking to understand the reasoning behind these implementation choices. I see that the NCC authors no longer have their original code available, and I am planning to reimplement your code using the specs laid out in the NCC paper (a rough sketch of the architecture as I read it is below).
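For reference, here is a rough PyTorch sketch of the architecture as I read the paper, using the specs quoted above (tanh activations, 100 hidden units); the layer counts and the sigmoid output head are my assumptions, so they should be checked against the paper:

```python
import torch
import torch.nn as nn

class NCCPaper(nn.Module):
    """Sketch of NCC: embed each (x_i, y_i) point with an MLP, average the
    embeddings over the sample, then classify the causal direction."""

    def __init__(self, hidden=100):
        super().__init__()
        # Embedding MLP applied independently to each 2-d point (x_i, y_i).
        self.embed = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        # Classifier applied to the averaged embedding.
        self.classify = nn.Sequential(
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, pairs):
        # pairs: (batch, m, 2) -- m joint samples (x_i, y_i) per variable pair
        e = self.embed(pairs)                    # (batch, m, hidden)
        e = e.mean(dim=1)                        # average over the m samples
        # Per the paper's convention, the output tends to 0 for x -> y
        # and to 1 for x <- y.
        return torch.sigmoid(self.classify(e)).squeeze(-1)

# Usage: 8 variable pairs, 500 samples each.
model = NCCPaper()
scores = model(torch.randn(8, 500, 2))
```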

Thanks for your time.

nudro avatar Jun 13 '21 17:06 nudro