EconML
GRF Causal Forest alpha and pointJ question
Hi,
I was using the GRF code (https://github.com/py-why/EconML/blob/8b7fe338600b7ccb6b8362f658d0ec35f5c75b7a/econml/grf/classes.py#L394-L399) and I noticed that the alpha and pointJ for CATE estimation are defined as:
- y * T
- T x T (cross-product)
I presumed pointJ was the Jacobian being estimated. This brought up some confusion for me, and I was hoping to clarify with the dev team here:
- what is alpha in the GRF code, and why is it y * T? What if T is a vector and y is just a univariate outcome? The behavior here isn't documented anywhere.
- why is the point-wise Jacobian of the moment equation just the cross-product of the treatment arrays?
It's documented in the parent abstract class. Every child of the base GRF class needs to implement this abstract class: https://github.com/py-why/EconML/blob/8b7fe338600b7ccb6b8362f658d0ec35f5c75b7a/econml/grf/_base_grf.py#L106
Note also that GRF always transforms inputs to 2d matrices before anything else. So y here is (n, 1) and T is (n, nt). This multiplication multiplies every treatment column with y.
The (i, j) entry of the Jacobian is E[Ti * Tj], so a sample of this is Ti * Tj. Each row of pointJ is a "point" sample of the Jacobian, flattened into a 1-d array. That's why it's just the cross product of the treatments.
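To make the shapes concrete, here is a small numpy sketch of the two arrays described above (the variable names and construction are illustrative, not EconML's internal API):

```python
import numpy as np

rng = np.random.default_rng(0)
n, nt = 5, 2
y = rng.normal(size=(n, 1))   # outcome, already reshaped to (n, 1)
T = rng.normal(size=(n, nt))  # treatments, (n, nt)

# alpha: every treatment column multiplied by the outcome -> (n, nt)
alpha = y * T

# pointJ: per-sample outer product T_i T_i^T, flattened row-wise -> (n, nt * nt)
pointJ = np.einsum('ij,ik->ijk', T, T).reshape(n, nt * nt)

print(alpha.shape)   # (5, 2)
print(pointJ.shape)  # (5, 4)
```

Each row of `pointJ` is one sample of the nt x nt Jacobian, so averaging rows over a leaf gives the local estimate of E[T T^T].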
So this transforms the weights (alpha) to be the outcome multiplied by the treatments? I can see why we might do this if T is {0, 1}, but why would you do this in general and initialize the weights as this array? Apologies if I misunderstood something.
Thanks for the quick response!
This alpha is not the weights; it's the offset part of the linear moment condition that we are solving: https://github.com/py-why/EconML/blob/8b7fe338600b7ccb6b8362f658d0ec35f5c75b7a/econml/grf/_base_grf.py#L45
As noted there, our GRF implementation covers only linear moment restrictions, which cover almost all moments that people use frequently in practice (with the only potential exception being quantile forests).
Restricting to linear moments makes computation much faster because we can simply store the local average J and the local average A (alpha) at the leaf nodes, and these are sufficient statistics for all downstream calculations. We don't need to store, for instance, the whole set of training data samples.
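A minimal sketch of why the two leaf averages are sufficient: for a linear moment E[J theta - A] = 0, the local estimate within a leaf is just theta = mean(J)^{-1} mean(alpha), so only the averages need to be stored. This toy example (names and setup are mine, not EconML's internals) recovers a known linear effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n, nt = 200, 2
T = rng.normal(size=(n, nt))
theta_true = np.array([1.0, -0.5])
# outcome generated by a linear treatment effect plus small noise
y = (T @ theta_true + 0.01 * rng.normal(size=n)).reshape(n, 1)

alpha = y * T                          # per-sample offsets, (n, nt)
pointJ = np.einsum('ij,ik->ijk', T, T) # per-sample Jacobians, (n, nt, nt)

# Leaf-level sufficient statistics: only the averages are kept.
J_bar = pointJ.mean(axis=0)            # (nt, nt)
alpha_bar = alpha.mean(axis=0)         # (nt,)

# Solve the local linear moment condition J_bar @ theta = alpha_bar.
theta_hat = np.linalg.solve(J_bar, alpha_bar)
print(theta_hat)
```

With a homogeneous effect and little noise, theta_hat lands close to theta_true; in the forest, the same solve happens per leaf (or with forest weights), which is what makes storing only J_bar and alpha_bar enough.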