Comparison of output (`impact$series$cum.effect`) in Python and R packages
Thanks for the great effort in keeping this library updated.
I'm working on converting an R library to Python, and the R code has the following line:

    preperiod <- subset(impact$series, cum.effect == 0)

where `impact` is the output object of the CausalImpact library. From what I can tell, `impact$series$cum.effect` in R corresponds to `impact.inferences.post_cum_effects_means` in Python.
I used the comparison example provided in the README (with `comparison_data.csv`), but I'm getting different output. In the R library, the values of `impact$series$cum.effect` start with zero for the earlier dates, whereas they are NaN in the Python package, and the values for the later dates differ as well.
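For reference, this is roughly what I ran on the Python side; the file path and the pre/post period dates follow the README comparison example as I remember it, so please treat them as assumptions:

```python
import pandas as pd
from causalimpact import CausalImpact

# Path and period dates as in the README comparison example (assumed; adjust if needed).
data = pd.read_csv('comparison_data.csv', index_col=['DATE'], parse_dates=['DATE'])
pre_period = ['2019-04-16', '2019-07-14']
post_period = ['2019-07-15', '2019-08-01']

ci = CausalImpact(data, pre_period, post_period, model_args={'fit_method': 'hmc'})

# The first rows come out as NaN here, while R shows zeroes for the same dates.
print(ci.inferences['post_cum_effects_means'].head())
```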
I'd greatly appreciate some feedback on comparing the output so I can convert the following line of code to Python appropriately:

    preperiod <- subset(impact$series, cum.effect == 0)
I tried both fit methods, `hmc` and `vi`, and the output of the other columns in `impact$series` differs from `impact.inferences` in Python as well.
Thank you, and looking forward to hearing back from you.
Hi @rj678 ,
The `preperiod` from your assignment would be computed in Python by something like:

    preperiod = ci.inferences['post_cum_effects_means'][ci.inferences['post_cum_effects_means'].isna()]

which essentially retrieves the rows corresponding to the training (pre-period) data, for which no post-period cumulative effect is computed. In the R package those empty values were assigned as zeroes, whereas in Python, as they don't exist, they remain as NaN.
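If you'd rather mirror the R convention exactly (zeroes in the pre period instead of NaN), a small sketch along these lines should be equivalent to the R `subset()` call; the column name is the one from the snippet above and `ci` is the fitted object:

```python
# Treat the missing cumulative effects as zero, as the R package does, then
# filter exactly like `subset(impact$series, cum.effect == 0)`.
series = ci.inferences.copy()
series['post_cum_effects_means'] = series['post_cum_effects_means'].fillna(0)
preperiod = series[series['post_cum_effects_means'] == 0]
```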
Notice also that if you want to work with the `pre_period` data, it's also available on the `ci` object as `ci.pre_data` or `ci.normed_pre_data` (the latter is the same data but with normalization applied).
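A quick sketch of how you'd access those attributes (`ci` being the fitted object):

```python
pre_df = ci.pre_data                # original pre-period observations
pre_df_normed = ci.normed_pre_data  # same data normalized (may be None if standardization was disabled)
print(pre_df.head())
```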
As for varying results, did the results you observed differ too much from the official README report? I just ran it here using the `hmc` method and had very close results. They will never be exactly the same, since the algorithm behind them is not deterministic, but they should always converge to the same conclusions and be very close for the most part. Results are expected to differ from the original R package as well, but again they should lead to the same conclusions and be similar overall. The cumulative field will differ more, as it sums up all estimated points in the post period.
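For a quick sanity check across the two fit methods, something like the sketch below (assuming `data`, `pre_period` and `post_period` are already defined as in the README example) is usually enough to see that the conclusions match even when the point estimates don't:

```python
from causalimpact import CausalImpact

# Fit the same data with both methods; estimates will differ slightly between
# runs and between methods, but the summaries should tell the same story.
ci_hmc = CausalImpact(data, pre_period, post_period, model_args={'fit_method': 'hmc'})
ci_vi = CausalImpact(data, pre_period, post_period, model_args={'fit_method': 'vi'})

print(ci_hmc.summary())
print(ci_vi.summary())
```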
Let me know if this helps you,
Best,
Will
Thanks so much for confirming that the empty values are zero in R and NaN in Python. From what I remember, the difference between the non-zero values in `impact$series$cum.effect` and `ci.inferences['post_cum_effects_means']` was not insignificant. I'll check again and get back; thanks so much for the detailed response.
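For the follow-up check, this is roughly how I plan to line the two outputs up. The `r_series.csv` export is hypothetical (e.g. produced in R with `write.csv(as.data.frame(impact$series), 'r_series.csv')`), and `ci` is the fitted Python object from before:

```python
import pandas as pd

r_series = pd.read_csv('r_series.csv')  # hypothetical export of impact$series from R
r_cum = r_series['cum.effect'].reset_index(drop=True)

# Fill the pre-period NaNs with zero so both series use the same convention.
py_cum = ci.inferences['post_cum_effects_means'].fillna(0).reset_index(drop=True)

# Absolute differences, aligned by position over the full pre + post period.
print((r_cum - py_cum).abs().describe())
```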