pymc-examples
Timeseries models derived from generative graph
As discussed with @ricardoV94 I will port the gist https://gist.github.com/ricardoV94/a49b2cc1cf0f32a5f6dc31d6856ccb63#file-pymc_timeseries_ma-ipynb into the PyMC Example Gallery. I will add text and explanation to the existing working code :)
📚 Documentation preview 📚: https://pymc-examples--642.org.readthedocs.build/en/642/
Check out this pull request on ReviewNB.
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
I think this one is ready for a first review round.
I have not been able to make the scan reference work with {func}`~pytensor.scan.basic.scan` nor {func}`~pytensor.scan` ... any tips? Thanks
View / edit / reply to this conversation on ReviewNB
AlexAndorra commented on 2024-03-11T18:53:02Z ----------------------------------------------------------------
- In "In this notebook, we show how to model..." -- I would motivate more why you would do that, instead of just using `pm.AR` -- what are the benefits? In which cases is it worth the trouble of dealing with `pytensor` and `scan` directly?
ricardoV94 commented on 2024-03-13T09:58:34Z ----------------------------------------------------------------
Agree with Alex. The main motivation is that this framework allows you to define arbitrary time series models, not just the ones that are pre-packaged in PyMC. For the AR example, one could for instance use a different noise distribution (StudentT) or covariates that change over time...
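To make that motivation concrete, here is a minimal NumPy sketch (not the notebook's PyMC code) of the kind of generative recursion being discussed: an AR(2) whose innovations are Student-t rather than normal. All parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical AR(2) parameters -- illustrative values, not from the notebook
rho = np.array([0.5, 0.3])   # autoregressive coefficients
sigma = 0.5                  # innovation scale
nu = 4                       # Student-t degrees of freedom
T = 200

x = np.zeros(T)
# The recursion: x[t] = rho1*x[t-1] + rho2*x[t-2] + Student-t noise.
# Swapping the noise distribution here is a one-line change, which is the
# point: a hand-written generative loop is not limited to pre-packaged models.
for t in range(2, T):
    x[t] = rho[0] * x[t - 1] + rho[1] * x[t - 2] + sigma * rng.standard_t(nu)

print(x[:5])
```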
ricardoV94 commented on 2024-03-13T10:07:00Z ----------------------------------------------------------------
Maybe add a second more complex example, either MA2 https://gist.github.com/ricardoV94/a49b2cc1cf0f32a5f6dc31d6856ccb63#file-pymc_timeseries_ma-ipynb or one of those Jesse wrote here https://gist.github.com/jessegrabowski/ccda08b8a758f882f5794b8b89ace07a ?
jessegrabowski commented on 2024-03-13T10:28:06Z ----------------------------------------------------------------
I actually disagree, I think an AR(2) is a fine choice. I was going to put suggestions for other models here (ARIMA-GARCH or ETS), but I actually think it's better to keep this notebook really simple and focus on the machinery, which is quite complex.
ricardoV94 commented on 2024-03-13T10:51:14Z ----------------------------------------------------------------
Showing a non-recursive time varying parameter could be useful though? Can split into two separate notebooks?
jessegrabowski commented on 2024-03-13T10:54:36Z ----------------------------------------------------------------
I think that's a good 2nd example, because it also serves as a tutorial on the difference between `outputs_info`, `sequences`, and `non_sequences`
Even if it's not a time-varying parameter, maybe an example that shows how to combine an exogenous regression with an AR model, so you're just scanning in some covariate data and doing a linear model with AR distributed errors.
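For readers unfamiliar with those three argument roles, here is a plain-Python analogue (a deliberate simplification -- real `pytensor.scan` builds a symbolic graph). The `toy_scan` and `step` names are hypothetical, invented for this sketch.

```python
# Plain-Python analogue of pytensor.scan's three argument roles:
#   sequences:     inputs indexed per step (here, exogenous covariate data)
#   outputs_info:  recursive state carried between steps (here, the AR error)
#   non_sequences: fixed quantities reused at every step (here, a coefficient)
def toy_scan(step_fn, sequences, outputs_info, non_sequences):
    state = outputs_info
    outputs = []
    for seq_t in sequences:
        state = step_fn(seq_t, state, non_sequences)
        outputs.append(state)
    return outputs

# Regression with AR(1) errors, noise-free for clarity:
# err_t = rho * err_{t-1} + 0.1 * x_t
def step(x_t, prev_err, rho):
    return rho * prev_err + 0.1 * x_t

errs = toy_scan(step, sequences=[1.0, 2.0, 3.0], outputs_info=0.0, non_sequences=0.5)
print(errs)
```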
juanitorduz commented on 2024-05-06T12:17:35Z ----------------------------------------------------------------
Maybe add a second more complex example, either MA2?
I suggest we keep this notebook simple and work out other more complex examples in a different notebook (I can also work on it). In my experience, the first time a user sees these models it can be overwhelming, so let's keep it simple for this one :D
View / edit / reply to this conversation on ReviewNB
AlexAndorra commented on 2024-03-11T18:53:03Z ----------------------------------------------------------------
- I think we need to make explicit what `collect_default_updates` does, as well as the `scan` API, which is always tricky.
- Also explain what `ar_init`'s type should be. Is `taps` like a countdown?
ricardoV94 commented on 2024-03-13T10:01:31Z ----------------------------------------------------------------
Re: collect_default_updates, it tells PyMC that the RV in the generative graph should be updated in every iteration of the loop. Agree with adding more context on how Scan is defined and linking to the PyTensor docs for a deeper dive: https://pytensor.readthedocs.io/en/latest/library/scan.html
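The "updated in every iteration" idea can be mimicked in plain Python (a simplified analogue, not PyMC's actual mechanism): each step returns both a value and an update to the RNG's shared state, and the driver loop applies that update before the next step.

```python
import random

# Analogue of a scan step whose inner RV needs its RNG state updated:
# the step returns (value, new_rng_state) instead of just a value.
def step(prev, rng_state, rho=0.7, sigma=0.1):
    rng = random.Random()
    rng.setstate(rng_state)
    new = rho * prev + sigma * rng.gauss(0.0, 1.0)
    return new, rng.getstate()  # the value, plus the state "update"

rng_state = random.Random(123).getstate()
x, draws = 0.0, []
for _ in range(5):
    x, rng_state = step(x, rng_state)  # apply the update each iteration
    draws.append(x)

# Without threading rng_state forward, every iteration would reuse the
# same RNG state and hence see identical noise.
print(draws)
```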
juanitorduz commented on 2024-05-06T13:34:35Z ----------------------------------------------------------------
Added more info!
View / edit / reply to this conversation on ReviewNB
AlexAndorra commented on 2024-03-11T18:53:04Z ----------------------------------------------------------------
Explain why you set all the observed data nodes to 0
jessegrabowski commented on 2024-03-13T10:51:13Z ----------------------------------------------------------------
Why is there an observed value for the initial condition? We never observe this by definition.
ricardoV94 commented on 2024-03-13T12:50:51Z ----------------------------------------------------------------
I don't see why you can't observe it?
jessegrabowski commented on 2024-03-13T13:41:37Z ----------------------------------------------------------------
Because the first observation in the data is $x_0$, so the initial conditions $x_{-1}$ and $x_{-2}$ are by definition unobserved
The way this model is written, it assumes that the first observations of the data are generated by some arbitrary normal distribution, which then go on to spontaneously kick-off an autoregressive process that describes the rest of the data. This isn't logical. The correct definition of the model should consider all observed data as part of the autoregressive process
juanitorduz commented on 2024-05-06T13:35:35Z ----------------------------------------------------------------
@jessegrabowski I used the code you shared on discourse and that is why I added you as a co-author :)
View / edit / reply to this conversation on ReviewNB
AlexAndorra commented on 2024-03-11T18:53:05Z ----------------------------------------------------------------
Can't we get rid of the observed in the first model, then use pm.observe here? That makes for a clearer API and experience for the readers
ricardoV94 commented on 2024-03-13T10:03:25Z ----------------------------------------------------------------
Agree. Let me know if something breaks
jessegrabowski commented on 2024-03-13T13:46:49Z ----------------------------------------------------------------
I blatantly plagiarized this notebook and used pm.observe in a discourse thread here, maybe it could be helpful.
juanitorduz commented on 2024-05-06T13:35:51Z ----------------------------------------------------------------
Fixed in 67ec83d
View / edit / reply to this conversation on ReviewNB
AlexAndorra commented on 2024-03-11T18:53:06Z ----------------------------------------------------------------
"we see the model is capturing the global dynamics of the time series. In order to have a abetter a better insight of the model"
jessegrabowski commented on 2024-03-13T10:24:47Z ----------------------------------------------------------------
I think a discussion of conditional and unconditional posteriors is needed here. Many users will be surprised by this posterior because they are used to seeing conditional one-step forecasts, $p(x_t | \{x_\tau\}_{\tau=0}^{t-1})$ (what you get in statsmodels/stata/everything), which are much tighter and follow the data more closely.
At the risk of scope-creep, I think it's also important to show users how to use a predictive model to get the conditional posterior. It would also be the first place in pymc-examples that shows how to use a predictive model -- up until now we only have the labs blog.
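For reference, the two predictives being contrasted can be written out (notation assumed, matching the $x_t$ used above, with $\theta$ the model parameters and the full data $x_{0:T}$ in the posterior):

```latex
% One-step-ahead (conditional) posterior predictive: each forecast
% conditions on the actually observed history, so the bands stay tight.
p(x_t \mid x_{0:t-1}) = \int p(x_t \mid x_{0:t-1}, \theta)\, p(\theta \mid x_{0:T})\, d\theta

% Unconditional posterior predictive: the whole trajectory is re-simulated
% from posterior parameter draws, so uncertainty compounds over time.
p(\tilde{x}_{0:T}) = \int p(\tilde{x}_{0:T} \mid \theta)\, p(\theta \mid x_{0:T})\, d\theta
```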
May be a good excuse to use pm.observe instead of the dummy MutableData
Thank you all for the feedback! Now that the HSGP example was merged I can come back to this one (there are no dependencies but I was busy with other stuff). Apologies for the delay!
Awesome!
@jessegrabowski I think your point about conditional vs unconditional posteriors is indeed very important! In order to generate the conditional posterior, do I need to condition on the whole observed time series somehow recursively to get the one-step prediction? Do you have a snippet or code to do this? Or maybe I am over-thinking it?
You'd take the posterior parameters and scan over the observations, setting each observation as $y_t$ then predicting $y_{t+1} \mid y_t, \theta$. Here's a quick gist I whipped up that I think does the right thing:
https://gist.github.com/jessegrabowski/d3ea37e521dc3e63d89864d0bbcf54e0
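Not a substitute for the gist, but the "scan over the observations" mechanics can be sketched in plain NumPy for an AR(1) (the observed series and the posterior draws of $\rho$ below are fabricated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated observed series and "posterior" draws of rho -- illustration only
y = rng.normal(size=20)
rho_draws = rng.normal(0.5, 0.05, size=500)   # stand-in for real posterior samples

# Conditional one-step forecasts: for each t, condition on the observed y[t]
# and predict y[t+1] under every posterior draw of the parameter.
one_step = rho_draws[:, None] * y[None, :-1]   # shape (draws, T-1)

mean_forecast = one_step.mean(axis=0)          # posterior-mean one-step path
print(mean_forecast.shape)                     # → (19,)
```

Because every forecast re-conditions on the actual data, these paths hug the series much more tightly than the unconditional posterior predictive.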
Wow! Cool! I will take a look at it in detail!
@jessegrabowski Here (https://github.com/pymc-devs/pymc-examples/pull/642/commits/c842de35f25fa558b49504955da66f63f2d476f1) is my first attempt. I think it looks reasonable. Did I miss something :) ?