pymc
pymc copied to clipboard
Port timeseries distributions to V4
import pymc3 as pm
import numpy as np
returns = pd.read_csv(pm.get_data("SP500.csv"), index_col="Date")
returns["change"] = np.log(returns["Close"]).diff()
returns = returns.dropna().astype('float32')
returns.head()
with pm.Model() as model_pymc3:
step_size = pm.Exponential("step_size", 10.)
volatility = pm.GaussianRandomWalk("volatility", sigma=step_size,
size=returns.shape[0],
init=pm.Normal.dist(0, step_size))
nu = pm.Exponential("nu", 0.1)
obs = pm.StudentT(
"obs", nu=nu, sigma=np.exp(volatility), observed=returns["change"]
)
yields:
TypeError Traceback (most recent call last)
<ipython-input-11-3f1576c8cf84> in <module>
1 with pm.Model(check_bounds=False) as model_pymc3:
2 step_size = pm.Exponential("step_size", 10.)
----> 3 volatility = pm.GaussianRandomWalk("volatility", sigma=step_size,
4 size=returns.shape[0],
5 init=pm.Normal.dist(0, step_size))
~/projects/pymc/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
161 transform = kwargs.pop("transform", UNSET)
162
--> 163 rv_out = cls.dist(*args, rng=rng, **kwargs)
164
165 return model.register_rv(rv_out, name, data, total_size, dims=dims, transform=transform)
TypeError: dist() missing 1 required positional argument: 'dist_params'
Seems related to pm.GaussianRandomWalk because if I comment out the init it leads to the same error.
Is this related to check_bounds and/or V4?
This is on V4, worked fine on V3.
I could be wrong but I don't think the timeseries were refactored yet
Unrelated to check_bounds. I updated the code to make it runnable.
I see.
This is now possible, and much simpler, using the aesara.scan operator. For example, a GRW looks like this:
with pm4.Model():
random_walk, _ = aesara.scan(
fn=lambda Y_tm1: Y_tm1 + pm4.Normal.dist(sigma=.1),
outputs_info=[{"initial": at.as_tensor([0.0])}],
n_steps=100,
name="Y"
)
@fonnesbeck is this still something you want to take on? Definitely we could have a coordinated effort to rewrite the existing ones one-by-one.
Actually there is still some work required to make the above work with PyMC.
@mjhajharia and i would like to be involved in time series porting efforts. Admittedly I've been slow on this but very much would like to learn (and not get in the way too much) if someone else takes the lead on this.
Also big thanks to @ricardoV94 whos already spent many hours talking this through with us
+1 at what @canyon289 said, we talked to ricardo and planned on first porting random walk which can probably be done with a cumsum and then AR, followed by ARMA a lot of which is already there in the ARMA branch. I've set some time aside for this and will get started next week (as holidays get over soon), i'll keep updating things here in this project
@ricardoV94 already made a minimal example to explain aeppl and scan which might be useful for others seeing this issue
Just encountered this with EulerMaruyama too. I guess most (if not all) ts distributions are affected.
Sorry if this question should've been researched more thoroughly from my side, but... is there anything in PyMC v4 that will make life simpler for folks to want to do out-of-sample predictions on time series?
I've always found the shape changes to be problematic, even with shared variables (pm.Data), because the Timeseries distribution traces essentially save the input series. 🙁
Do we have a list of time series distributions already ported to v4 (I'm updating the release notes)
GaussianRandomWalk and AR I believe. Missing are GARCH11 and some other weird one I forgot the name of.
I've been working a lot recently with pymc3, just transitioned to pymc 4.0.1 and am very happy with the new version of GaussianRandomWalk. It can handle mutliple dimensions better then previously. Thanks for that :)
I figured out by trial and error and very helpful error messages that the time dimension always has to be specified last. This wasn't clear to me from the documentation, so perhaps I didn't know where to look? But in case it is not mentioned, I think it would be a great hint for inexperienced users like myself.
Are there any plans to transfer MVRandomWalk soon? I would be interested in contributing to that, but don't know where to start.
hey @flo-schu, my apologies but for this ticket is claimed by a GSOC proect. I should have updated it to reflect that. We would love for you to contribute elsewhere in the codebase though, would it help If I suggested something?
Assigning to myself since @nicospinu handle is not showing up for some reason cc @junpenglao
hey @flo-schu, my apologies but for this ticket is claimed by a GSOC proect. I should have updated it to reflect that. We would love for you to contribute elsewhere in the codebase though, would it help If I suggested something?
No problem. I'm happy to hear that the issue is being worked on. I'd be happy if you could point out some beginner-friendly opportunities for contribution
I've been working a lot recently with pymc3, just transitioned to pymc 4.0.1 and am very happy with the new version of GaussianRandomWalk. It can handle mutliple dimensions better then previously. Thanks for that :)
I figured out by trial and error and very helpful error messages that the time dimension always has to be specified last. This wasn't clear to me from the documentation, so perhaps I didn't know where to look? But in case it is not mentioned, I think it would be a great hint for inexperienced users like myself.
Are there any plans to transfer MVRandomWalk soon? I would be interested in contributing to that, but don't know where to start.
FWIW, I encountered the same issue in numpyro, which also gives unexpected results with numpyro.plate if the time dimension is not rightmost. I imagine it just trickles up to pymc4. It is also not obvious from numpyro documentation, I believe. A workaround I use in numpyro is to reshape before and after.
With #6072 we should be able to quickly refactor the multivariate random walks. After that I would suggest closing this issue and opening one mentioning the specific distributions that are still missing.
I wrote the EulerMaruyama implementation and interested to update it (and include some improvements for non-centered form & a 2nd order scheme as well). It's not clear to me why it needs porting though, since it's just some tensor expressions (at least for the centered EM method)?
The aim of porting is mostly so that the distribution is compatible under v4 (currently it is broken as we change the initialization signature and dispatching). Functionally, porting to v4 adds better support for forward sampling (otherwise in the previous implementation only pm.sample works but not prior/posterior predictive sampling
@junpenglao thanks for the explanation. Indeed the port is much more complicated than I expected, so thanks for your work on that.
@maedoc You are welcome! I am looking forward to the improvements for non-centered form & 2nd order scheme you mentioned ;-)