pymc Port timeseries distributions to V4

import pymc3 as pm
import numpy as np

returns = pd.read_csv(pm.get_data("SP500.csv"), index_col="Date")
returns["change"] = np.log(returns["Close"]).diff()
returns = returns.dropna().astype('float32')
returns.head()

with pm.Model() as model_pymc3:
    step_size = pm.Exponential("step_size", 10.)
    volatility = pm.GaussianRandomWalk("volatility", sigma=step_size, 
                                       size=returns.shape[0], 
                                       init=pm.Normal.dist(0, step_size))
    nu = pm.Exponential("nu", 0.1)
    obs = pm.StudentT(
        "obs", nu=nu, sigma=np.exp(volatility), observed=returns["change"]
    )

yields:

TypeError                                 Traceback (most recent call last)
<ipython-input-11-3f1576c8cf84> in <module>
      1 with pm.Model(check_bounds=False) as model_pymc3:
      2     step_size = pm.Exponential("step_size", 10.)
----> 3     volatility = pm.GaussianRandomWalk("volatility", sigma=step_size, 
      4                                        size=returns.shape[0],
      5                                        init=pm.Normal.dist(0, step_size))

~/projects/pymc/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
    161         transform = kwargs.pop("transform", UNSET)
    162 
--> 163         rv_out = cls.dist(*args, rng=rng, **kwargs)
    164 
    165         return model.register_rv(rv_out, name, data, total_size, dims=dims, transform=transform)

TypeError: dist() missing 1 required positional argument: 'dist_params'

Apr 14 '21 14:04 twiecki

Seems related to pm.GaussianRandomWalk because if I comment out the init it leads to the same error.

Apr 14 '21 14:04 twiecki

Is this related to check_bounds and/or V4?

Apr 14 '21 15:04 ricardoV94

This is on V4, worked fine on V3.

Apr 14 '21 15:04 twiecki

I could be wrong but I don't think the timeseries were refactored yet

Apr 14 '21 15:04 ricardoV94

Unrelated to check_bounds. I updated the code to make it runnable.

Apr 14 '21 15:04 twiecki

I see.

Apr 14 '21 15:04 twiecki

This is now possible, and much simpler, using the aesara.scan operator. For example, a GRW looks like this:

with pm4.Model():
    random_walk, _ = aesara.scan(
        fn=lambda Y_tm1: Y_tm1 + pm4.Normal.dist(sigma=.1),
        outputs_info=[{"initial": at.as_tensor([0.0])}],
        n_steps=100,
        name="Y"
    )

@fonnesbeck is this still something you want to take on? Definitely we could have a coordinated effort to rewrite the existing ones one-by-one.

Dec 28 '21 19:12 twiecki

Actually there is still some work required to make the above work with PyMC.

Dec 28 '21 19:12 twiecki

@mjhajharia and i would like to be involved in time series porting efforts. Admittedly I've been slow on this but very much would like to learn (and not get in the way too much) if someone else takes the lead on this.

Also big thanks to @ricardoV94 whos already spent many hours talking this through with us

Dec 28 '21 20:12 canyon289

+1 at what @canyon289 said, we talked to ricardo and planned on first porting random walk which can probably be done with a cumsum and then AR, followed by ARMA a lot of which is already there in the ARMA branch. I've set some time aside for this and will get started next week (as holidays get over soon), i'll keep updating things here in this project

Dec 28 '21 20:12 mjhajharia

@ricardoV94 already made a minimal example to explain aeppl and scan which might be useful for others seeing this issue

Dec 28 '21 20:12 mjhajharia

Just encountered this with EulerMaruyama too. I guess most (if not all) ts distributions are affected.

Dec 30 '21 02:12 sokol11

Sorry if this question should've been researched more thoroughly from my side, but... is there anything in PyMC v4 that will make life simpler for folks to want to do out-of-sample predictions on time series?

I've always found the shape changes to be problematic, even with shared variables (pm.Data), because the Timeseries distribution traces essentially save the input series. 🙁

Jan 14 '22 14:01 NowanIlfideme

Do we have a list of time series distributions already ported to v4 (I'm updating the release notes)

Jun 01 '22 13:06 AlexAndorra

GaussianRandomWalk and AR I believe. Missing are GARCH11 and some other weird one I forgot the name of.

Jun 01 '22 14:06 twiecki

I've been working a lot recently with pymc3, just transitioned to pymc 4.0.1 and am very happy with the new version of GaussianRandomWalk. It can handle mutliple dimensions better then previously. Thanks for that :)

I figured out by trial and error and very helpful error messages that the time dimension always has to be specified last. This wasn't clear to me from the documentation, so perhaps I didn't know where to look? But in case it is not mentioned, I think it would be a great hint for inexperienced users like myself.

Are there any plans to transfer MVRandomWalk soon? I would be interested in contributing to that, but don't know where to start.

Jun 23 '22 12:06 flo-schu

hey @flo-schu, my apologies but for this ticket is claimed by a GSOC proect. I should have updated it to reflect that. We would love for you to contribute elsewhere in the codebase though, would it help If I suggested something?

Assigning to myself since @nicospinu handle is not showing up for some reason cc @junpenglao

Jun 23 '22 12:06 canyon289

hey @flo-schu, my apologies but for this ticket is claimed by a GSOC proect. I should have updated it to reflect that. We would love for you to contribute elsewhere in the codebase though, would it help If I suggested something?

No problem. I'm happy to hear that the issue is being worked on. I'd be happy if you could point out some beginner-friendly opportunities for contribution

Jun 23 '22 12:06 flo-schu

I've been working a lot recently with pymc3, just transitioned to pymc 4.0.1 and am very happy with the new version of GaussianRandomWalk. It can handle mutliple dimensions better then previously. Thanks for that :)

I figured out by trial and error and very helpful error messages that the time dimension always has to be specified last. This wasn't clear to me from the documentation, so perhaps I didn't know where to look? But in case it is not mentioned, I think it would be a great hint for inexperienced users like myself.

Are there any plans to transfer MVRandomWalk soon? I would be interested in contributing to that, but don't know where to start.

FWIW, I encountered the same issue in numpyro, which also gives unexpected results with numpyro.plate if the time dimension is not rightmost. I imagine it just trickles up to pymc4. It is also not obvious from numpyro documentation, I believe. A workaround I use in numpyro is to reshape before and after.

Jun 23 '22 13:06 sokol11

With #6072 we should be able to quickly refactor the multivariate random walks. After that I would suggest closing this issue and opening one mentioning the specific distributions that are still missing.

Sep 10 '22 05:09 ricardoV94

I wrote the EulerMaruyama implementation and interested to update it (and include some improvements for non-centered form & a 2nd order scheme as well). It's not clear to me why it needs porting though, since it's just some tensor expressions (at least for the centered EM method)?

Oct 20 '22 20:10 maedoc

The aim of porting is mostly so that the distribution is compatible under v4 (currently it is broken as we change the initialization signature and dispatching). Functionally, porting to v4 adds better support for forward sampling (otherwise in the previous implementation only pm.sample works but not prior/posterior predictive sampling

Oct 20 '22 20:10 junpenglao

@junpenglao thanks for the explanation. Indeed the port is much more complicated than I expected, so thanks for your work on that.

Oct 21 '22 07:10 maedoc

@maedoc You are welcome! I am looking forward to the improvements for non-centered form & 2nd order scheme you mentioned ;-)

Oct 21 '22 08:10 junpenglao

pymc pymc copied to clipboard

Port timeseries distributions to V4

pymc
pymc copied to clipboard