pymc-experimental icon indicating copy to clipboard operation
pymc-experimental copied to clipboard

Is P0 the initial state covariance matrix or the square root of the initial state covariance matrix?

Open rklees opened this issue 1 year ago • 7 comments

I wonder if someone could clarify what P0 really is. For instance, in the example notebooks to pymc-experimental , I found the following statement:

P0_diag = pm.Gamma("P0_diag", alpha=2, beta=5, dims=P0_dims[0]) P0 = pm.Deterministic("P0", pt.diag(P0_diag), dims=P0_dims)

A Gamma distribution is usually used as a prior for a standard deviation, not for a variance. So, does P0 as used above is a covariance matrix or the square-root of a covariance matrix?

Moreover, in structural.py, I frequently find statements like the following for a cycle:

if self.innovations: sigma_cycle = self.make_and_register_variable(f"sigma_{self.name}", shape=(1,)) self.ssm["state_cov", :, :] = pt.eye(self.k_posdef) * sigma_cycle

So the name "state_cov" suggests that I define a variance, however, the right-hand side suggests that I define a standard deviation (sigma_cycle is the standard deviation of the cycle disturbance).

Any help would be appreciated.

rklees avatar Apr 05 '24 13:04 rklees

P0 really is the full initial hidden states covariance matrix. In the code snippet above, P0_diag is a vector of variances. You are compeltely free to parameterize P0 as you like, up to and including estimation of the full dense matrix using e.g. pm.LKJCholeskyCov. In my testing, however, I have found that P0 tends to be only weakly identified, if identified at all. The typical thing to do is to use a "diffuse" initialization -- in statsmodels, they set non-stationary elements of P0 to be 1e6. You're free to do this, but I find the effect on the initial state variance to be way too strong, and even when using Kalman Smoothing it ruins the output plots. Plus there's really no justification for such a huge variance prior.

In the future, I'd like to hide P0 from users, and initialize it in a blockwise fashion, using stationary initialization where possible. It's on my to-do list, but I haven't had a lot of time to work on the statespace stuff recently.

As for sigma_cycle, any variable named "sigma" is a standard deviation. This was fixed in a previous PR that was already merged, you can check that the line you referenced is correctly squared on main. It's possible I missed some though, so let me know if you find anything that is not being correctly squared (or better yet, open a PR that adds the square!)

jessegrabowski avatar Apr 05 '24 13:04 jessegrabowski

Dear Jesse,

Thanks for he clarification.

Your remark about approximate diffuse initialization, which is used in the frequentist approach to SSM does not really apply to Bayesian SSM, unless I define the diagonal elements of P0 (i.e., the initial state vector variances) from a prior which allows huge mean (or, mode) values. For instance when using a gamma distribution as prior, alpha/beta needs to be something like 1e6.

By the way, the approximate diffuse initialization means in fact that the initial state and covariance do not have any influence on the estimated (filtered) states already with epoch 1, as the Kalman gain K_1 ~ 1, and the filtered state is determined by the observation y_1 (for a local level model as an example, but similar applies for other models, too).

If you allow me there is another strange thing in the code of pymc-experimential.

When I define a cycle component, the state names and shock names are “cycle_Sin” and “cycle_Cos” (in that sequence). However, I can also define a cycle with as seasonal in the frequency domain with n=1. Then, however, the state names and shock names are “cycle_Cos_0” and “cycle_Sin_0”, i.e., the cosine component comes first and then the sine component.

This is inconsistent on its own (the first component should always be the cosine component followed by the sine component). However, it becomes even more confusing, when I define a SSM in pymc-experimental to simulate data. Then, I do for instance

cycle1 = st.CycleComponent(name="annual_cycle", cycle_length=12, innovations=True) cycle2 = st.FrequencySeasonality(name="SA_cycle", season_length=5.347, n=1, innovations=True)

but in param_dict, the synthax is

"cycle1": np.array([10., 0.]), "sigma_cycle1": np.array([1e-6]) “cycle2”: np.array([20.,10.]), “sigma_cycle2”: np.array([0.3])

That is, the first element in np.array is ALWAYS the cosine amplitude (here 10. for cycle1 and 20. for cycle2) and the second element is ALWAYS the sine amplitude (here 0. for cycle1 and 10. for cycle 2) no matter whether the cycle was defined with st.CycleComponente or with st.FrequencySeasonality. Of course, then cycle1 is a pure cosine wave (starting with 10. at the first epoch), whereas cycle2 is a phase-shifted cosine wave.

rklees avatar Apr 05 '24 14:04 rklees

I'm not sure I follow your point about P0 not mattering after the first update. x0 and P0 are taken to be predictions $x_{1|0}$ and $P_{1|0}$ (following Durbin and Koopsman), so the first update equation for the hidden state covariance is:

$$P_{1|1} = P_{1 | 0} - P_{1 | 0} Z^T (Z P_{1 | 0} Z^T + H)^{-1} Z P_{1 | 0}$$

So the filtered hidden states covariances clearly depend on the initial states. This is clear to see by playing with large diagonal values of P0.

The state names for Cycle and FrequencySeasonality are basically nonsense right now. If you actually look at how the cycle components and the frequency components are defined, you can see that they're not sine and cosine components, but mixtures. I'm open to suggestions on the names. Statsmodels doesn't even try; you just get back "state.1", "state.2", etc. That might be the best we can do here as well, although I would strongly prefer something else.

For the last point, I'd need to see full code for your final example. It shouldn't run as written, because the expected names for the variables are "annual_cycle and SA_cycle, not cycle1 and cycle2. The helper function unpack_symbolic_matrices_with_params matches on the name of the symbolic variable. Related to the above comment about what the states are, I don't think the first state will be a pure cosine amplitude? It should be $\gamma_0$, used to compute:

$$\gamma_1= \rho \gamma_0 \cos \left ( \frac{2\pi}{s} \right ) + \rho \gamma_{t-1}^\star \sin \left ( \frac{2\pi}{s} \right )+ \omega_{t}$$

If the second element ($\gamma^\star_0$) is zero you'll get a pure cosine function, but otherwise it will be a mixture. What should it be?

jessegrabowski avatar Apr 07 '24 14:04 jessegrabowski

Maybe I was not clear in my formulation.

Suppose the cycle is y(t) = C cos(2pi f t – phi), with f the linear frequency and phi the phase. Then, y(t) = A cos(2 pi f t) + B sin(2 pi f t). When I simulate data using pymc-experimental, I do

cycle = st. CycleComponent(name=’cycle’, cycle_length = 1/f)

and

param_dict = { ‘cycle’: np.array( [A0, B0]), …}

that is the first component in np.array is the initial value of A and the second component is the initial value of B. If B0 = 0, the phase phi = 0 (i.e., pure cosine wave) ; if A0 = 0, the phase = pi/2 (pure sine wave). However, in shocks_names and state_names I get

[‘cycle_Sin’, ‘cycle_Cos’]

though it should be [‘cycle_Cos’, ‘cycle_Sin’].

I get the correct naming when defining the cycle using st.FrequencySeasonality.

Roland K

From: Jesse Grabowski @.> Reply to: pymc-devs/pymc-experimental @.> Date: Sunday, 7 April 2024 at 16:21 To: pymc-devs/pymc-experimental @.> Cc: Roland Klees @.>, Author @.***> Subject: Re: [pymc-devs/pymc-experimental] Is P0 the initial state covariance matrix or the square root of the initial state covariance matrix? (Issue #325)

I'm not sure I follow your point about P0 not mattering after the first update. x0 and P0 are taken to be predictions $x_{1|0}$ and $P_{1|0}$ (following Durbin and Koopsman), so the first update equation for the hidden state covariance is:

$$P_{1|1} = P_{1 | 0} - P_{1 | 0} Z^T (Z P_{1 | 0} Z^T + H)^{-1} Z P_{1 | 0}$$

So the filtered hidden state predictions clearly depend on the initial states. This is clear to see by playing with large diagonal values of P0.

The state names for Cycle and FrequencySeasonality are basically nonsense right now. If you actually look at how the cycle componentshttps://www.pymc.io/projects/experimental/en/latest/statespace/models/generated/pymc_experimental.statespace.models.structural.CycleComponent.html and the frequency componentshttps://www.pymc.io/projects/experimental/en/latest/statespace/models/generated/pymc_experimental.statespace.models.structural.FrequencySeasonality.html are defined, you can see that they're not sine and cosine components, but mixtures. I'm open to suggestions on the names. Statsmodels doesn't even try; you just get back "state.1", "state.2", etc. That might be the best we can do here as well, although I would strongly prefer something else.

For the last point, I'd need to see full code for your final example. It shouldn't run as written, because the expected names for the variables are "annual_cycle and SA_cycle, not cycle1 and cycle2. The helper function unpack_symbolic_matrices_with_params matches on the name of the symbolic variable. Related to the above comment about what the states are, I don't think the first state will be a pure cosine amplitude? It should be $\gamma_0$, used to compute $\gamma_1= \rho \gamma_0 \cos \left ( \frac{2\pi}{s} \right ) + \rho \gamma_{t-1}^\star \sin \left ( \frac{2\pi}{s} \right )+ \omega_{t} $ If the second element ($\gamma^\star_0$) is zero you'll get a pure cosine function, but otherwise it will be a mixture. What should it be?

— Reply to this email directly, view it on GitHubhttps://github.com/pymc-devs/pymc-experimental/issues/325#issuecomment-2041486573, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AWKBIZJEPZAOO7RXCKS2UO3Y4FI5DAVCNFSM6AAAAABFZEEXV2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANBRGQ4DMNJXGM. You are receiving this because you authored the thread.Message ID: @.***>

rklees avatar Apr 07 '24 15:04 rklees

Sure, we can change the order of the names as a short term solution. But I still think they are bad names, before if you set {"cycle_init":[A0, B0]} with neither A0 or B0 zero, the state is not a cosine or a sine, but a weighted combination of the two. I think it would be better if we could think up a more descriptive name, or at least a less deceptive one.

jessegrabowski avatar Apr 07 '24 16:04 jessegrabowski

If you use initial_trend, then any other component of the state vector should also start with initial_. Maybe Initial_cos and initial_sin to keep it short.

Roland K

From: Jesse Grabowski @.> Reply to: pymc-devs/pymc-experimental @.> Date: Sunday, 7 April 2024 at 18:41 To: pymc-devs/pymc-experimental @.> Cc: Roland Klees @.>, Author @.***> Subject: Re: [pymc-devs/pymc-experimental] Is P0 the initial state covariance matrix or the square root of the initial state covariance matrix? (Issue #325)

Sure, we can change the order of the names as a short term solution. But I still think they are bad names, before if you set {"cycle_init":[A0, B0]} with neither A0 or B0 zero, the state is not a cosine or a sine, but a weighted combination of the two. I think it would be better if we could think up a more descriptive name, or at least a less deceptive one.

— Reply to this email directly, view it on GitHubhttps://github.com/pymc-devs/pymc-experimental/issues/325#issuecomment-2041525459, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AWKBIZJGV2H64BGZHGPL4QDY4FZMVAVCNFSM6AAAAABFZEEXV2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANBRGUZDKNBVHE. You are receiving this because you authored the thread.Message ID: @.***>

rklees avatar Apr 07 '24 17:04 rklees

I would prefer to use “initial_state” and “initial_state_cov” as names. The order of the components is as given in state_names. For a cycle C cos(2 pi f t – phi) = C cos(phi) cos(2 pi f t) + C sin(phi) sin(2 pi f t) = I cos (2 pi f t) + Q sin(2 pi f t), I would use cycle_I_amp and cycle_Q_amp as names for I (the amplitude of the in-phase component) and Q (the amplitude of the quadrature component); alternatively, you may use cycle_I and cycle_Q (i.e., leave out “_amp”).

Roland K

From: Jesse Grabowski @.> Reply to: pymc-devs/pymc-experimental @.> Date: Sunday, 7 April 2024 at 18:41 To: pymc-devs/pymc-experimental @.> Cc: Roland Klees @.>, Author @.***> Subject: Re: [pymc-devs/pymc-experimental] Is P0 the initial state covariance matrix or the square root of the initial state covariance matrix? (Issue #325)

Sure, we can change the order of the names as a short term solution. But I still think they are bad names, before if you set {"cycle_init":[A0, B0]} with neither A0 or B0 zero, the state is not a cosine or a sine, but a weighted combination of the two. I think it would be better if we could think up a more descriptive name, or at least a less deceptive one.

— Reply to this email directly, view it on GitHubhttps://github.com/pymc-devs/pymc-experimental/issues/325#issuecomment-2041525459, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AWKBIZJGV2H64BGZHGPL4QDY4FZMVAVCNFSM6AAAAABFZEEXV2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANBRGUZDKNBVHE. You are receiving this because you authored the thread.Message ID: @.***>

rklees avatar Apr 08 '24 07:04 rklees