Laplace Bug when passing Eigen::Map in tuple of functor arguments
I'm trying to simplify the reproduction here but have not been successful yet; i.e., this version still gives correct values. @WardBrian can you see anything here that is meaningfully different from what the compiler generates?
https://github.com/stan-dev/math/blob/e53e95b7fcdfad20e1d0e7df24333e4d5a8b5510/test/unit/math/laplace/aki_ex_test.cpp
@SteveBronder I think you misread Aki's post, the data you've hardcoded in here is what works.
The other data, like datat <- list(N=1, y=183, mu=0.5, sigmaz=2.5, run_laplace=0), causes problems
Oh, I did misread it; my apologies.
Changing the data to
const std::vector<int> y{183};
const std::vector<double> mu{0.5};
const double sigmaz = 2.5;
does indeed lead to the test failing with an exception thrown
C++ exception with description "Error in laplace_marginal_density: theta contains NaN values for all values." thrown in the test body.
By adding some prints to the likelihood it seems the issue here is that theta is shooting off to massive values in the optimization:
[0]
[0]
[100.265]
[100.265]
[100.265]
[4.78573e+30]
[4.78573e+30]
[4.78573e+30]
Exception: Error in laplace_marginal_density: theta contains NaN values for all values.
Inside the integrand, it seems the value of theta never exceeds ~7000.
If you play around with theta_0, you can find values that work (for example, { "N": 1, "y": [183], "mu": [0.5], "sigmaz": 2.5, "run_laplace": 1, "theta_0": [3] } seems fine; theta never gets larger than 7.5), but most of them shoot off into either massively positive or massively negative numbers, and then you get the exception.
cc @avehtari
I added print also for the log likelihood. Running with
datat <- list(N=1, y=183, mu=0.5, sigmaz=2.5, run_laplace=1)
interestingly, theta does come back from the e+31 range:
Chain 1 [0]
Chain 1 -684.009
Chain 1 [0]
Chain 1 -684.009
Chain 1 [100.265]
Chain 1 -5.77624e+43
Chain 1 [100.265]
Chain 1 -5.77624e+43
Chain 1 [100.265]
Chain 1 -5.77624e+43
Chain 1 [-1.58456e+31]
Chain 1 -2.89975e+33
Chain 1 [-1.58456e+31]
Chain 1 -2.89975e+33
Chain 1 [-1.58456e+31]
Chain 1 -2.89975e+33
Chain 1 [1143.75]
Chain 1 -inf
Chain 1 [1143.75]
Chain 1 -inf
Chain 1 [1143.75]
Chain 1 -inf
One thing the optimization should do: whenever the log likelihood evaluates to -inf or nan, the step should be halved. I'll try to have a look at the current code.
I had a quick look at the code in laplace_marginal_density.hpp. It could be made safer by checking, every time laplace_likelihood::log_likelihood() or laplace_likelihood::diff() is called, that the return value is finite (not -inf or nan), and halving the step size whenever a non-finite value is encountered. Right now it is possible to end up with a theta whose log likelihood and/or derivative is non-finite.
the step should be halved.
Reading the current code, it looks like the stepsize is always 0.5 -- is that what you were referring to @avehtari?
There used to be a comment from Charles about changing this but it looks like it got deleted at some point https://github.com/stan-dev/math/blob/a23525a32aaa6652242ba45534ef80f516142e85/stan/math/mix/functor/laplace_marginal_density.hpp#L219
@avehtari @WardBrian I have a branch here showing what is happening. From the debugging log below we can see that with a mean of 0.5 the gradient for theta shoots off to ~180, which then causes the a and b used to update theta to become huge. That's an issue because the poisson_log function takes an exponential of that value.
__________
ITER: 0 / 100
diff theta gradient: 181.351
curr theta:
0
W:
Nonzero entries:
(1.64872,0)
Outer pointers:
0 $
1.64872
W_r:
1.28403
theta_grad:
181.351
a:
16.0424
b:
181.351
--------------
theta:
100.265
--------------
objective_old:
-1.79769e+308
objective_new:
-5.77624e+43
__________
ITER: 1 / 100
diff theta gradient:
-5.77624e+43
curr theta:
100.265
W:
Nonzero entries:
(5.77624e+43,0)
Outer pointers:
0 $
5.77624e+43
W_r:
7.60016e+21
theta_grad:
-5.77624e+43
a:
-2.5353e+30
b:
5.73378e+45
This run went off to infinity in the objective, but other runs would just go off to some huge but wrong number.
I changed laplace here so that we take a partial step in the direction of the new a instead of a full step. We try a new partial step, and if that fails to give us a better theta we halve the step size and keep trying until we get to a step size of 1e-8 or find a better theta. This seems to be better, and the new version can solve this problem with a step size of 0.9. @avehtari do you know of a better way to fix this?
If we go with this step approach, do you both think we should expose step_size to the user in tol so they can control it?
When you say "halve the step size and keep trying", from where do you keep going? The original point, or the point immediately before failure?
The point immediately before failure. So same a and b, but a smaller step size for the update.
I would expect it to perform better if it was trying again from the original point, because then you're not having to compensate for the initial too-large-but-not-yet-failing steps. Is that hard to try?
For this problem that would not work, because it can hit a very large number that is still wrong but not infinite.*
*EDIT: Let me try this, as it's not that hard to change, but I think it will not catch the edge case above. Just to be clear, you mean completely restarting the algorithm with a smaller step size?
@charlesm93 do you have any thoughts on this? Is there a way we can add something like the Armijo rule to dynamically change the step size?
Just to be clear, you mean completely restarting the algorithm with a smaller stepsize?
More or less, yeah. What it sounds like is your attempt leads to trajectories like
start ------------------- 'ok' intermediate ------------------- bad
start ------------------- 'ok' intermediate -------- maybe good?
(where the number of - is illustrating the step size)
But I think it would be interesting if the re-tries looked like
start -------- new intermediate -------- ...
I meant that the current line search checks only the objective, not the gradient. Any line search should check that the selected point has a finite objective and a finite gradient. I think the line search can stay the same, except for checking both objective and gradient; if a non-finite objective or gradient is observed, the line search would just keep halving the step until it finds a finite objective and gradient. Not doing a complete line search is a kind of safe mode: if we did see problems, it's better not to try to find the "exact" minimum in the dangerous direction ("here be dragons"). The next iteration can proceed as usual. Also, the line search is currently behind an option for how many line search steps are made, but even if the number of line search iterations is 0, we should do the step halving until we find a finite objective and gradient. I can talk more next week, and we can also have a Zoom call if needed.
I think the line search can stay the same, except for checking both objective and gradient and in case a non-finite objective or gradient is observed, the line search would only keep halving the step until finding finite objective and gradient.
Related issue is in lbfgs see https://github.com/stan-dev/stan/pull/3309
Update: I have a branch here that replaces the simpler line search we had with a Wolfe line search for finding a nice step size. I need to do a little more testing, but this fixes Aki's example and I think it should make everything a bit more robust.
@SteveBronder, great! Would you like me to test it?
I'd love that! At the moment I'm trying to resolve some issues with the second solver and the GP motorcycle problem. I will ping you when we have a version ready.
@avehtari @mitzimorris can you both try this branch of cmdstan I have set up:
https://github.com/stan-dev/cmdstan/tree/laplace_test
I always get annoyed by github submodules but I think you should be able to copy paste the below to get all the submodules with the branches correct for the branch of stan math
# assuming you are in a cmdstan repo
git pull
git fetch origin laplace_test
# check the branch out locally
git switch laplace_test
# sync all submodules to the exact commits recorded in the branch
git submodule update --init --recursive
# (optional) verify everything is on the expected commits
# should be aad582ba0221f140af2487922f27bd9868bd0d98
git submodule status
# just to make sure math updated, go down and pull the branch
cd stan/lib/stan_math/
git pull origin fix/laplace-line-search
cd ../../../
After that you should be able to just run the standard make build commands to make cmdstan with the new stan math branch
# from the top folder of cmdstan repo
make clean-all
make -j4 build
If you cd into ./stan/lib/stan_math/ it should have commit hash bae4dead234d9c763ffcb9a379831bbd3b2194f8
If you have any issues pulling from the above please lmk!
@SteveBronder does your branch include the new function signature?
@avehtari are you referring to the mean arguments to the helpers? It should. The instructions Steve posted should also download the nightly build of stanc, which will know about them.
I ran the new branch on a spatial GP model with approx 1600 data points. there is some improvement - the sampler crawls instead of hanging.
@avehtari - this is the function sig that works for me:
target += laplace_marginal(spatial_likelihood,
                           (y, mu_fixed, sigma_obs),
                           spatial_covariance,
                           (coords_vec, sigma_spatial, length_scale));
@mitzimorris could you send me your model? I will take a look! It could be that the laplace values start far away and have to do a lot of iterations to get reasonable theta parameters.
I got it working, and there are no more nans or infs, but it still sometimes gives the wrong answer. In the roaches Poisson varying coefficient model there are still 10 out of 258 problematic observations. Of these, 3 observations have very big errors. For all of these, the error is big only for some parameter values (7% to 42% of posterior draws). You can get big errors, for example, with these:
y=183, mu= 0.6, sigmaz=2.0
y= 91, mu=-0.2, sigmaz=2.0
y=171, mu=-0.6, sigmaz=2.0
The correct values are about -9.301779 -8.575111 -11.049838, but the laplace_marginal returns -670.6946 -342.7156 -809.0462.
I did not yet try playing with the tolerances
The simplified roaches model for testing is the same as in https://discourse.mc-stan.org/t/embedded-laplace-numerical-problem/39700
Thanks Aki! I will check this out
So I just updated the example with new data and, at least running laplace for each data point, it looks like it is having trouble with the first set of y and mu.
Laplace result: -1621.34
Integrated result: -27.6813
results for y,mu[0]: (183, 0.6):
laplace: -666.94
integrated: -9.47139
difference: -657.468
results for y,mu[1]: (91, -0.2):
laplace: -8.88515
integrated: -8.88423
difference: -0.000916134
results for y,mu[2]: (171, 0.6):
laplace: -9.32621
integrated: -9.32572
difference: -0.000487337
let me look into this and get back to you. It may just be an issue with bad tuning parameters for the search
I examined which posterior draws lead to a big error with that y=183. The color indicates the absolute error; it is either big or small.
With other observations, there are similar stripes with varying width in the parameter space that lead to big errors.
Thank you! So I updated c1 and c2 for the Wolfe line search, and it looks like these are giving more sensible values now. Maybe those bands have an odd curvature?
Laplace result: -27.6832
Integrated result: -27.6813
results for y,mu[0]: (183, 0.6):
laplace: -9.47185
integrated: -9.47139
difference: -0.000455425
results for y,mu[1]: (91, -0.2):
laplace: -8.88515
integrated: -8.88423
difference: -0.000916134
results for y,mu[2]: (171, 0.6):
laplace: -9.32621
integrated: -9.32572
difference: -0.000487337
If you have the branch of cmdstan already checked out try the below
git switch laplace_test # or: git checkout laplace_test
# 1-liner (Git ≥ 2.14) that fetches, merges, and fixes submodules
git pull --recurse-submodules
make clean-all
make -j4 build
If that doesn't work for some reason you can use the below to update to the stan and stan math commits directly
# from the cmdstan repo
cd stan
git pull origin fix/laplace-line-search
git checkout 4c3ca5553f339f0194e2f3d2abbee3ea22af997a
cd ./lib/stan_math
git pull origin fix/laplace-line-search
git checkout be71fc41528a1398c63a56d343973ac66556dab2
cd ../../../
make clean-all
make -j4 build