
Laplace Bug when passing Eigen::Map in tuple of functor arguments

Open SteveBronder opened this issue 7 months ago • 50 comments

From @avehtari on discourse

I'm trying to simplify the reproduction here but have not been successful yet, i.e. this still gives correct values. @WardBrian can you see anything here that is meaningfully different from what the compiler generates?

https://github.com/stan-dev/math/blob/e53e95b7fcdfad20e1d0e7df24333e4d5a8b5510/test/unit/math/laplace/aki_ex_test.cpp

SteveBronder avatar Jun 13 '25 21:06 SteveBronder

@SteveBronder I think you misread Aki's post; the data you've hardcoded in here is what works.

The other data, like datat <- list(N=1, y=183, mu=0.5, sigmaz=2.5, run_laplace=0), causes problems

WardBrian avatar Jun 13 '25 22:06 WardBrian

Oh I did misread it my apologies

SteveBronder avatar Jun 13 '25 22:06 SteveBronder

Changing the data to

  const std::vector<int>    y{183};
  const std::vector<double> mu{0.5};
  const double sigmaz = 2.5;

does indeed lead to the test failing with an exception thrown

C++ exception with description "Error in laplace_marginal_density: theta contains NaN values for all values." thrown in the test body.

WardBrian avatar Jun 16 '25 13:06 WardBrian

By adding some prints to the likelihood, it seems the issue here is that theta shoots off to massive values during the optimization:

[0]
[0]
[100.265]
[100.265]
[100.265]
[4.78573e+30]
[4.78573e+30]
[4.78573e+30]

Exception: Error in laplace_marginal_density: theta contains NaN values for all values.

Inside the integrand, the value of theta never seems to exceed ~7000.

If you play around with theta_0, you can find values that work (for example, { "N": 1, "y": [183], "mu": [0.5], "sigmaz": 2.5, "run_laplace": 1, "theta_0": [3] } seems fine; theta never gets larger than 7.5), but most values shoot off into either massively positive or massively negative numbers and then you get the exception

cc @avehtari

WardBrian avatar Jun 16 '25 14:06 WardBrian

I also added a print for the log likelihood. Running with

datat <- list(N=1, y=183, mu=0.5, sigmaz=2.5, run_laplace=1)

interestingly, theta does come back from the e+31 range:

Chain 1 [0] 
Chain 1 -684.009 
Chain 1 [0] 
Chain 1 -684.009 
Chain 1 [100.265] 
Chain 1 -5.77624e+43 
Chain 1 [100.265] 
Chain 1 -5.77624e+43 
Chain 1 [100.265] 
Chain 1 -5.77624e+43 
Chain 1 [-1.58456e+31] 
Chain 1 -2.89975e+33 
Chain 1 [-1.58456e+31] 
Chain 1 -2.89975e+33 
Chain 1 [-1.58456e+31] 
Chain 1 -2.89975e+33 
Chain 1 [1143.75] 
Chain 1 -inf 
Chain 1 [1143.75] 
Chain 1 -inf 
Chain 1 [1143.75] 
Chain 1 -inf 

One thing the optimization should do is that whenever the log likelihood evaluates to -inf or nan, the step should be halved. I'll try to have a look at the current code.

avehtari avatar Jun 22 '25 09:06 avehtari

I had a quick look at the code in laplace_marginal_density.hpp. It could be made safer by checking, every time laplace_likelihood::log_likelihood() or laplace_likelihood::diff() is called, that the return value is finite (not -inf or nan), and halving the step size whenever a non-finite value is encountered. As it stands, it is possible to end up with a theta that has a non-finite log likelihood and/or derivative.
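
A minimal sketch of that guard (invented helper, not the library's API; the real calls are laplace_likelihood::log_likelihood() and laplace_likelihood::diff() inside laplace_marginal_density.hpp):

// Hypothetical guard, not the library's API: halve the Newton step whenever
// the log likelihood or its gradient comes back non-finite.
#include <cmath>
#include <stdexcept>

inline bool halve_if_nonfinite(double log_lik, double grad_norm, double& step) {
  if (std::isfinite(log_lik) && std::isfinite(grad_norm)) {
    return false;  // values are safe, keep the current step
  }
  step *= 0.5;  // retreat toward the previous theta
  if (step < 1e-8) {  // small floor, matching the one Steve uses later in this thread
    throw std::domain_error("laplace: no finite log likelihood along this step");
  }
  return true;  // caller should re-evaluate at the shorter step
}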

avehtari avatar Jun 23 '25 12:06 avehtari

the step should be halved.

Reading the current code, it looks like the stepsize is always 0.5 -- is that what you were referring to @avehtari?

There used to be a comment from Charles about changing this but it looks like it got deleted at some point https://github.com/stan-dev/math/blob/a23525a32aaa6652242ba45534ef80f516142e85/stan/math/mix/functor/laplace_marginal_density.hpp#L219

WardBrian avatar Jul 09 '25 20:07 WardBrian

@avehtari @WardBrian I have a branch here showing what is happening. From the debugging log below we can see that with a mean of 0.5 the gradient for theta shoots off to ~181, which then causes the a and b used to update theta to become huge. That's an issue because the poisson_log function takes an exponential of that value.

__________
ITER: 0 / 100
diff theta gradient: 181.351
curr theta: 
	0
W:  
	Nonzero entries:
	(1.64872,0) 
	Outer pointers:
	0  $
	1.64872

W_r:  
	1.28403
theta_grad:  
	181.351
a: 
	16.0424
b: 
	181.351
--------------
theta: 
	100.265
--------------
objective_old: 
	-1.79769e+308
objective_new: 
	-5.77624e+43
__________
ITER: 1 / 100
diff theta gradient: 
	-5.77624e+43
curr theta: 
	100.265
W:  
Nonzero entries:
(5.77624e+43,0) 
Outer pointers:
0  $
5.77624e+43

W_r:  
	7.60016e+21
theta_grad:  
	-5.77624e+43
a: 
	-2.5353e+30
b: 
	5.73378e+45

This run went off to infinity in the objective, but other runs would just go off to some huge but wrong number.
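
As a standalone sanity check on the numbers above (assuming the Poisson rate is exp(theta + mu), which matches the logged W exactly), theta = 100.265 with mu = 0.5 already lands in the e+43 range:

// Standalone check, not library code: the rate implied by the first Newton step.
#include <cmath>
#include <cstdio>

int main() {
  const double theta = 100.265;
  const double mu = 0.5;
  std::printf("%g\n", std::exp(theta + mu));  // ~5.77624e+43, matching W and the objective above
  return 0;
}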

I changed laplace here so that we take a partial step in the direction of the new a instead of a full step. We try a new partial step, and if that fails to give us a better theta we halve the step size and keep trying until we either reach a step size of 1e-8 or find a better theta. This seems to be better, and the new version can solve this problem with a step size of 0.9. @avehtari do you know of a better way to fix this?
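
A rough sketch of that halving loop (hypothetical names, written directly in theta-space for brevity; the branch itself steps in the direction of the new a):

// Hypothetical sketch of the partial-step update, not the actual branch code:
// blend from the current theta toward the proposal, halving on failure.
#include <Eigen/Dense>
#include <cmath>

template <typename Objective>
Eigen::VectorXd partial_step(const Eigen::VectorXd& theta_old,
                             const Eigen::VectorXd& theta_proposed,
                             double objective_old, Objective&& objective) {
  double step = 0.9;  // the initial step size that solves this example
  while (step > 1e-8) {
    Eigen::VectorXd theta_new = theta_old + step * (theta_proposed - theta_old);
    const double objective_new = objective(theta_new);
    if (std::isfinite(objective_new) && objective_new > objective_old) {
      return theta_new;  // finite and better: accept
    }
    step *= 0.5;  // same a and b, just a shorter step
  }
  return theta_old;  // no acceptable step found; keep the previous point
}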

If we go with this step approach, do you both think we should expose step_size to the user in tol so they can control it?

SteveBronder avatar Jul 17 '25 16:07 SteveBronder

When you say "halve the step size and keep trying", from where do you keep going? The original point, or the point immediately before failure?

WardBrian avatar Jul 17 '25 16:07 WardBrian

The point immediately before failure. So same a and b, but a smaller step size for the update.

SteveBronder avatar Jul 17 '25 16:07 SteveBronder

I would expect it to perform better if it was trying again from the original point, because then you're not having to compensate for the initial too-large-but-not-yet-failing steps. Is that hard to try?

WardBrian avatar Jul 17 '25 16:07 WardBrian

For this problem that would not work, because it can hit a very large number that is still wrong but not infinite.

EDIT: Let me try this, as it's not that hard to change, but I think it will not catch the edge case above. Just to be clear, you mean completely restarting the algorithm with a smaller step size?

@charlesm93 do you have any thoughts on this? Is there a way we can add something like the Armijo rule to dynamically change the step size?

SteveBronder avatar Jul 17 '25 16:07 SteveBronder

Just to be clear, you mean completely restarting the algorithm with a smaller stepsize?

More or less, yeah. It sounds like your attempt leads to trajectories like

start ------------------- 'ok' intermediate ------------------- bad
start ------------------- 'ok' intermediate -------- maybe good?

(where the number of dashes illustrates the step size)

But I think it would be interesting if the re-tries looked like

start -------- new intermediate -------- ...

WardBrian avatar Jul 17 '25 16:07 WardBrian

I meant that the current line search checks only the objective, not the gradient. Any line search should check that the selected point has a finite objective and a finite gradient. I think the line search can stay the same, except for checking both objective and gradient; when a non-finite objective or gradient is observed, the line search would only keep halving the step until it finds a finite objective and gradient. Not doing a complete line search is a kind of safe mode: if we do see problems, it's better not to try to find the "exact" minimum in the dangerous direction ("here be dragons"). The next iteration can proceed as usual. Also, the line search currently sits behind an option for how many line-search steps are made, but even if the number of line-search iterations is 0, we should do the step halving until finding a finite objective and gradient. I can talk more next week, and we can also have a zoom call if needed.
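
A sketch of that safe mode (invented names; the point is that this halving runs even when the configured number of line-search iterations is 0):

// Hypothetical "safe mode", not the library's API: before any regular line
// search, shrink the step until both objective and gradient are finite.
#include <Eigen/Dense>
#include <cmath>

template <typename Obj, typename Grad>
double shrink_until_finite(const Eigen::VectorXd& theta,
                           const Eigen::VectorXd& direction,
                           Obj&& objective, Grad&& gradient) {
  double step = 1.0;
  while (step > 1e-8) {
    const Eigen::VectorXd candidate = theta + step * direction;
    if (std::isfinite(objective(candidate)) && gradient(candidate).allFinite()) {
      return step;  // safe: the normal line search (if enabled) starts from here
    }
    step *= 0.5;  // "here be dragons": back off without seeking the exact minimum
  }
  return 0.0;  // nothing finite along this direction
}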

avehtari avatar Jul 18 '25 07:07 avehtari

I think the line search can stay the same, except for checking both objective and gradient; when a non-finite objective or gradient is observed, the line search would only keep halving the step until it finds a finite objective and gradient.

A related issue exists in lbfgs; see https://github.com/stan-dev/stan/pull/3309

spinkney avatar Jul 18 '25 14:07 spinkney

Update: I have a branch here that replaces the simpler line search we had with a Wolfe line search for finding a good step size. I need to do a little more testing, but this fixes Aki's example and I think should make everything a bit more robust.
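
For reference, the (strong) Wolfe conditions accept a step size \alpha along a search direction p for minimizing f when

  f(x + \alpha p) \le f(x) + c_1 \alpha \, \nabla f(x)^T p      (sufficient decrease / Armijo)
  |\nabla f(x + \alpha p)^T p| \le c_2 \, |\nabla f(x)^T p|     (curvature)

with 0 < c_1 < c_2 < 1; these c_1 and c_2 are the constants tuned later in the thread. Maximizing the Laplace objective is equivalent to minimizing its negation, so the signs flip accordingly.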

SteveBronder avatar Jul 22 '25 00:07 SteveBronder

@SteveBronder, great! Would you like me to test it?

avehtari avatar Jul 22 '25 18:07 avehtari

I'd love that! atm I'm trying to resolve some issues with the second solver and the GP motorcycle problem. I will ping you when we have a version ready

SteveBronder avatar Jul 23 '25 16:07 SteveBronder

@avehtari @mitzimorris can you both try this branch of cmdstan I have set up

https://github.com/stan-dev/cmdstan/tree/laplace_test

I always get annoyed by GitHub submodules, but I think you should be able to copy-paste the below to get all the submodules onto the correct commits for the branch of Stan Math

# assuming you are in a cmdstan repo
git pull
git fetch origin laplace_test

# check the branch out locally
git switch laplace_test

# sync all submodules to the exact commits recorded in the branch
git submodule update --init --recursive

# (optional) verify everything is on the expected commits
# should be aad582ba0221f140af2487922f27bd9868bd0d98
git submodule status

# just to make sure math is updated, go down and pull the branch
cd stan/lib/stan_math/
git pull origin fix/laplace-line-search
cd ../../../

After that you should be able to just run the standard make commands to build cmdstan with the new Stan Math branch

# from the top folder of cmdstan repo
make clean-all
make -j4 build

If you cd into ./stan/lib/stan_math/ it should have commit hash bae4dead234d9c763ffcb9a379831bbd3b2194f8

If you have any issues pulling from the above please lmk!

SteveBronder avatar Jul 28 '25 20:07 SteveBronder

@SteveBronder does your branch include the new function signature?

avehtari avatar Jul 29 '25 15:07 avehtari

@avehtari are you referring to the mean arguments to the helpers? It should. The instructions Steve posted should also download the nightly build of stanc, which will know about them.

WardBrian avatar Jul 29 '25 16:07 WardBrian

I ran the new branch on a spatial GP model with approx 1600 data points. There is some improvement - the sampler crawls instead of hanging.

@avehtari - this is the function sig that works for me:

  target += laplace_marginal(spatial_likelihood,
                             (y, mu_fixed, sigma_obs),
                             spatial_covariance,
                             (coords_vec, sigma_spatial, length_scale));

mitzimorris avatar Jul 29 '25 16:07 mitzimorris

@mitzimorris could you send me your model? I will take a look! It could be that the laplace values start far away and have to do a lot of iterations to get reasonable theta parameters.

SteveBronder avatar Jul 29 '25 16:07 SteveBronder

I got it working, and there are no more nans or infs, but it still sometimes gives the wrong answer. In the roaches Poisson varying-coefficient model there are still 10 problematic observations out of 258. Of these, 3 observations have very big errors. For all of these, the error is big only for some parameter values (7% to 42% of posterior draws). You can get big errors, for example, with these:

y=183, mu= 0.6, sigmaz=2.0
y= 91, mu=-0.2, sigmaz=2.0
y=171, mu=-0.6, sigmaz=2.0

The correct values are about -9.301779, -8.575111, -11.049838, but laplace_marginal returns -670.6946, -342.7156, -809.0462.

I did not yet try playing with the tolerances

avehtari avatar Jul 29 '25 16:07 avehtari

The simplified roaches model for testing is the same as in https://discourse.mc-stan.org/t/embedded-laplace-numerical-problem/39700

avehtari avatar Jul 29 '25 17:07 avehtari

Thanks Aki! I will check this out

SteveBronder avatar Jul 29 '25 17:07 SteveBronder

So I just updated the example with the new data and, at least when running laplace for each data point, it looks like it is having trouble with the first set of y and mu.

Laplace result: -1621.34
Integrated result: -27.6813
results for y,mu[0]: (183, 0.6): 
	laplace:    -666.94
	integrated: -9.47139
	difference: -657.468
results for y,mu[1]: (91, -0.2): 
	laplace:    -8.88515
	integrated: -8.88423
	difference: -0.000916134
results for y,mu[2]: (171, 0.6): 
	laplace:    -9.32621
	integrated: -9.32572
	difference: -0.000487337

Let me look into this and get back to you. It may just be an issue with bad tuning parameters for the search.

SteveBronder avatar Jul 29 '25 17:07 SteveBronder

I examined which posterior draws lead to a big error with that y=183. The color indicates the absolute error; it is either big or small:

[image: posterior draws in the parameter space, colored by absolute error]

avehtari avatar Jul 29 '25 17:07 avehtari

With other observations there are similar stripes of varying width in the parameter space that lead to big errors

avehtari avatar Jul 29 '25 17:07 avehtari

Ty! So I updated c1 and c2 for the Wolfe line search, and it looks like these are giving more sensible values now. Maybe those bands have odd curvature?

Laplace result: -27.6832
Integrated result: -27.6813
results for y,mu[0]: (183, 0.6): 
	laplace:    -9.47185
	integrated: -9.47139
	difference: -0.000455425
results for y,mu[1]: (91, -0.2): 
	laplace:    -8.88515
	integrated: -8.88423
	difference: -0.000916134
results for y,mu[2]: (171, 0.6): 
	laplace:    -9.32621
	integrated: -9.32572
	difference: -0.000487337

If you already have the cmdstan branch checked out, try the below

git switch laplace_test           # or: git checkout laplace_test

# 1-liner (Git ≥ 2.14) that fetches, merges, and fixes submodules
git pull --recurse-submodules
make clean-all
make -j4 build

If that doesn't work for some reason, you can use the below to update to the Stan and Stan Math commits directly

# from the cmdstan repo
cd stan
git pull origin fix/laplace-line-search
git checkout 4c3ca5553f339f0194e2f3d2abbee3ea22af997a
cd ./lib/stan_math
git pull origin fix/laplace-line-search
git checkout be71fc41528a1398c63a56d343973ac66556dab2
cd ../../../
make clean-all
make -j4 build

SteveBronder avatar Jul 29 '25 18:07 SteveBronder