Provide access to initial values

Open jgabry opened this issue 5 years ago • 11 comments

We've recently discussed how CmdStan does not have a way to provide users access to the initial values it used:

  • https://discourse.mc-stan.org/t/does-the-init-argument-work-with-cmdstanr/17255/19
  • https://github.com/stan-dev/cmdstan/issues/918

It sounds like this is actually an io/services issue because the contents of a stan::io::var_context object would need to be dumped to text. Is that right?

I don't know how difficult this would be, but to me providing access to this information seems pretty important.

The most important information to convey to users I think would be inits for parameters on the constrained scale. Transformed parameters, generated quantities, and the unconstrained scale, are potentially useful but not as high of a priority.

jgabry avatar Aug 08 '20 21:08 jgabry

We need to answer at least the following before building this:

  • Is this a new service argument callback or do we piggyback on the iteration writer using iteration = 0 or something like that? If the latter, what's the effect on existing interfaces? If the former, what's the callback structure (probably just like the iteration writer, but it should be stated)?

  • Do we output transformed parameters and generated quantities as part of the initialization? They're not given by the var_context, but by running write_array. I'm not sure if that happens as is, because we don't have any need for the transformed parameters or generated quantities of the initial state now. If we do add it, it's going to cause slightly different output than last iteration because the RNG will get advanced.

  • Is the output on the constrained or unconstrained scale? Presumably the former, but the feature request needs to make this clear.

I would suggest grabbing the values after the var_context is used to extract the values and the results fed through write_array. We can't read from the var_context twice because it can advance the RNG and we don't have an easy and generic way to set it back.

bob-carpenter avatar Aug 09 '20 19:08 bob-carpenter

We need to answer at least the following before building this:

  • Is this a new service argument callback or do we piggyback on the iteration writer using iteration = 0 or something like that? If the latter, what's the effect on existing interfaces? If the former, what's the callback structure (probably just like the iteration writer, but it should be stated)?

Good question. I don't know enough of the current implementation details.

  • Do we output transformed parameters and generated quantities as part of the initialization? They're not given by the var_context, but by running write_array. I'm not sure if that happens as is, because we don't have any need for the transformed parameters or generated quantities of the initial state now. If we do add it, it's going to cause slightly different output than last iteration because the RNG will get advanced.

If we have to just do the parameters to avoid RNG issues then I think that's better than nothing. The parameters are of greatest interest here anyway.

  • Is the output on the constrained or unconstrained scale? Presumably the former, but the feature request needs to make this clear.

Good point. Yes, constrained scale. I've updated the initial post.

jgabry avatar Aug 10 '20 18:08 jgabry

For new callback vs. using existing one, you could trace down how RStan uses the iterations.

The RNG issue just means we can't call the var_context to print and then call it again for sampling because the RNG state won't match. Similarly, we can generate transformed parameters and generated quantities as long as we only do it once. Those only come out on the constrained scale.
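To illustrate the "only do it once" point, here is a minimal sketch that does not use Stan at all. `draw_inits`, `InitRecord`, and `read_once` are hypothetical names, and the uniform(-2, 2) draw is only a stand-in for a random-inits context that advances a shared RNG. Calling the draw function twice gives two different sets of values, so the values reported and the values used for sampling have to come from a single read.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a random-inits context (not Stan's var_context):
// every call draws fresh values and advances the shared RNG.
std::vector<double> draw_inits(std::mt19937& rng, std::size_t n) {
  assert(n > 0);
  std::uniform_real_distribution<double> unif(-2.0, 2.0);
  std::vector<double> inits(n);
  for (double& x : inits) x = unif(rng);
  return inits;
}

// Values captured from a single read, shared by reporting and sampling.
struct InitRecord {
  std::vector<double> reported;
  std::vector<double> used_by_sampler;
};

// Read the inits exactly once and hand the same values to both consumers,
// so the RNG is advanced only one time.
InitRecord read_once(std::mt19937& rng, std::size_t n) {
  std::vector<double> inits = draw_inits(rng, n);
  return InitRecord{inits, inits};
}
```

With this shape, whatever gets printed as the initial values is by construction what the sampler starts from, and the RNG state after initialization is the same as if nothing had been printed.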

bob-carpenter avatar Aug 10 '20 18:08 bob-carpenter

I would suggest grabbing the values after the var_context is used to extract the values and the results fed through write_array. We can't read from the var_context twice because it can advance the RNG and we don't have an easy and generic way to set it back.

this is doable given the instantiated model and the var_context for the initial parameters - it would be very similar to how the standalone generated quantities method works.

mitzimorris avatar Aug 10 '20 21:08 mitzimorris

Sorry about the late reply. There's actually an init_writer() that should have the initial values written out to it.

https://github.com/stan-dev/stan/blob/develop/src/stan/services/sample/hmc_nuts_diag_e_adapt.hpp#L51
https://github.com/stan-dev/stan/blob/develop/src/stan/services/sample/hmc_nuts_diag_e_adapt.hpp#L65

It's on the unconstrained scale. There are convenience functions to get it back to the constrained scale.
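As a rough illustration of what going back to the constrained scale involves, here is a sketch of two of the inverse transforms described in the Stan Reference Manual (lower bound, and lower plus upper bound). The function names `constrain_lower` and `constrain_lower_upper` are hypothetical and are not the convenience functions referred to above; in Stan itself the model handles this for all parameters at once.

```cpp
#include <cassert>
#include <cmath>

// Lower-bounded parameter: unconstrained y maps to exp(y) + lower.
double constrain_lower(double y, double lower) {
  return std::exp(y) + lower;
}

// Parameter bounded on (lower, upper): unconstrained y maps to
// lower + (upper - lower) * inv_logit(y).
double constrain_lower_upper(double y, double lower, double upper) {
  assert(lower < upper);
  const double inv_logit = 1.0 / (1.0 + std::exp(-y));
  return lower + (upper - lower) * inv_logit;
}
```

One consequence worth keeping in mind: an unconstrained initial value of 0 becomes 1 for a parameter declared with a lower bound of 0, and 0.5 for a parameter bounded on (0, 1), so the two scales can look quite different for the same starting point.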

To get to the original post, @jgabry, I think it's a matter of writing it out... it's already there.

syclik avatar Aug 18 '20 19:08 syclik

In CmdStan's command.hpp, this isn't hooked up to anything: https://github.com/stan-dev/cmdstan/blob/fd7d8faf88a791d078500cc5fd7458a8d68f7b17/src/cmdstan/command.hpp#L154

mitzimorris avatar Aug 18 '20 20:08 mitzimorris

That's right. It just needs to be stored somewhere.

As a prototype, if you replaced that writer with one that writes out to std::cout, you should see the initial values. I'll try to bring up some code to show that.

syclik avatar Aug 18 '20 21:08 syclik

I understand that it needs to be stored somewhere. the question is where and how to label it so that it's easily interpretable.

mitzimorris avatar Aug 19 '20 00:08 mitzimorris

If you replace that line linked (command.hpp L154) with this one, it'll print out right before the first iteration:

  stan::callbacks::stream_writer init_writer(std::cout);

It looks something like this...

...

Gradient evaluation took 1e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
Adjust your expectations accordingly!


0.277565
Iteration:    1 / 2000 [  0%]  (Warmup)

where 0.277565 is the initial value. (if you ran it with init=0, then you'll see that it's exactly 0.)

I understand that it needs to be stored somewhere. the question is where and how to label it so that it's easily interpretable.

Got it. That, I have no good solution... we could put it in as the first line of the CSV, but that would wreak havoc with an off-by-one error everywhere. We could add it as a comment, but that's not super useful. We could write it to a different file...

These all seem non-optimal.

syclik avatar Aug 19 '20 00:08 syclik

These all seem non-optimal.

Exactly, which is why Aki's hack is needed (for now). I documented it in the CmdStan User's Guide, section 9.5.4: https://mc-stan.org/docs/2_24/cmdstan-guide/mcmc-config.html#initializing-parameters

mitzimorris avatar Aug 19 '20 03:08 mitzimorris

Thanks for the additional info. I agree none of these are optimal, but writing it to a different file seems like the least bad of the non-optimal solutions ;)
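For what it's worth, here is a rough sketch of what the "different file" option could look like, with labels so the values are interpretable. The class `labeled_init_writer` is hypothetical and is not part of Stan or CmdStan; it only shows the shape of the output (one header row of parameter names, one row of values), and it assumes the caller already has the names and the constrained values in hand.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical writer (not a Stan class): writes a header row of parameter
// names followed by a single row of initial values, comma-separated.
class labeled_init_writer {
 public:
  explicit labeled_init_writer(std::ostream& out) : out_(out) {}

  void write(const std::vector<std::string>& names,
             const std::vector<double>& values) {
    assert(names.size() == values.size());
    write_row(names);
    write_row(values);
  }

 private:
  // Writes one comma-separated row terminated by a newline.
  template <typename T>
  void write_row(const std::vector<T>& row) {
    for (std::size_t i = 0; i < row.size(); ++i) {
      if (i > 0) out_ << ',';
      out_ << row[i];
    }
    out_ << '\n';
  }

  std::ostream& out_;
};
```

Because it takes any `std::ostream`, the same writer could be pointed at a `std::ofstream` for a per-chain inits file or at `std::cout` for debugging. Keeping it separate from the draws CSV avoids the off-by-one problem mentioned above.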

jgabry avatar Aug 21 '20 17:08 jgabry