Feature/1029 record fixed param
Submission Checklist

- [x] Run tests: `./runCmdStanTests.py src/test`
- [x] Declare copyright holder and open-source license: see below
Summary:
In command.hpp, before invoking the service method fixed_param, add the line `# fixed_param=1` to the Stan CSV header.
Intended Effect:
This will make it easier for wrapper interfaces to check Stan CSV files. See https://github.com/stan-dev/cmdstan/issues/1029
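As an illustrative sketch of the intended change (the surrounding header lines are abbreviated and purely illustrative; only the `# fixed_param=1` line is the addition from this PR):

```
# method = sample (Default)
#   sample
#     algorithm = fixed_param
# fixed_param=1
```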
How to Verify:
Added more unit tests
Side Effects:
The CSV file header will change only when the method fixed_param is invoked. Should be harmless.
Documentation:
n/a
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
| Name | Old Result | New Result | Ratio | Performance change (1 - new/old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.05 | 3.19 | 0.96 | -4.58% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.98 | -2.49% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.1 | 1.03 | 2.79% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 0.96 | -4.67% slower |
| irt_2pl/irt_2pl.stan | 5.95 | 6.08 | 0.98 | -2.3% slower |
| performance.compilation | 89.76 | 87.14 | 1.03 | 2.91% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.55 | 8.85 | 0.97 | -3.61% slower |
| pkpd/one_comp_mm_elim_abs.stan | 29.95 | 29.84 | 1.0 | 0.37% faster |
| sir/sir.stan | 127.79 | 123.41 | 1.04 | 3.43% faster |
| gp_regr/gen_gp_data.stan | 0.03 | 0.04 | 0.97 | -3.32% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.02 | 3.08 | 0.98 | -2.21% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.42 | 0.38 | 1.08 | 7.37% faster |
| arK/arK.stan | 1.87 | 1.87 | 1.0 | -0.12% slower |
| arma/arma.stan | 0.83 | 0.76 | 1.1 | 8.71% faster |
| garch/garch.stan | 0.53 | 0.67 | 0.79 | -26.63% slower |
| Mean result: 0.98917823181 |
Jenkins Console Log, Blue Ocean
Commit hash: ed2f282e9a811b3ac414d49c6b70e187d5625567
Machine information:
- ProductName: Mac OS X
- ProductVersion: 10.11.6
- BuildVersion: 15G22010
- CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz
- G++: Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1; Apple LLVM version 7.0.2 (clang-700.1.81); Target: x86_64-apple-darwin15.6.0; Thread model: posix
- Clang: Apple LLVM version 7.0.2 (clang-700.1.81); Target: x86_64-apple-darwin15.6.0; Thread model: posix
I couldn't agree more. I looked at the classes in src/cmdstan/argument - the code is a bit opaque.
Are these objects mutable? I'm not sure how to change the algorithm argument; do we just create a new one?
_value seems to be protected and value() just returns a copy of _value. One somewhat hacky way you could do this is to go through valid_arguments and manually call delete on the items you don't need for fixed_param. If you delete them and set each pointer to nullptr, that should be safe.
OK, will look into messing with valid_arguments. Since we've got a fix for this in CmdStanPy, it's not a huge priority. Just another example of why we really need to change the way that CmdStan handles command-line arguments and config.
The problem here isn't with the CmdStan argument parser but rather with that recently introduced feature of trying to modify the CmdStan configuration based on the given Stan program.
The entire CmdStan ecosystem was designed around the arguments being immutable. A user specifies the configuration and CmdStan executes exactly that configuration no matter the structure of the Stan program. The behavior of CmdStan is then entirely determined by how it is called, without the need to consult the associated Stan program, which is critical to reproducible analysis pipelines where different Stan programs are switched in and out. In other words, the immutability of the configuration ensures no surprises in what is executed.
Modifying the configuration based on the structure of the Stan program breaks this guarantee. Instead of returning an error that immediately communicates a conflict between the requested configuration and the given Stan program, the assumption of a misspecified configuration propagates through the entire analysis pipeline. If that assumption is incorrect then it would not be observed until later on in the pipeline, complicating debugging.
The reason why the argument parser does not have readily-accessible mutators is that it was designed for this immutability paradigm. By breaking the immutability you've gone beyond the intended use of the code. That said, while mutators are not implemented, the modification of the parsed argument is straightforward given that the argument parsing is implemented recursively.
```cpp
// Grab the algorithm argument node from the argument tree and
// cast it to a `list_argument` to ensure that the proper methods are called
list_argument* algo = dynamic_cast<list_argument*>(
    parser.arg("method")->arg("sample")->arg("algorithm"));

// Create a vector of strings with the new algorithm arguments
std::vector<std::string> modified_algo_args;
modified_algo_args.push_back("fixed_param");

// Update the algorithm node to use the new arguments
algo->parse_args(modified_algo_args, info, err, false);
```
Again, I don't actually think that this fix should be implemented; rather, the automatic modification should be reverted.
I agree with @betanalpha that it seems confusing to change what users requested to something else because what they requested is incoherent.
How about just generalizing the HMC algorithms to allow zero-size parameter vectors? If you look at the leapfrog algorithm with size-zero vectors, the boundary condition is a no-op, which is just what we want. Then users can choose any of our HMC algorithms when there are zero-size parameters and the boundary condition should do the right thing. Then we don't need to modify any of our output.
The reason why I introduced a separate “fixed_param” sampler configuration is that while the Hamiltonian trajectories themselves are fine, the local (trajectory length) and global (stepsize, inverse metric) adaptation are not. Instead of trying to adapt each of those routines it was easier to have a separate, explicit option for the user to communicate when they don’t want to modify the parameter space at all.
it seems confusing to change what users requested to something else because what they requested is incoherent.
the user's request was vague, but then defaults were applied, resulting in an incoherent request. it's a problem with the defaults, not the user. all the user wants is to run the program to generate a sample.
the user's request was vague, but then defaults were applied, resulting in an incoherent request
That's a really good point. If a user only specifies "sample", then I don't see the problem with choosing fixed_param or adaptive, diagonal, Euclidean NUTS for them depending on whether the model has 0 or 1+ parameters.
That's a really good point. If a user only specifies "sample", then I don't see the problem with choosing fixed_param or adaptive, diagonal, Euclidean NUTS for them depending on whether the model has 0 or 1+ parameters.
This is a misinterpretation of what “sample” means.
All of the method arguments refer to how the parameter space is handled. method=sample refers to sampling from the parameters defined in a Stan program and nowhere else.
The fixed_param algorithm runs a trivial Markov chain that leaves the initial parameter values unperturbed. A consequence of this is that running with the fixed_param algorithm just executes the transformed parameters and generated quantities block on the initial parameters over and over again, regardless of the configuration of the parameter space or the code in those blocks. If the generated quantities block happens to have RNG evaluations then this might look like a sampling operation but that’s irrelevant to why method=sample and algorithm=fixed_param were chosen. And if the generated quantities block doesn’t have any RNG evaluations then nothing looks like sampling at all despite method=sample and algorithm=fixed_param still being meaningful.
More detailed warning messages to users clarifying this behavior is awesome. Adding a new method that more directly runs generated quantities without pretending to run a Markov chain, with the additional benefit of being able to craft proper arguments, is also awesome. Trying to change the behavior of method=sample based on overlapping terminology that might arise in some Stan programs is, in my opinion, much more confusing and much less awesome.
This is a misinterpretation of what “sample” means.
I agree that fixed-param doesn't do MCMC sampling, but its main utility is for MC sampling using the _rng functions. Not to go all Humpty-Dumpty here, but we can define the keyword "sample" in an interface to mean whatever we want it to mean as long as it's consistent with standard meanings of the word.
I can see two reasons to run fixed-param.
1. to MC sample using pseudo-RNGs, or
2. to evaluate a Stan function or test a data/transformed data block.
(2) feels like it should be done another way. Are there other reasons people run fixed_param?
What I tend to do in practice is just call the defaults. Then if I have zero size parameters, I have to fumble in the doc to figure out how to call a different function. And if I ever want to use a single program for both, I have to write a conditional into the interface scripts. But I agree that it's awkward to be able to supply HMC parameters if HMC isn't being run. If we enforce that consistency by throwing errors whenever we call fixed_param with HMC parameters, I'd still have to write conditionals into the interface scripts.
I agree that fixed-param doesn't do MCMC sampling, but its main utility is for MC sampling using the _rng functions. Not to go all Humpty-Dumpty here, but we can define the keyword "sample" in an interface to mean whatever we want it to mean as long as it's consistent with standard meanings of the word.
Sure, but we don’t want to define it in ways that clash with what is actually being done or even possible.
For example, there’s no general way to turn a Stan program that specifies a joint density over the parameters and data variables into a “procedurally generative” program that can generate samples of all of those variables, so there’s no “sample” or “Monte Carlo” method that would ever be a valid option for general Stan programs.
fixed_param describes what is done -- fixing the parameters but letting everything else be evaluated -- not what one can do with the side effects of that action, such as evaluating the data or transformed data block once or running the generated quantities block over and over again.
I can see two reasons to run fixed-param.
to MC sample using pseudo-RNGs, or to evaluate a Stan function or test a data/transformed data block.

I think it’s more useful to think about how a Stan program is actually processed, in other words which blocks are actually evaluated. fixed_param evaluates everything at the same initial values of the parameters block, which then allows for all kinds of useful side effects based on the structure of those blocks. At the same time, because those side effects can’t be guaranteed unless a Stan program has certain structure in certain blocks, they’re not really appropriate naming targets.

What I tend to do in practice is just call the defaults. Then if I have zero size parameters, I have to fumble in the doc to figure out how to call a different function. And if I ever want to use a single program for both, I have to write a conditional into the interface scripts. But I agree that it's awkward to be able to supply HMC parameters if HMC isn't being run. If we enforce that consistency by throwing errors whenever we call fixed_param with HMC parameters, I'd still have to write conditionals into the interface scripts.
I don’t see the problem with having to call the right method based on what you want to do.
method=sample algorithm=hmc will attempt to evolve the parameters with Markov chain Monte Carlo. That’s it. If your program does not provide enough information for that to be a well-defined operation then it should error out, which is what it used to do.
method=sample algorithm=fixed_param uses a trivial Markov chain Monte Carlo algorithm that doesn’t achieve the same behavior as the Hamiltonian Monte Carlo sampler. There are lots of useful side effects for this, and if you want one of those side effects then you should specify that directly with the call. You can’t rely on default behavior when there are multiple, distinct actions you want to perform.
Again, I think having a full method that repeatedly evaluates the Stan program at the fixed input, and not hiding that in method=sample, would be a great way to clarify the intent and possible applications. But this issue is all about the CmdStan interface automatically switching from algorithm=hmc to algorithm=fixed_param based on an exceptional circumstance for certain Stan programs.
@betanalpha
The problem here isn't with the CmdStan argument parser but rather with that recently introduced feature of trying to modify the CmdStan configuration based on the given Stan program.
I don't actually think that this fix should be implemented but rather the automatic modification reverted.
@bob-carpenter
How about just generalizing the HMC algorithms to allow zero-size parameter vectors? If you look at the leapfrog algorithm with size zero vectors, the boundary condition is a no-op, which is just what we want. Then users can choose any of our HMC algorithms when there is zero-size parameters and the boundary condition should do the right thing. Then we don't need to modify any of our output.
@betanalpha
while the Hamiltonian trajectories themselves are fine the local (trajectory length) and global (stepsize, inverse metric) adaptation are not.
In theory, HMC works perfectly fine in a zero-dimensional space. It's just arithmetic with size-zero vectors, trivial but well-behaved. In practice, Stan's HMC algorithm works perfectly fine in a zero-dimensional space but there are two other bugs that interfere.
First, disabling the zero-parameter check in CmdStan and running with defaults results in
```
Exception initializing step size.
Posterior is improper. Please check your model.
```
which is obviously false: the posterior is a point mass and necessarily proper. It's just that step size is meaningless and the step size adapter misdiagnoses this.
But precisely because step size is meaningless, adaptation shouldn't do anything anyway. Running with adapt engaged=0, warmup completes successfully but the program crashes when it tries to print the metric.
```
stan/lib/stan_math/lib/eigen_3.3.9/Eigen/src/Core/DenseCoeffsBase.h:425: Eigen::DenseCoeffsBase<type-parameter-0-0, 1>::Scalar &Eigen::DenseCoeffsBase<Eigen::Matrix<double, -1, 1, 0, -1, 1>, 1>::operator()(Eigen::Index) [Derived = Eigen::Matrix<double, -1, 1, 0, -1, 1>, Level = 1]: Assertion `index >= 0 && index < size()' failed.
```
Ok, one-line fix in stan/mcmc/hmc/hamiltonians/diag_e_point.hpp and with that it runs to completion.
```
               Mean  MCSE  StdDev   5%    50%      95%   N_Eff  N_Eff/s  R_hat
lp__           0.00  nan   0.00     0.00  0.0e+00  0.00  nan    nan      nan
accept_stat__  1.0   nan   6.7e-16  1.0   1.0      1.0   nan    nan      nan
stepsize__     1.0   nan   6.7e-16  1.0   1.0      1.0   nan    nan      nan
treedepth__    1.0   nan   6.7e-16  1.0   1.0      1.0   nan    nan      nan
n_leapfrog__   1.0   nan   6.7e-16  1.0   1.0      1.0   nan    nan      nan
divergent__    0.00  nan   0.0e+00  0.00  0.00     0.00  nan    nan      nan
energy__       0.00  nan   0.0e+00  0.00  0.00     0.00  nan    nan      nan
```
All trajectories terminate after one proposal and always accept.
You could generate the same samples a bit more efficiently with the fixed_param sampler (which evaluates zero proposals for each iteration) but--and here's the problem this PR is trying to work around--fixed_param has a different output format from all the other samplers.
Why does fixed_param have different output? As far as I can tell it's because fixed_param has no warmup and thus omits the adaptation info. On one hand that makes sense, "warming up" fixed_param would be just wasted computation; on the other hand fixed_param still supports thinning, which is equally pointless, and CmdStan accepts (but silently ignores) nonzero num_warmup argument even with algorithm=fixed_param.
In short, it's all inconsistent.
In theory, HMC works perfectly fine in a zero-dimensional space. It's just arithmetic with size-zero vectors, trivial but well-behaved. In practice, Stan's HMC algorithm works perfectly fine in a zero-dimensional space but there are two other bugs that interfere.
First, disabling the zero-parameter check in CmdStan and running with defaults result in
Exception initializing step size. Posterior is improper. Please check your model. which is obviously false: the posterior is a point mass and necessarily proper.
If we’re being formal then a zero-dimensional space is topologically discrete and cannot be written as a dense subset of the real numbers. The entire algorithm library, from the algorithms to the output to the summaries, assumes that the ambient space is a subset of the real numbers (which is why Stan can’t directly handle spheres, tori, and other topologically nontrivial spaces even though Hamiltonian Monte Carlo itself is applicable). From that perspective the error message is correct — the posterior density function relative to a reference Lebesgue measure is not proper.
At the same time even if we equate “no parameters” with “zero-dimensional” the ambient space need not be a point. A single point is a classic model for a zero-dimensional space, but any discrete set of points is technically valid too.
It’s just that step size is meaningless and the step size adapter misdiagnoses this. But precisely because step size is meaningless adaptation shouldn't do anything anyway. Running with adapt engaged=0 warmup completes successfully but the program crashes when it tries to print the metric.
```
stan/lib/stan_math/lib/eigen_3.3.9/Eigen/src/Core/DenseCoeffsBase.h:425: Eigen::DenseCoeffsBase<type-parameter-0-0, 1>::Scalar &Eigen::DenseCoeffsBase<Eigen::Matrix<double, -1, 1, 0, -1, 1>, 1>::operator()(Eigen::Index) [Derived = Eigen::Matrix<double, -1, 1, 0, -1, 1>, Level = 1]: Assertion `index >= 0 && index < size()' failed.
```

The adaptation code, and again the I/O code, assume a continuous ambient space that is a subset of some finite-dimensional real space.
Whether or not all of the code can be adapted to an empty parameter block depends on the particular conventions assumed for the corresponding ambient space. If a point space is assumed then no-ops are valid, but again that’s not the only possible assumption. A useful discussion but tangential to the main issue in the thread.
You could generate the same samples a bit more efficiently with fixed_param sampler (which evaluates zero proposals for each iteration) but--and here's the problem this PR is trying to workaround--fixed_param has a different output format from all the other samplers.
That’s not the original issue. The originating problem is that the Stan configuration printed to the output is the input configuration. CmdStanPy is currently modifying that configuration, which then results in a conflict between the configuration that is executed, and hence the structure of the output, and the configuration that is reported. In particular, the header of the CmdStan output reports that HMC was run while the body is given by fixed_param output.
Why does fixed_param have different output? As far as I can tell it's because fixed_param has no warmup and thus omits the adaptation info. On one hand that makes sense, "warming up" fixed_param would be just wasted computation; on the other hand fixed_param still supports thinning, which is equally pointless, and CmdStan accepts (but silently ignores) nonzero num_warmup argument even with algorithm=fixed_param. In short, it's all inconsistent.
The output is consistent. Running different HMC sampler configurations results in different adaptation information being output, and the fixed_param configuration simply has no adaptation information to output. Warmup is an implicit no-op by default because generated quantities isn’t run if save_warmup=0 and so there’s nothing to output, but if save_warmup=1 then generated quantities are run and those evaluations are output.
The problem that’s arising is when other programs are run over the CmdStan output after the fact. Those programs — especially print and stansummary — check the Stan configuration to determine what output to expect. When the output Stan configuration specifies that HMC was run but the output doesn’t contain the assumed HMC output, there are problems.
Now those CmdStan programs that read the CmdStan output are quite fragile and poorly implemented, a sin that goes back a very long time, and more robust processing logic would be welcome. But that’s a tangential issue here.
If we’re being formal then a zero-dimensional space is topologically discrete and cannot be written as a dense subset of the real numbers.
Right. R^0 has just a single element, the empty tuple, and any measure has to assign it 100% of the probability mass. But it's still a proper measure and I can't see where the rest of the math won't generalize properly if we get our code right.
When the output Stan configuration specifies that HMC was run but the output doesn’t contain the assumed HMC output then there are problems.
That feels like a bug to me in the brittle way we pack things into and read things out of .csv format.
Running with adapt engaged=0 warmup completes successfully but the program crashes when it tries to print the metric.
This also just feels like a bug in our code not handling N = 0 boundary conditions.
But isn't this moot for the issue at hand, which is just to let the sampling method deal with size zero output by allowing the flexibility to select between HMC or fixed-param depending on the problem dimensionality?
If we’re being formal then a zero-dimensional space is topologically discrete and cannot be written as a dense subset of the real numbers.
Right. R^0 has just a single element, the empty tuple, and any measure has to assign it 100% of the probability mass. But it's still a proper measure and I can't see where the rest of the math won't generalize properly if we get our code right.
This also just feels like a bug in our code not handling N = 0 boundary conditions. But isn't this moot for the issue at hand, which is just to let the sampling method deal with size zero output by allowing the flexibility to select between HMC or fixed-param depending on the problem dimensionality?
Again: a zero-dimensional space does not imply a single element set! There are many zero-dimensional spaces and hence many different ways to extrapolate to the N = 0 boundary case of R^N. These include the empty set, a set with a single element, and a set with a finite number of elements.
Each of these zero-dimensional spaces implies a different Markov chain Monte Carlo algorithm and output configuration. For example, the empty set would correspond to no-ops everywhere with no parameter values passed to the MCMC output. A single element set would also correspond to a no-op, but the value of that one element would technically get passed to the MCMC output. A no-op sampler would definitely not be okay for a discrete set.
The zero-dimensional case is ambiguous and there is not an obvious or canonical implementation of Markov chain Monte Carlo. We can choose one arbitrarily or we can just not define HMC for this ambiguous setting, instead erring out with an informative message as was done previously. I strongly prefer the latter. If a user wants a no-op sampler then they should specify a no-op sampler.
When the output Stan configuration specifies that HMC was run but the output doesn’t contain the assumed HMC output then there are problems.
That feels like a bug to me in the brittle way we pack things into and read things out of .csv format.
It’s not a bug in that it’s technically working as intended. Every well-defined HMC output has a guaranteed structure and the reader assumes that structure.
I agree that the CmdStan csv reader is brittle and should be updated or redesigned, but I don’t think this is the reason why.
There are many zero-dimensional spaces and hence many different ways to extrapolate to the N = 0 boundary case of R^N.
There are indeed many zero dimensional spaces (although, no, the empty set is not one of them) but R^N does not mean some random N-dimensional space. R^N means the N-dimensional real vector space. That is unique for every N; zero is not special.
The issue isn't about generalizing HMC to arbitrary zero dimensional spaces. The only space of interest is the one Stan chooses when a model has no parameters at all. And that particular space is quite well-behaved.
There are indeed many zero dimensional spaces (although, no, the empty set is not one of them) but R^N does not mean some random N-dimensional space. R^N means the N-dimensional real vector space. That is unique for every N; zero is not special.
By empty set I was referring to the model theoretic use of the empty set itself as the canonical element of a set with cardinality one, but that’s irrelevant.
R^N does not uniquely define an N-dimensional real vector space. In order to identify R^N with a Euclidean vector space one has to fix a parameterization/coordinate system; in particular, non-linear transformations/reparameterizations/changes of coordinate system change the identification of R^N with the Euclidean vector space. One can interpret this as an infinite number of different R^N, each identified with a canonical Euclidean vector space, or a single R^N that requires a choice of parameterization before it can be identified with a Euclidean vector space, but either way R^N doesn’t refer to a unique, fully-specified system.
The identification ambiguity carries over to the zero-dimensional case as well. For example let’s say that we define a vector space containing only the zero vector as a zero-dimensional space. There are an infinite number of ways to associate the zero vector with a single point, each corresponding to a different choice of origin in R^1 with the different choices related by translations.
Taking any choice arbitrarily yields a non-empty state space, which formally means that we need a name for a variable that takes values in the state space so that we can print the constant output. This does not correspond to the no-op, no-output behavior of the fixed_param sampler.
no longer relevant.