AbstractMCMC.jl
Unhelpful docstring for MCMCThreads
This is the current docstring for `MCMCThreads`:
```
help?> MCMCThreads
search: MCMCThreads

  MCMCThreads

  The MCMCThreads algorithm allows users to sample MCMC chains in parallel using
  multiple threads.
```
It would be extremely helpful if there were a method signature demonstrating how to use it. However, that functionality lives in Turing.jl, so I am not sure whether adding that signature here even makes sense.
I would like it if something like the following could be added to the docstring:
To sample using multiple threads with `Turing.jl`, use Turing's `sample` function with the following signature:

```julia
sample(model, sampler, parallel_type, n, n_chains)
```

where `parallel_type` is `MCMCThreads()`. `parallel_type` could also be `MCMCDistributed()` for distributed sampling.
If this is to be done, it should be a coherent effort across similar functions such as `MCMCDistributed()`.
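For illustration, a hedged sketch of what such a usage example might look like (the model definition, sampler choice, and chain counts here are hypothetical, and assume Turing.jl is installed and Julia was started with multiple threads, e.g. `julia -t 4`):

```julia
using Turing

# Hypothetical toy model, for illustration only.
@model function demo(x)
    μ ~ Normal(0, 1)
    x ~ Normal(μ, 1)
end

model = demo(1.5)

# Sample 4 chains of 1_000 draws each, one chain per thread.
# Swap MCMCThreads() for MCMCDistributed() to sample across worker processes.
chains = sample(model, NUTS(), MCMCThreads(), 1_000, 4)
```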
The docstring of `sample` (which, BTW, is not Turing-specific or owned by Turing) explains how it is used: https://github.com/TuringLang/AbstractMCMC.jl/blob/master/src/sample.jl#L71 Maybe we could just add a link to `sample` in the docstring (i.e., add a "See also [`sample`](@ref)")?
Generally, the interface is described in much more detail in the online documentation of AbstractMCMC, since there are many parts relevant for different functions and types, and to keep the size of the docstrings somewhat manageable.
Yeah, the documentation in the docstring for `sample` is also not great, as you need to scroll through this wall of text to get to anything about `MCMCThreads`:
Current docstring for `sample`
help?> Turing.sample
sample([rng], a, [wv::AbstractWeights])
Select a single random element of a. Sampling probabilities are proportional to the weights given
in wv, if provided.
Optionally specify a random number generator rng as the first argument (defaults to
Random.GLOBAL_RNG).
──────────────────────────────────────────────────────────────────────────────────────────────────
sample([rng], a, [wv::AbstractWeights], n::Integer; replace=true, ordered=false)
Select a random, optionally weighted sample of size n from an array a using a polyalgorithm.
Sampling probabilities are proportional to the weights given in wv, if provided. replace dictates
whether sampling is performed with replacement. ordered dictates whether an ordered sample (also
called a sequential sample, i.e. a sample where items appear in the same order as in a) should be
taken.
Optionally specify a random number generator rng as the first argument (defaults to
Random.GLOBAL_RNG).
──────────────────────────────────────────────────────────────────────────────────────────────────
sample([rng], a, [wv::AbstractWeights], dims::Dims; replace=true, ordered=false)
Select a random, optionally weighted sample from an array a specifying the dimensions dims of the
output array. Sampling probabilities are proportional to the weights given in wv, if provided.
replace dictates whether sampling is performed with replacement. ordered dictates whether an
ordered sample (also called a sequential sample, i.e. a sample where items appear in the same
order as in a) should be taken.
Optionally specify a random number generator rng as the first argument (defaults to
Random.GLOBAL_RNG).
──────────────────────────────────────────────────────────────────────────────────────────────────
sample([rng], wv::AbstractWeights)
Select a single random integer in 1:length(wv) with probabilities proportional to the weights
given in wv.
Optionally specify a random number generator rng as the first argument (defaults to
Random.GLOBAL_RNG).
──────────────────────────────────────────────────────────────────────────────────────────────────
sample([rng, ]model, sampler, N; kwargs...)
Return N samples from the model with the Markov chain Monte Carlo sampler.
──────────────────────────────────────────────────────────────────────────────────────────────────
sample([rng, ]model, sampler, isdone; kwargs...)
Sample from the model with the Markov chain Monte Carlo sampler until a convergence criterion
isdone returns true, and return the samples.
The function isdone has the signature
isdone(rng, model, sampler, samples, state, iteration; kwargs...)
where state and iteration are the current state and iteration of the sampler, respectively. It
should return true when sampling should end, and false otherwise.
──────────────────────────────────────────────────────────────────────────────────────────────────
sample([rng, ]model, sampler, parallel, N, nchains; kwargs...)
Sample nchains Monte Carlo Markov chains from the model with the sampler in parallel using the
parallel algorithm, and combine them into a single chain.
──────────────────────────────────────────────────────────────────────────────────────────────────
sample([rng,] chn::Chains, [wv::AbstractWeights,] n; replace=true, ordered=false)
Sample n samples from the pooled (!) chain chn.
The keyword arguments replace and ordered determine whether sampling is performed with replacement
and whether the sample is ordered, respectively. If specified, sampling probabilities are
proportional to weights wv.
│ Note
│
│ If chn contains multiple chains, they are pooled (i.e., appended) before sampling. This
│ ensures that even in this case exactly n samples are returned:
│
│ julia> chn = Chains(randn(11, 4, 3));
│
│ julia> size(sample(chn, 7)) == (7, 4, 1)
│ true
──────────────────────────────────────────────────────────────────────────────────────────────────
sample(
rng::AbstractRNG,
h::Hamiltonian,
κ::AbstractMCMCKernel,
θ::AbstractVecOrMat{T},
n_samples::Int,
adaptor::AbstractAdaptor=NoAdaptation(),
n_adapts::Int=min(div(n_samples, 10), 1_000);
drop_warmup::Bool=false,
verbose::Bool=true,
progress::Bool=false
)
Sample n_samples samples using the proposal κ under Hamiltonian h.
• The randomness is controlled by rng.
• If rng is not provided, GLOBAL_RNG will be used.
• The initial point is given by θ.
• The adaptor is set by adaptor, for which the default is no adaptation.
• It will perform n_adapts steps of adaptation, for which the default is the
minimum of 1_000 and 10% of n_samples
• drop_warmup controls to drop the samples during adaptation phase or not
• verbose controls the verbosity
• progress controls whether to show the progress meter or not
──────────────────────────────────────────────────────────────────────────────────────────────────
sample(model::AdvancedHMC.DifferentiableDensityModel, kernel::AdvancedHMC.AbstractMCMCKernel, metric::AdvancedHMC.AbstractMetric, adaptor::AdvancedHMC.Adaptation.AbstractAdaptor, N::Integer; kwargs...) -> Any
A convenient wrapper around AbstractMCMC.sample avoiding explicit construction of HMCSampler.
Only the second-to-last entry explains parallel sampling, with the short sentence

sample([rng, ]model, sampler, parallel, N, nchains; kwargs...)
Sample nchains Monte Carlo Markov chains from the model with the sampler in parallel using the
parallel algorithm, and combine them into a single chain.

It does not mention `MCMCThreads`, so when I am visually scanning for "threads" I do not find anything. Also, the signatures fail to mention the types of anything. This means that reading such a signature requires the user to more or less know what usually goes where - especially as some of the arguments are custom types of the package.
So linking to `sample` would solve a good part of the problem - the other part would be to make the docstring for `sample` more readable. But there are so many signatures that they need some cleaning. I generally prefer something like the following:
Proposed docstring for `sample`
help?> Turing.sample
Sampling from an array:
sample(a, [wv::AbstractWeights])
sample(a, [wv::AbstractWeights], n::Integer; replace=true, ordered=false)
sample(a, [wv::AbstractWeights], dims::Dims; replace=true, ordered=false)
Select a single random element of a. Sampling probabilities are proportional to the weights given
in wv, if provided.
`n` determines the sample size.
`replace` dictates whether sampling is performed with replacement.
`ordered` dictates whether an ordered sample (also called a sequential sample,
i.e. a sample where items appear in the same order as in a) should be taken.
`dims` specifies the dimensions of the output array.
Omitting `a` is equivalent to setting `a` equal to Base.OneTo(length(wv)).
──────────────────────────────────────────────────────────────────────────────────────────────────
Sampling from a model:
sample(model, sampler, N; kwargs...)
sample(model, sampler, isdone; kwargs...)
sample(model, sampler, parallel, N, nchains; kwargs...)
Sample `N` samples based on the model `model`, with the Markov chain Monte Carlo sampler.
If convergence criterion `isdone` is supplied, sample until it returns true.
If `parallel` and `nchains` are provided, sample `nchains` separate chains. This is done in parallel
if `parallel` is `MCMCThreads()`, and in a distributed manner if `parallel` is `MCMCDistributed()`.
The function `isdone` should have a signature
`isdone(rng, model, sampler, samples, state, iteration; kwargs...)`
where state and iteration are the current state and iteration of the sampler, respectively. It
should return true when sampling should end, and false otherwise.
For all signatures, optionally specify a random number generator `rng` as the first argument (defaults to
Random.GLOBAL_RNG).
──────────────────────────────────────────────────────────────────────────────────────────────────
sample([rng,] chn::Chains, [wv::AbstractWeights,] n; replace=true, ordered=false)
Sample n samples from the pooled (!) chain chn.
The keyword arguments replace and ordered determine whether sampling is performed with replacement
and whether the sample is ordered, respectively. If specified, sampling probabilities are
proportional to weights wv.
│ Note
│
│ If chn contains multiple chains, they are pooled (i.e., appended) before sampling. This
│ ensures that even in this case exactly n samples are returned:
│
│ julia> chn = Chains(randn(11, 4, 3));
│
│ julia> size(sample(chn, 7)) == (7, 4, 1)
│ true
──────────────────────────────────────────────────────────────────────────────────────────────────
sample(
rng::AbstractRNG,
h::Hamiltonian,
κ::AbstractMCMCKernel,
θ::AbstractVecOrMat{T},
n_samples::Int,
adaptor::AbstractAdaptor=NoAdaptation(),
n_adapts::Int=min(div(n_samples, 10), 1_000);
drop_warmup::Bool=false,
verbose::Bool=true,
progress::Bool=false
)
Sample n_samples samples using the proposal κ under Hamiltonian h.
• The randomness is controlled by rng.
• If rng is not provided, GLOBAL_RNG will be used.
• The initial point is given by θ.
• The adaptor is set by adaptor, for which the default is no adaptation.
• It will perform n_adapts steps of adaptation, for which the default is the
minimum of 1_000 and 10% of n_samples
• drop_warmup controls to drop the samples during adaptation phase or not
• verbose controls the verbosity
• progress controls whether to show the progress meter or not
──────────────────────────────────────────────────────────────────────────────────────────────────
sample(model::AdvancedHMC.DifferentiableDensityModel, kernel::AdvancedHMC.AbstractMCMCKernel, metric::AdvancedHMC.AbstractMetric, adaptor::AdvancedHMC.Adaptation.AbstractAdaptor, N::Integer; kwargs...) -> Any
A convenient wrapper around AbstractMCMC.sample avoiding explicit construction of HMCSampler.
Yes, probably it would be useful to add the signatures to the docstrings. We could use DocStringExtensions.jl to ensure that they are correct and simplify updates. Would you like to make a PR?
Unfortunately, the REPL output for `?sample` will always be messy because it's neither owned by AbstractMCMC nor Turing but implemented and documented by many different packages. Also, downstream packages such as Turing or AdvancedHMC might want to add additional docstrings for `sample` with their samplers if they allow custom keyword arguments.
You can obtain only the docstring of interest by searching more specifically for certain function calls or signatures instead of the generic `sample`. For that reason I usually think it's better not to merge docstrings (and also not in this case here), since it would no longer be possible to retrieve docstrings for the different signatures separately.
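As a sketch of what the DocStringExtensions.jl suggestion could look like (the signature and docstring text here are illustrative, not the actual AbstractMCMC source), the `$(TYPEDSIGNATURES)` abbreviation expands to the typed method signature automatically, so the docstring stays correct when the signature changes:

```julia
using DocStringExtensions

"""
$(TYPEDSIGNATURES)

Sample `nchains` chains in parallel using available threads and combine them
into a single chain.
"""
function sample(model, sampler, ::MCMCThreads, N::Integer, nchains::Integer; kwargs...)
    # ... implementation elided ...
end
```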
> Yes, probably it would be useful to add the signatures to the docstrings. We could use DocStringExtensions.jl to ensure that they are correct and simplify updates. Would you like to make a PR?
No, sorry. I do not feel competent enough, and I do not have time right now to figure out the best way of doing this.
> You can obtain only the docstring of interest by searching more specifically for certain function calls or signatures instead of the generic `sample`.
I did not know this! That does change things - having them split apart makes more sense in that case.
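For anyone else landing on this issue: querying Julia's help mode with a concrete call expression (the argument names below are placeholders) narrows the output to the docstring matching that signature, instead of printing every `sample` docstring at once:

```julia
julia> using AbstractMCMC

# In help mode, query a specific call form rather than the bare name:
help?> sample(model, sampler, MCMCThreads(), 1000, 4)
```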