
Improve docs, tutorials and README

ParadaCarleton opened this issue · 13 comments

ParadaCarleton · Feb 04 '22

I haven't checked the tutorial (BTW, some already exist on the Turing webpage which could be updated and moved here), but I think it might be better to add it to the documentation instead of the README. Then it would even be possible to run the code and ensure that it does not become outdated.

devmotion · Feb 04 '22

@devmotion thanks, I encouraged Carlos to write this tutorial since he has recently been learning how DynamicPPL works. The idea is to have a Turing-independent walk-through example of how DynamicPPL operates. It might be obvious to seasoned Turing developers but turns out to be tricky for newcomers. Keeping a simple tutorial inside DynamicPPL helps to demystify stuff here, I think.

yebai · Feb 04 '22

> I haven't checked the tutorial (BTW, some already exist on the Turing webpage which could be updated and moved here), but I think it might be better to add it to the documentation instead of the README. Then it would even be possible to run the code and ensure that it does not become outdated.

I did suggest that to Hong, but it's up to him. We could also put it in both places, or alternatively just put a link to the docs in the README.

ParadaCarleton · Feb 04 '22

> Keeping a simple tutorial inside DynamicPPL helps to demystify stuff here, I think.

I think it's great to improve the documentation (we really should), and I also think it should be moved to DynamicPPL. I just thought it might be better to add it to the documentation in the docs folder (in DynamicPPL), since it's a longer write-up and could be executed.

devmotion · Feb 04 '22

> I'm also not certain that writing a sampler is the best way to demonstrate how to work with DynamicPPL, in particular a simple one like MH which only requires logjoint and logjoint_unconstrained and shouldn't have to explicitly touch things like SimpleVarInfo.

The API used in this PR is indeed a bit overkill for RWMH, but that's beside the point. The goal is to explain these APIs. I suggest we add a new section that provides a simpler MH implementation based on the logjoint APIs.

> I'd say I agree with @devmotion regarding instead adding docs where this could potentially go, instead of putting this "somewhat" longer write-up in the README.

We have a plan to implement a doc feature where the README file is compiled into the default "home" page of the documentation, so keeping the "get started" tutorial in the README doesn't matter that much. For example, both AdvancedHMC and Bijectors keep docs in their README files; these will go into their respective docs when the suggested doc workflow is implemented. But I'm also happy to move it to the docs and add a link in the README if there is a strong preference for that.

yebai · Feb 05 '22

> The API used in this PR is indeed a bit overkill for RWMH, but that's beside the point. The goal is to explain these APIs.

It's exactly because of that goal that I think a complicated implementation of RWMH is the wrong thing to do here :shrug:

An example implementation of something like a sampler should do either of the following:

  1. Clarify the feature we're trying to explain by demonstrating its application.
  2. Bring together several features in an application to demonstrate why the collection of features is useful; "usefulness" I define here as a) making the implementation of said application much simpler than the norm, or b) enabling the implementation of an application which wasn't (easily) implementable before.

IMHO a demo implementing RWMH doesn't really do either:

  1. evaluate!!, logjoint and getlogp are the only "features" (other than how to define a Model) which are used, and none of these are particularly difficult to understand.
  2. All we really need for RWMH is logjoint (and transformation of variables, but that isn't covered here anyway). This is not really something that is suddenly enabled by DPPL. If all one wanted was a logjoint corresponding to a generative model, one might as well implement that particular function directly, ignoring everything that DPPL has to offer.
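To make the second point concrete: for a small model like the demo used later in this thread, one could hand-write the log joint and a few lines of random-walk MH without touching DPPL at all. A minimal sketch (the data vector, seed, and step size below are made up for illustration):

```julia
using Random

# Hand-written log joint (up to an additive constant) for the model
#   m ~ Normal(0, 1);  x[i] ~ Normal(m, 1)
# with no probabilistic-programming machinery involved.
x_data = [1.2, 3.5, 2.4]  # example observations (made up)
logjoint_demo(m) = -m^2 / 2 - sum(x -> (x - m)^2 / 2, x_data)

# A few lines of random-walk Metropolis-Hastings on top of it.
function rwmh(rng, logjoint, init; n = 1_000, step = 0.5)
    m, lp = init, logjoint(init)
    samples = Vector{typeof(init)}(undef, n)
    for i in 1:n
        proposal = m + step * randn(rng)
        lp_proposal = logjoint(proposal)
        # Standard MH accept/reject: accept iff log(u) < lp_proposal - lp.
        if -randexp(rng) < lp_proposal - lp
            m, lp = proposal, lp_proposal
        end
        samples[i] = m
    end
    return samples
end

chain = rwmh(MersenneTwister(42), logjoint_demo, 0.0)
```

This is the whole sampler; DPPL's value for such a model is generating `logjoint` from `@model` rather than enabling the MH step itself.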

I'm definitely not against including a tutorial! Nor am I against large parts of this PR; e.g., I very much appreciate trying to explain what's going on with evaluate!! etc. I just don't think the chosen example really motivates why one should care about DPPL, nor why it's particularly useful.

Maybe we should instead demonstrate an example of a sampler which actually makes use of the tilde-pipeline, transformed variables, etc.? I think demonstrating how to "hook into" the tilde-pipeline would be a nice addition.

torfjelde · Feb 06 '22

I agree with some parts of Tor's comments here -- notably, I think this fits better in a tutorial than on the README.

That being said, I'm concerned about throwing too much stuff into one tutorial and ending up with it being unreadable. A lot of the current tutorials are definitely like that. It's definitely much easier to make a tutorial too long than to make it too short, so we should err on the side of making them short. The costs are very asymmetric -- if you make the tutorial too long, people will refuse to read it, but if you make it too short, people just have to click through to the next page slightly more often.

Maybe I should note that this tutorial is already longer than around 60% of pages in the Stan reference manual. There's nothing wrong with a short tutorial -- some pages in the Stanual are shorter than this comment!

ParadaCarleton · Feb 08 '22

So basically -- I'd be happy to write another tutorial on hooking into the tilde-pipeline, but I think it should be separate. This is meant to be a very brief+gentle introduction to DynamicPPL.

ParadaCarleton · Feb 08 '22

To make my comments more concrete and also more constructive, here is a rough sketch of a simple MH sampler for DynamicPPL:

```julia
using DynamicPPL
using AbstractMCMC
using Random  # for Random.AbstractRNG and randexp

struct MHSampler{P} <: AbstractMCMC.AbstractSampler
    proposal::P
end

function AbstractMCMC.step(
    rng::Random.AbstractRNG, model::DynamicPPL.Model, sampler::MHSampler;
    init_from::Union{Nothing,NamedTuple}=nothing,
    kwargs...
)
    if init_from === nothing
        # Draw initial values from the prior.
        x = SimpleVarInfo(
            model, SamplingContext(rng, SampleFromPrior(), DefaultContext())
        )
        samples = DynamicPPL.values_as(x, NamedTuple)
        logjoint_samples = getlogp(x)
    else
        samples = init_from
        logjoint_samples = logjoint(model, samples)
    end
    return samples, (samples, logjoint_samples)
end

function AbstractMCMC.step(
    rng::Random.AbstractRNG, model::DynamicPPL.Model, sampler::MHSampler, state;
    kwargs...
)
    samples, logjoint_samples = state
    proposed = sampler.proposal(rng, samples)
    logjoint_proposed = logjoint(model, proposed)
    # MH accept/reject: randexp(rng) == -log(u), so this accepts iff
    # log(u) < logjoint_proposed - logjoint_samples.
    if logjoint_samples < logjoint_proposed + randexp(rng)
        return proposed, (proposed, logjoint_proposed)
    else
        return samples, (samples, logjoint_samples)
    end
end

# Example
using Distributions

@model function demo(x)
    m ~ Normal(0, 1)
    x ~ Normal(m, 1)
end
model = demo([1.2, 3.5, 2.4])

sampler = MHSampler((rng, x) -> (m = randn(rng) + x.m,))

sample(model, sampler, 100)

sample(model, sampler, 100; init_from=(m=10.0,))

sample(model, sampler, 100; discard_initial=100, thinning=10)

sample(model, sampler, MCMCSerial(), 100, 4)
sample(model, sampler, MCMCThreads(), 100, 4)
sample(model, sampler, MCMCDistributed(), 100, 4)
```

In fact, it would be even simpler if DynamicPPL supported initial parameters for SimpleVarInfo as well; currently it always uses VarInfo. As you can see, one gets all kinds of functionality for free by implementing the AbstractMCMC interface. It's designed exactly for this purpose, and hence we want users and downstream developers to implement it when writing a sampler. The actual MH step is very simple, and the idea is not to deal with SimpleVarInfo there at all but only with the NamedTuple and logjoint. Again, it's designed for this purpose, so we should tell people about it :slightly_smiling_face: The only place where more internal/advanced stuff is used is when the initial sample is generated. Maybe we should implement something like

```julia
function rand(rng::AbstractRNG, ::Type{T}, model::Model) where {T}
    x = SimpleVarInfo(
        model, SamplingContext(rng, SampleFromPrior(), DefaultContext())
    )
    return DynamicPPL.values_as(x, T)
end
```

Then we could just use rand(rng, NamedTuple, model) and logjoint there as well and write:

```julia
function AbstractMCMC.step(
    rng::Random.AbstractRNG, model::DynamicPPL.Model, sampler::MHSampler;
    init_from::Union{Nothing,NamedTuple}=nothing,
    kwargs...
)
    samples = if init_from === nothing
        rand(rng, NamedTuple, model)
    else
        init_from
    end
    logjoint_samples = logjoint(model, samples)
    return samples, (samples, logjoint_samples)
end
```

devmotion · Feb 25 '22

This is great, thanks @devmotion! I'll definitely use it.

Is this a fully working sampler? I'm asking because it doesn't seem to implement some of the things in the AbstractMCMC tutorial -- namely, it's missing an InferenceAlgorithm and a SamplerState type.

ParadaCarleton · Feb 26 '22

It's fully working. The AbstractMCMC tutorial is focused on Turing specifics and largely outdated. The correct and (hopefully) complete, up-to-date docs are in the AbstractMCMC repo: https://turinglang.github.io/AbstractMCMC.jl/dev/

devmotion · Feb 26 '22

With #381 merged and released, the MH sampler can be simplified to:

```julia
using DynamicPPL
using AbstractMCMC
using Random

struct MHSampler{P} <: AbstractMCMC.AbstractSampler
    proposal::P
end

function AbstractMCMC.step(
    rng::Random.AbstractRNG, model::DynamicPPL.Model, sampler::MHSampler;
    init_from::Union{Nothing,NamedTuple}=nothing,
    kwargs...
)
    samples = init_from === nothing ? rand(rng, model) : init_from
    logjoint_samples = logjoint(model, samples)
    return samples, (samples, logjoint_samples)
end

function AbstractMCMC.step(
    rng::Random.AbstractRNG, model::DynamicPPL.Model, sampler::MHSampler, state;
    kwargs...
)
    samples, logjoint_samples = state
    proposed = sampler.proposal(rng, samples)
    logjoint_proposed = logjoint(model, proposed)
    if logjoint_samples < logjoint_proposed + randexp(rng)
        return proposed, (proposed, logjoint_proposed)
    else
        return samples, (samples, logjoint_samples)
    end
end
```

Examples:

```julia
using Distributions

@model function demo(x)
    m ~ Normal(0, 1)
    x ~ Normal(m, 1)
end
model = demo([1.2, 3.5, 2.4])

sampler = MHSampler((rng, x) -> (m = randn(rng) + x.m,))

sample(model, sampler, 100)

sample(model, sampler, 100; init_from=(m=10.0,))

sample(model, sampler, 100; discard_initial=100, thinning=10)

sample(model, sampler, MCMCSerial(), 100, 4)
sample(model, sampler, MCMCThreads(), 100, 4)
sample(model, sampler, MCMCDistributed(), 100, 4)
```

No SimpleVarInfo, VarInfo, evaluate!! etc. :slightly_smiling_face:

devmotion · Mar 02 '22

@ParadaCarleton There are a lot of excellent improvements in this PR. However, it is a bit difficult for other team members to review and approve. I suggest that we break this PR into a few self-contained smaller PRs; we can then quickly review and merge each one.

yebai · Mar 30 '22

Closed in favor of making several smaller PRs.

ParadaCarleton · Dec 21 '22