
Simplify the context mechanism in DynamicPPL

Open mhauru opened this issue 1 year ago • 17 comments

Current plan

> [!NOTE]
> Last updated: 5 August 2025

Here is a list of our current contexts and the current plan for each.

✅ = Already merged into main/breaking (or no plans to change)
♻️ = We know what to do but it's not done yet
❌ = Not sure what to do

| Context | in v0.36 | in future | Details |
| --- | --- | --- | --- |
| DefaultContext | Leaf | ✅ Leaf | Maybe rename to AccumulationContext |
| DynamicPPL.DynamicTransformationContext | Leaf | ✅ Leaf | Keep |
| LikelihoodContext | Leaf | ✅ Accumulator | #885 |
| PriorContext | Leaf | ✅ Accumulator | #885 |
| SamplingContext | Parent | ♻️ Leaf | In DynamicPPL, replace with InitContext: #955, #967. In Turing, replace with sampler-specific leaf contexts |
| ConditionContext | Parent | ♻️ Model field | Store in model instead of context (#1010) |
| PrefixContext | Parent | ♻️ Model field | Store in model instead of context (#1010) |
| DynamicPPL.FixedContext | Parent | ♻️ Model field | Store in model instead of context (#1010) |
| DynamicPPL.PointwiseLogdensityContext | Parent | ✅ Accumulator | #885 |
| DynamicPPL.PriorExtractorContext | Parent | ✅ Accumulator | #907 |
| DynamicPPL.DebugUtils.DebugContext | Parent | ✅ Accumulator | #976 |
| DynamicPPL.ValuesAsInModelContext | Parent | ✅ Accumulator | #908 |
| DynamicPPL.TestUtils.TestLogModifyingChildContext | Parent | ✅ Removed | #885 |
| MiniBatchContext | Parent | ✅ Removed | Removed in #885. Could be replaced with getlogdensity (#925) |
| DynamicPPL.TestUtils.TestParentContext | Parent | ❗️ | Note that if there are no more parent contexts, there is no need for a test parent context, so this can be removed. |
| Turing.Inference.GibbsContext | Parent(*) | ❌ | Maybe merge with ConditionContext? Otherwise probably keep. |
| Turing.Optimisation.OptimizationContext | Parent | ✅ Removed | Replaced with accumulators: https://github.com/TuringLang/Turing.jl/pull/2550, https://github.com/TuringLang/Turing.jl/pull/2630 |

(*) Can only wrap leaf contexts or PrefixContext

Previous discussion

(Writeup by @penelopeysm, but the ideas here were jointly discussed by @willtebbutt, @mhauru, @sunxd3, and me)

> [!NOTE]
> This is mostly obsolete and included mainly for historical interest. See this comment and #744 for a general overview of what's currently happening.

Background

Models on their own specify a set of variables and their relationship. However, there are many different ways a model can be evaluated, each with different outcomes. For example:

  • one might want to only accumulate logp of observations, thus giving the value of the (log) likelihood
  • one might want to record extra information about the variables seen in the model, e.g. to check whether a variable is specified twice
  • one might want to use existing values stored in a varinfo, or sample new values
  • ...

However, at the core, we still want to have a single model evaluation function; we don't want to write separate functions for each of these options.

The way this is accomplished is through the use of evaluation contexts. Specifically, DynamicPPL.evaluate!! takes an extra context argument which can contain any sort of information it wants, and is passed through the tilde pipeline all the way down to assume and observe. Thus, contexts may implement their own assume(..., ::MyContext) methods to inject custom behaviour that occurs when a model is evaluated.

Each context is its own type (which subtypes AbstractContext), and contexts are modelled as a linked list. There are three 'leaf contexts', DefaultContext, LikelihoodContext, and PriorContext. 'Parent contexts' typically contain some kind of information, plus a child context.
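Schematically, the linked list can be pictured with stripped-down types like these (a hypothetical sketch; the real DynamicPPL definitions carry more information than this):

```julia
# Hypothetical, simplified sketch of the context linked list.
abstract type AbstractContext end

# A leaf context terminates the list and holds no child.
struct DefaultContext <: AbstractContext end

# Parent contexts hold some payload plus a child context.
struct PrefixContext{C<:AbstractContext} <: AbstractContext
    prefix::Symbol
    childcontext::C
end

struct SamplingContext{C<:AbstractContext} <: AbstractContext
    childcontext::C
end

# A typical stack: SamplingContext -> PrefixContext -> DefaultContext.
ctx = SamplingContext(PrefixContext(:inner, DefaultContext()))
```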

*(diagram: linked list of contexts)*

There is, apparently, something very complicated about this, and we'd like to simplify it.

Possible directions

(1) Rework data structure

One option is to use a new data structure for a series of contexts, as shown below

  • (a) Extract the modifiers into a separate part. Instead of having the linked list above, one option would be to have a base context plus an array of 'context modifiers'. *(diagram: array of modifiers plus base context)*

  • (b) Flatten the modifiers into a single 'super-context' that can do everything. *(diagram: flattened super-context)*

In both cases, this would mean that instead of overloading assume and observe for each type of context, we would only have one method (or maybe N methods, where N is the number of leaf contexts). Inside that method though, we would have to either iterate over the list of modifiers, or dynamically perform some action depending on whether the modifier field in the super-context is nothing.
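A rough sketch of what option (a) might look like (all names here are made up for illustration, not existing DynamicPPL API):

```julia
# Hypothetical sketch of direction (1a): a single leaf context plus an
# ordered vector of 'modifiers', processed in a loop inside one assume
# method rather than via per-context method dispatch.
abstract type ContextModifier end

struct Prefixing <: ContextModifier
    prefix::Symbol
end

struct ModifiedContext{B}
    base::B                            # the single leaf context
    # Note the abstract element type: iterating this incurs dynamic
    # dispatch, which is where the performance concern comes in.
    modifiers::Vector{ContextModifier}
end

# Hypothetical per-modifier hook, applied in order:
apply(m::Prefixing, name::Symbol) = Symbol(m.prefix, "_", name)

function apply_modifiers(ctx::ModifiedContext, name::Symbol)
    for m in ctx.modifiers
        name = apply(m, name)
    end
    return name
end
```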

There is an additional danger in (b) as well, in that it does not preserve information about the order in which the modifications should be applied. That is to say, it assumes that modifiers are commutative. I don't know of a particular case where this isn't true, but I can imagine that it might happen with e.g. some combination of PrefixContext + ConditionContext.

In my (Penny's) opinion, doing this is pretty much just shuffling the complexity around and won't really give us substantial code simplification. It would also invalidate the type-based dispatch system that we now have, so I also worry about whether it would cause performance regressions.

(2) Place more type bounds on the contexts

Another option is to retain the linked list structure, but to place more constraints on what types of contexts can appear in which situation. For example, there are a number of contexts that are used solely in the implementation of a single function: see e.g. ValuesAsInModelContext and DynamicPPL.values_as_in_model. These contexts have no business being anywhere but the head of the linked list (i.e. the top of the context stack), and we could enforce this by having something like this:

```julia
abstract type CanBeChildContext <: AbstractContext end

struct DefaultContext <: CanBeChildContext end

# Note that we don't subtype CanBeChildContext
struct ValuesAsInModelContext{T} <: AbstractContext
    values::T
    childcontext::CanBeChildContext
end
```

This makes it illegal to, for example, construct certain combinations such as a ValuesAsInModelContext{ValuesAsInModelContext{DefaultContext}}.

(Markus had ideas about extending this trait-like system even more, so e.g. specific traits would allow contexts to perform specific actions such as modifying VarInfo. My memory of this is a little bit hazy, so I will ask him to write this)

My (Penny's) opinion is basically that I like the general idea, but I'm not yet convinced this will help us very much. I think it's really difficult to group contexts according to 'shared behaviour' / 'these contexts have equal levels of permissions'. And although the 'can be child context' one might seem like an obvious low-hanging fruit, I don't think it improves the code very much, because ValuesAsInModelContext is already confined to a single source file, and thus very unlikely to 'leak' out of it and cause weird contexts like SamplingContext{ValuesAsInModelContext}.

The other concern with this is that they're basically 'mixin' traits, which work nicely in something like Python because you can inherit from as many classes as you like, but in Julia the fact that each type must have only one supertype leads to difficulties when you want to inherit multiple types of behaviour. For example:

```julia
# Can do Foo
abstract type CanFooContext <: AbstractContext end

# Can do Bar
abstract type CanBarContext <: AbstractContext end

# Can do Foo and Bar
# You can plug this anywhere you need a CanFooContext, but not a CanBarContext?
abstract type CanFooBarContext <: CanFooContext end
```
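For completeness: the usual Julia workaround for this single-inheritance limitation is the 'Holy traits' pattern, where each capability is a separate trait function rather than a supertype. A sketch with made-up names:

```julia
# Hypothetical sketch: capabilities as Holy traits instead of abstract
# supertypes, so one context can opt into any combination of them.
struct CanFoo end
struct CannotFoo end
struct CanBar end
struct CannotBar end

struct MyContext end

# Each capability is declared independently; no shared supertype needed:
footrait(::Type{MyContext}) = CanFoo()
bartrait(::Type{MyContext}) = CanBar()

# Methods dispatch on the trait value, not on the context's supertype:
do_foo(ctx) = do_foo(footrait(typeof(ctx)), ctx)
do_foo(::CanFoo, ctx) = "foo-ing"
do_foo(::CannotFoo, ctx) = error("this context cannot foo")
```

This avoids the diamond problem, though it trades the field-type constraint shown above (which rejects illegal combinations at construction time) for checks at call sites.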

What now?

The thing about the whole context thing is that I don't entirely know how it can be simplified without re-introducing the complexity in some other format. The purpose of having different contexts is because it allows us to abstract away the notion of 'model evaluation' into the tilde pipeline with assume / observe, but if we want to support different evaluation modes, then the extra information needed for this has to be supplied in some format.

Personally, I did find the context mechanism very confusing at the start, but after having worked with it a bit I think it is actually quite natural. (And as a Haskeller: it has a direct resemblance to monad transformer stacks for handling effects.) And my personal intuition would be that the best thing to do is to write more extensive documentation on this. We have a nice base of the minituring page to work with: https://turinglang.org/docs/developers/compiler/minituring-contexts/ which explains the 'need' for contexts very well, and I think it would be great to (a) expand on this need, by showing the model evaluation function, and (b) explain how it's used inside DynamicPPL itself, and (c) demonstrate how to extend this system with a custom context.

However, I'm conscious that maybe there are other perspectives that I'm not seeing, so if someone else has ideas on how contexts can be simplified, please do post below!

Related issues

  • https://github.com/TuringLang/DynamicPPL.jl/issues/274

mhauru avatar Dec 05 '24 16:12 mhauru

Note from chat:

  1. remove .~ before tackling this,
  2. ask @torfjelde if he has any good ideas for simplification.

@penelopeysm to type up different possibilities from today's discussion.

willtebbutt avatar Jan 31 '25 14:01 willtebbutt

will expand, but

  1. Flattening contexts into one big struct with optional fields
  2. Flattening contexts into an array of their effects
  3. Intermediate abstract context types with certain invariants

penelopeysm avatar Jan 31 '25 14:01 penelopeysm

I'll probably need a bit more context on the different possibilities 👀

torfjelde avatar Jan 31 '25 17:01 torfjelde

I've now (finally) written up the above. I'm clearing my assignment because my assessment of the situation is basically to do nothing to the code and write better docs, but if someone else would like to do some code changes, we can assign someone again.

penelopeysm avatar Feb 19 '25 19:02 penelopeysm

> (Markus had ideas about extending this trait-like system even more, so e.g. specific traits would allow contexts to perform specific actions such as modifying VarInfo. My memory of this is a little bit hazy, so I will ask him to write this)

I think the main source of complexity with contexts is that given that they can change absolutely anything about the behaviour of tilde_assume and tilde_observe, it is very hard to reason about what happens when you nest them in particular ways. If I make a new context that does magic operation X on the VarInfo and then calls the same method for the child context, is my magic operation X going to mess with how the child context does its job? Might the child context mess with what I'm trying to achieve with X? Might having X and whatever the child context does happen in sequence cause an error, or incorrect results? I have no way of answering any of these questions, because the child context can do absolutely anything.

A case study: the ESS sampler changes the leaf context used to evaluate logp to always be LikelihoodContext, even when we are sampling from e.g. the posterior, because of how ESS works. This caused a subtle bug with the new Gibbs sampler: it interacted with how GibbsContext short-circuits the evaluation of tilde_assume for variables that are not being sampled by the current component sampler, causing the wrong leaf context to be used in some cases. The way Tor and I solved this was to make GibbsContext always insert itself right above the leaf context. This feels like an ugly hack, and if anyone ever has to write another context that for some reason needs to be right above the leaf, the two will clash with unpredictable consequences.

The idea with the traits/abstract subtypes would be to have fewer contexts that use the full power of the context mechanism to override anything about the behaviour of the tilde pipeline. I think we have many contexts that actually do something quite simple, and having them explicitly marked as being of some lesser tier of context, where they are for instance only allowed to run a postprocessing or preprocessing function before calling tilde_assume/tilde_observe on the child, might make it easier to reason about their behaviour. I'm quite uncertain whether this would actually solve much, but that's the idea that Penny was referring to above.

> The thing about the whole context thing is that I don't entirely know how it can be simplified without re-introducing the complexity in some other format.

I think I agree with Penny here, that none of the proposals so far are much better than what we have. I would still like to keep trying to come up with better ideas though, because contexts are one of the most confusing aspects of how Turing works.

One way forward could be to first implement https://github.com/TuringLang/DynamicPPL.jl/issues/744, which should allow us to get rid of the leaf contexts and maybe a few other contexts. Maybe that would simplify solving this one.

mhauru avatar Feb 20 '25 09:02 mhauru

Maybe all I'm saying above is that we could try to use lighter structures and mechanisms, existing and new, to do much of the work that is currently done by contexts, and maybe after we've done that we would be left with only a small handful of contexts. It could then be easier to either redesign the context mechanism to meet the needs of that small handful, or to accept leaving that small handful of contexts as they are and just document them well.

mhauru avatar Feb 20 '25 10:02 mhauru

@mhauru Assigning you now because I think https://github.com/TuringLang/DynamicPPL.jl/issues/744 is (by far!) our best shot.

penelopeysm avatar Mar 05 '25 00:03 penelopeysm

We can also consider using special value types in VarInfo to implement condition/fix (for example, missing represents a missing value; we could introduce ConditionedVariable, FixedVariable, etc.).

This has the advantage of being conceptually more intuitive. It also offers better extensibility: for each new model operation we only need to implement a new data type with corresponding overloads of tilde_assume/tilde_observe.
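A sketch of the idea (ConditionedVariable, FixedVariable, and handle are hypothetical names for illustration, not existing DynamicPPL API):

```julia
# Hypothetical sketch: mark conditioned/fixed values with wrapper types
# stored in the VarInfo, then dispatch on the stored value's type in the
# tilde pipeline instead of walking a context stack.
struct ConditionedVariable{T}
    value::T
end

struct FixedVariable{T}
    value::T
end

# A tilde_assume-like function picks its behaviour from the value type:
handle(v::ConditionedVariable) = (:observe, v.value)  # adds to the likelihood
handle(v::FixedVariable)       = (:fix, v.value)      # no logp contribution
handle(::Missing)              = (:sample, nothing)   # draw a fresh value
handle(v)                      = (:assume, v)         # use the stored value
```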

yebai avatar May 02 '25 09:05 yebai

Also related:

  • https://github.com/TuringLang/Turing.jl/issues/2537

yebai avatar May 02 '25 09:05 yebai

@penelopeysm, @mhauru, let's keep the table in https://github.com/TuringLang/DynamicPPL.jl/issues/895#issuecomment-2845073783 in sync with recent new plans and PRs for DynamicPPL contexts.

I suggest that @penelopeysm focus on this issue over the next two to four weeks as an attempt to resolve it.

Might require https://github.com/TuringLang/DynamicPPL.jl/pull/925

Note: we might no longer need PrefixContext after https://github.com/TuringLang/DynamicPPL.jl/issues/965, since all variables will have unique global names, including those in submodels.

yebai avatar Jul 07 '25 15:07 yebai

Updated the table. Sure, I'm about halfway through SamplingContext/InitContext (although trying to split up the PRs into manageable bits for Markus's sake 🙃)

penelopeysm avatar Jul 07 '25 16:07 penelopeysm

@mhauru, I think MiniBatchContext can be replaced with a customised log-density function as per #925?

penelopeysm avatar Jul 07 '25 16:07 penelopeysm

Quick question for consideration: Do we still need the parent/leaf context design, or can we opt for a simple list of contexts, similar to Accumulators, now?

yebai avatar Jul 07 '25 17:07 yebai

Unfortunately, there are interactions between ConditionContext and PrefixContext that prevent us from collapsing it into an unordered set. Specifically,

```julia
ConditionContext(Dict(@varname(x.a) => 1), PrefixContext(@varname(x), DefaultContext()))
PrefixContext(@varname(x), ConditionContext(Dict(@varname(x.a) => 1), DefaultContext()))
```

have different behaviour: if a model containing a variable a is evaluated with the first context, it will be observed, whereas with the second it will be assumed. (I wrote more about this at https://turinglang.org/docs/developers/contexts/submodel-condition/)

As far as I know, this is the only instance where (1) the ordering of the context stack matters; and (2) I don't know how to resolve it. Right now, the order of GibbsContext also matters, but I'm also about 90% sure that GibbsContext could be a leaf context.

It is a bit of a shame because I feel like we were so close, but just hampered by this one interaction 🥲

penelopeysm avatar Jul 07 '25 18:07 penelopeysm

I suppose it's technically also a bit weird if nested ConditionContexts condition the same variable twice: it's not immediately obvious which value is taken. (Right now it takes the outermost, which is the more intuitive choice for a linked list of contexts, but it would be ill-defined if we had an unordered set of contexts.) Maybe that should be an error, or maybe we need to collapse multiple adjacent ConditionContexts into one with a warning.

```julia
julia> @model f() = begin x ~ Normal(); return x; end
f (generic function with 2 methods)

julia> (f() | (; x = 1))()
1

julia> ((f() | (; x = 1)) | (; x = 2))()
2
```

penelopeysm avatar Jul 07 '25 18:07 penelopeysm

> @mhauru, I think MiniBatchContext can be replaced with a customised log-density function as per https://github.com/TuringLang/DynamicPPL.jl/pull/925?

Yes. I think the Turing.jl integration PR also already has the new implementation (I don't think it needs to be in DPPL).

mhauru avatar Jul 08 '25 07:07 mhauru

I moved the table to the top comment.

penelopeysm avatar Jul 09 '25 13:07 penelopeysm