guppylang icon indicating copy to clipboard operation
guppylang copied to clipboard

Replace Order edges with linear "effect tokens"

Open acl-cqc opened this issue 4 months ago • 4 comments

We may need to establish a better list of effect tokens, but a couple are:

  • allocation state (qalloc/qfree)
  • results (tket2.result.*)
  • general termination (e.g. function call)

Possibly more support may be required from Hugr, but an application/use case will help to guide this!

acl-cqc avatar Sep 17 '25 11:09 acl-cqc

So the win here for guppy is that Hugr now checks we have wired up all the effects we should, which is significant. (The cost is that to pass an effect down into one Case of a Conditional, we'll need to pass it through all the other Cases too....) The bigger win is inside Hugr, where we'll eventually be able to simplify lots of optimization code not to have to deal with Order edges, and (again) we'll get more checking.

Worth thinking about what this will look like, though, compared to the current track_hugr_side_effects.

Needing to specify all an ops' inputs at once (i.e. including effects), probably means we'll have to move the code from a global monkey-patch to being spread around the various builder classes. However, we could do this at the moment (with Order edges) - replacing the prev_node_with_side_effect dict with a single Node | None field in each dataflow graph builder - but we don't, because (I think) it'll be messier; grotesque as the monkey-patch may be, it does at least keep all the code together.

Then, to handle effects, we'll have to change that single Node to one Node per effect type.

We'll still need much the same hierarchy-traversal algorithm as now - perhaps when each DSG finishes, its parent works out whether to wire it in. But (finally) it'll get more complicated due to Cases etc.

So from a code health perspective....I think the main (only?) win here may be removing the single may_have_side_effects function and EXTENSION_OPS_WITH_SIDE_EFFECTS dicts and instead looking at the op signatures. Otherwise I think things are likely to get messier rather than tidier...unless I'm missing something?

acl-cqc avatar Oct 22 '25 17:10 acl-cqc

Otherwise I think things are likely to get messier rather than tidier...unless I'm missing something?

Agreed, implementing linear effect tokens will definitely be more difficult and most likely more messy. The main benefit is that the resulting Hugr will be nicer.

replacing the prev_node_with_side_effect dict with a single Node | None field in each dataflow graph builder

I'm not a fan of using the hugr builder for this. We could probably get away with storing the effect wires in the DFContainer object that gets passed around when compiling to Hugr?

Also note that we currently assume that every call has side-effects. But it would be a lot nicer if we only pass effect tokens into functions that are actually needed. Inferring those would probably require a fixed-point iteration over the call graph?

mark-koch avatar Oct 23 '25 10:10 mark-koch

Thanks @mark-koch! Yeah, there are a few issues.

  1. Do we continue with monkeypatch "modify the current/in-progress builder and all ancestors" or move to something more principled (but probably more widespread)? Thinking about this, the best way is probably to combine with your:

Also note that we currently assume that every call has side-effects. But it would be a lot nicer if we only pass effect tokens into functions that are actually needed. Inferring those would probably require a fixed-point iteration over the call graph?

On the latter, yes I think that is what we'd need; and also over other structures within functions (e.g. (that will become) CFGs, TailLoops, DFGs, etc.). This probably constitutes a stage in the compiler inbetween building the CFG and beginning construction to a Hugr (as the latter uses the results, to know what tokens to add to every TailLoop/CFG/etc.).

In terms of how to do that analysis....there is now hugr_core::module_graph::ModuleGraph in Rust, but we have yet to expose that in python. That might be one way (but proceeding carefully). We'd solve calls first - a function uses a token type if any node inside it does, or a call - and then compute CFG/etc. by containment. However, that would mean building a Hugr first in order to use it, so we might be better off just doing something bespoke (perhaps doing other containers in one step).

Also note we'll need a token for function/loop termination, which you can set as not-needed for (a) any leaf (no calls), and (b) anything all of whose call-targets don't need it. (Recursive functions satisfy neither criterion.)

The advantage is that when we build Hugr, we know which token types we need at every step; we don't need to change types of any "in progress" builder. (The hierarchical traversal is done by the analysis.)

  1. Hugr containers - do we want support for e.g. methods for adding token types to (existing) containers, and their Input / Output? In Rust that could be in trait HugrMut or the builder, and could be useful if we were using rust, but given we are using hugr-py I'm less clear - we could move some of the monkeypatch hierarchy-traversal into hugr-py, but item (1.) suggests to take a different route.

  2. Hugr ops - do we want some mechanism for adding ports to other, non-container, OpTypes? There are a few ways that Hugr could do this; from the guppy end, this probably means adding token types to guppy functions decorated with @hugr_op / using guppylang_internals.definition.custom.OpCompiler. There are two routes at the guppy end: we could exposing a function with extra (token-type) parameters that wraps/calls the @hugr_op function (which does not take those parameters) - this would translate easily to the metadata approach by adding the same tag metadata to the function, and ensuring function inlining preserves that metadata onto the DFG. (We need to make sure, that when the outer function gets inlined, the tokens that were just passed from Input through to Output, don't lose all connection with the op.) Or we could extend @hugr_op to allow specifying such tokens as a decorator parameter; this mays more easily to any of the different Hugr implementations (including metadata!).

I think the question really is how often this will be required: I think the majority of ops needing tokens are (a) custom ops that require a certain token-type by their own definition; and (b) container types, as point 1. So where are these @hugr_op functions - are they all in guppylang_internals and why do the ops not need those token types by definition? So this might be useful as a migration measure but we might also avoid a bunch of work by leaving it and hoping we don't need it.

  1. What about guppy support for defining your own token types in your own extension, and ops using them? We may want features here to make this easier, but the only "required" thing really is to make the guppy compiler aware that your new type is a token so it can use the analysis and automatically add ports. (Without that you would have to do the wiring "yourself", which is not ideal but still a lot easier in guppy source than Hugr!)

Did we have a code health issue for de-monkeypatch-ing? And/or for adding Drop ops "properly" rather than by inspecting the Hugr - this might possibly tie up with that too.

acl-cqc avatar Nov 24 '25 12:11 acl-cqc

In terms of requirement for migration, we could+should review every use of @hugr_op in the guppy codebase and see whether we'll be able to modify the OpType in question to take the tokens, or need a wrapper. (Of course we can speculate about uses of guppylang_internals outside of guppylang repo, and whether we mind breaking them, too....)

We also need to resolve guppy syntax etc. for declaring token types / registering them with the compiler (i.e. declaring types to make "silent at the point of use").

acl-cqc avatar Nov 24 '25 17:11 acl-cqc