ChezScheme continuation attachments

This commit adds primitive support "continuation attachments", which are a simplified form of Racket's "continuation marks". It's joint work with Kent, but all bugs and bad ideas are mine.

While it's possible to implement continuation marks in Racket-on-Chez's delimited-control layer using just call/cc, the implementation has a two significant drawbacks:

The body of with-continuation-marks gets wrapped in a closure to be passed to call/cc. For example,

      (lambda (x)
        (+ 1 (with-continuation-mark
              'key 'val
              (get-integer-like x))))

becomes, at best, something like

      (lambda (x)
        (+ 1 (call/cc
              (lambda (k)
                (push-marks-for-continuation k 'key 'val)
                (begin0
                  (get-integer-like x)
                  (pop-marks-for-continuation))))))

When with-continuation-mark is in tail position, it's more complicated, because the body of with-continuation-mark is supposed to be in tail position relative to the with-continuation-mark form. So,

      (lambda (x)
        (with-continuation-mark
          'key 'val
          (get-integer-like x)))

has to be something like

      (lambda (x)
        (call/cc
          (lambda (k)
            (let ([finish (lambda () (get-integer-like x))])
              (if (continuation-has-marks? k)
                  (begin
                    (update-mark-for-continuation k 'key 'val)
                    (finish))
                  (begin0
                   (call/cc
                    (lambda (nested-k)
                      (push-marks-for-continuation nested-k 'key 'val)
                      (finish)))
                   (pop-marks-for-continuation)))))))

which doesn't always actually put the body in tail position, but at most doubles the length of the continuation (and other control functions have to hide the extra continuation frame that is created here to pop marks). Meanwhile, if the body in finish isn't small enough to inline, a second closure and indirect call is created in this transformation.

The implementation using call/cc sometimes forces extra call-stack frames to be created. For example, in

      (lambda (x)
        (+ 1 (with-continuation-mark
              'key 'val
              (if (integer? x)
	          x
                  (slower-get-integer)))))

a continuation frame is always created for the if form. Without the with-continuation-mark wrapper, a frame would be created only in the case that slower-get-integer is called.

A front-end optimizer could reduce this kind of cost --- in this case pushing the with-continuation-mark into the "else" branch (since it's not observable in the "then" branch), and in other cases by generating a push--computation--pop sequence when the computation is not in tail position. But that starts to get into the job that normally belongs to the compiler.

Direct Support: Costs

Direct support for attachments in the compiler and runtime system imposes a some costs on all programs:

There's an extra compiler pass (but it should be fast).
Reifying a continuation captures an attachment stack in much the same way that it captures the winder stack. Adding an extra field to a continuation turns out not to use more memory, due to alignment constraints, but it does add a field to initialize and traverse in the GC.
The dounderflow return for a reified continuation has an extra test and assignment to potentially restore an old attachment stack.
Winders record the attachment stack and restore before calling a pre/post thunk.

See the implementation notes below for more information.

The cost on continuation operations is modest. On a microbenchmark that repeatedly reifies a continuation,

  (let loop ([i 10000000])
    (unless (zero? i)
      (loop
       (call/cc
        (lambda (k)
          (sub1 i))))))

about 2-3% more instructions are executed to capture the attachment stack and "restore" it. For the tail variant

  (let loop ([i 10000000])
    (unless (zero? i)
      (call/cc
       (lambda (k)
         (loop (sub1 i))))))

there's no new cost, except on the first iteration.

The cost for dynamic-wind is larger. For a microbenchmark like

 (let loop ([i 1000])
   (unless (zero? i)
     (call/cc
      (lambda (k)
        (let loop ([j 100000])
          (if (zero? j)
              (k 'done)
              (dynamic-wind
                  void
                  (lambda () (loop (sub1 j)))
                  void)))))
     (loop (sub1 i))))

I'm seeing about 10% more instructions executed, whether or not k is wrapped around 'done. It's possible that the shift in representation from vectors to structures accounts for part of the change, but using a larger vector was even worse.

Direct Support: Benefits

The loops in https://gist.github.com/mflatt/0f889c7a1fc2547001d9920f5514fbe8 compare the new, native implementation of attachments with a simulation that uses `call/cc. Some rough numbers from my laptop:

   baseline iteration:                    210ms

   (tail) loop using simulation:         1750ms
   (non-tail) recur using simulation:    5300ms

   (tail) loop using attachments:         500ms
   (non-tail) recur using attachments:   1600ms

So, native attachment are 3x-4x as fast as the implementation using call/cc.

This improvement helps get Racket-on-Chez much closer to Racket's performance for continuation marks on https://gist.github.com/mflatt/3b760d019d5e8e74125bf13e94e5945f:

                                          Loop               Recur
                                  baseline    wcm     baseline    wcm
  Racket                            235       570       420       950
  Racket-on-Chez, attachments       205      1138       270      2000
  Racket-on-Chez, call/cc            "       3321        "       7400

These improvements translate to better performance on a contracts benchmark https://gist.github.com/mflatt/e850da15a25d7a07dcf0fa1544f94375, where continuation marks turn out to be the dominating factor (where the "simulated" version is apparently too pessimistic compared to the actual implementation of contracts, but the simulation was useful for focusing attention on continuation marks):

                                 contracts   simulated 
  Racket                            460         780
  Racket-on-Chez, attachments       840        1132
  Racket-on-Chez, call/cc          1920        2270

Implementation Notes

The simplification from "continuation marks" to "continuation attachments" means that the compiler only has to deal with getting and setting a single attachment, and it doesn't have to deal with the dictionary aspect of marks.

To minimize interactions with other parts of the compiler and runtime system, attachments are kept on a stack. This stack can grow faster than the call stack if attachments are added to conceptual continuations within a single call-stack frame. The stack can grow slower than the call stack (and that's the common case) if no attachments are added to continuations.

Similar to call-with-values, the public interface for attachments is expressed as functions, but a new compiler pass recognizes uses of the functions on immediate procedure and generates a primitive internal form that doesn't involve a closure. Specifically,

  (call-setting-continuation-attachment
   <new-value>
   (lambda ()
     <body>))

turns into something like

  (begin
    (push/update-attachment <new-value>)
    <body>)

and

  (call-with-current-continuation-attachment
   <default-value>
   (lambda (a)
     <body>))

turns into something like

 (let ([d <default-value>])
   (let ([a (if ($continuation-has-attachment?)
                (car ($current-attachments))
                d)])
     <body>))

Pushing onto the attachment stack is easy. The difficult parts are (1) knowing whether the current continuation already has an attachment, in which case a new attachment should replace instead of push, and (2) popping the attachment stack when the continuation shrinks, but only if the continuation has an attachment, and without turning a tail call into a non-tail call.

Within a function, whether to push or replace can sometimes be determined statically. For example, in

   (+ (call-with-continuation-attachment
       'x
       (lambda ()
         (do-some-work)
         result-var)
      10)

then the continuation of the first argument to + clearly has no attachment, so 'x can be simply pushed. Along the same lines, accessing result-var cannot inspect the attachments, so the attachment can be popped before returning result-var. The new compiler pass takes care of those easy cases.

If (do-some-work) above was in tail position, then there's no statically apparent place to pop the 'x attachment. Similarly, if the body of do-some-work has a call-with-continuation-attachment call in tail position, then it will not be statically apparent that the new attachment should replace 'x, as opposed to being pushed fresh.

To deal with these dynamic cases, every function return whose continuation potentially has an attachment must check for an attachment and pop it if present. Instead of imposing this cost on every return, we constrain attachments on call-frame boundaries to continuations that are reified in the sense of call/cc. Then, the check and potential popping of an attachment can be built into the dounderflow code that is substituted for the original return address when reifying a continuation.

When a call-setting-continuation-attachment call appears in tail position within a function, it forces the current continuation to be reified; that's done through a new intrinsic that is mostly a copy of callcc. If the reified continuation records an attachment stack that is not the same as the current attachment stack, then the continuation already has an attachment. (Note the off-by-one: a continuation stores the attachments for the rest of the continuation, and not its own attachment, because the rest of the continuation's attachments are the ones to restore when continuing).

One final case: When an attachment is added by call-setting-continuation-attachment in a non-tail position, then the corresponding attachment needs to be popped by any return from a function called in tail position relative to the call-setting-continuation-attachment. In other words, the continuation of the call to the function needs to be reified. That's implemented by rewriting the function call to one that goes through a wrapper that reifies the continuation and then tail-calls the original function. To avoid interfering with primitive inlining, which happen in a later compiler pass, the inlining pass recognizes the rewritten form and still inlines, adding an attachment pop just after the inlined code.

Since the attachment stack behaves somewhat like the winder stack, it may be tempting to try to merge them. Separating the attachment stack has the advantage that jumping from one continuation with N attachments to another with M attachments doesn't take O(N+M) time, which is an important property for Racket-on-Chez.

Jul 25 '18 18:07 mflatt

I've updated the PR to change the way relative-tail calls are handled under call-setting-continuation-attachment in a non-tail position. Instead of allocating an attachment-moving wrapper as a closure, the relative-tail call itself is adjusted to perform the move.

The change saves about 10-20% in the simulated contract example or in this variant of "(non-tail) recur using attachments":

   (let loop ([j 0])
     (cond
      [(= j N) 'done]
      [else
       (let loop ([i 0])
         (cond
          [(= i M) 0]
          [else (call-setting-continuation-attachment
                 i
                 (lambda ()
                   (sub1 (loop (add1 i)))))]))
       (loop (add1 j))]))

It also seems like a generally better approach, although it requires some care to ensure that the new info flag for a call or mvcall form isn't lost in various compiler passes.

Jul 26 '18 23:07 mflatt

Rebased and folded in #377 plus subsequent improvements.

May 20 '20 17:05 mflatt

@mflatt what's the progress of this PR?

Oct 01 '21 03:10 tisonkun

The PR remains as ready as ever, and the conflicts look easily resolvable. Whether there's any demand for the feature here is another question, though.

If there's interest in refining this PR, maybe a continuation-marks layer should be the external interface. Adding the key–value layer of "marks" allows multiple libraries to cooperate, unlike a single "attachment". Then, continuation attachments would be just an internal implementation detail.

I've on occasion considered adding continuation marks here, but then I conclude that there's probably not enough interest for that to be worthwhile. (Maybe it goes without saying, but this code lives on a part of the Racket branch of Chez Scheme, where continuation attachments are used extensively via the Racket layer's continuation marks.)

Oct 01 '21 13:10 mflatt

For what it is worth, Kernel language has continuations marks

Oct 06 '21 15:10 amirouche

Included in v9.9.9 merge (as a continuation-marks API) in 47daa9b34016de84fd111801d9d589d15a523abe.

Oct 17 '23 17:10 mflatt