qi Use `compose-with-values` instead of `compose`

Summary of Changes

This approach was suggested by @samth @notjack and @benknoble on Discord as an alternative to using compose, and which may be more compatible with Typed Racket.

Below is a script for basic benchmarking written by Sam (and modified to include using compose-with-values). This clearly shows a greater than 2x improvement. But after integrating it into the Qi compiler (i.e., this PR), the existing composition benchmark seems to show a slight degradation in performance. It's hard to tell since the number is already so low, but increasing the number of runs by an order of magnitude shows comparable or slightly worse performance as a result of the change. This seems unexpected though, and I'm more inclined to trust the standalone result, but it would be great to at least understand the reason for the discrepancy between the results.

To run the Qi benchmark for "composition" (along with others -- make install-sdk first, though, if you haven't already):

$ make benchmark-nonlocal

The standalone benchmarking script:

#lang racket

(require syntax/parse/define)

(define (add1 x) (+ x 1))
(define (sub1 x) (- x 1))

(define (f1 x)
  (add1 (sub1 x)))

(define (f2 x)
  (call-with-values
   (lambda ()
     (call-with-values
      (lambda ()
        x)
      sub1))
   add1))

(define (f3 x)
  ((compose add1 sub1) x))

(define (f4 x)
  ((compose1 add1 sub1) x))

(define-syntax-parser compose-with-values
  [(_ args)
   #'(apply values args)]
  [(_ args f g ...)
   #'(call-with-values
      (λ ()
        (compose-with-values args g ...))
      f)])

(define-syntax-parser comp
  [(_ f ...) #'(λ args
                 (compose-with-values args
                                      f ...))])

(define (f5 x)
  ((comp add1 sub1) x))

(define N 100000000)

(collect-garbage)
'app
(time (for/fold ([k 0]) ([i (in-range N)])
    (f1 i)))
'call-with-values
(time (for/fold ([k 0]) ([i (in-range N)])
    (f2 i)))
'compose
(time (for/fold ([k 0]) ([i (in-range N)])
    (f3 i)))
'compose1
(time (for/fold ([k 0]) ([i (in-range N)])
        (f4 i)))
'compose-with-values
(time (for/fold ([k 0]) ([i (in-range N)])
        (f5 i)))

Public Domain Dedication

[x] In contributing, I relinquish any copyright claims on my contribution and freely release it into the public domain in the simple hope that it will provide value.

(Why: The freely released, copyright-free work in this repository represents an investment in a better way of doing things called attribution-based economics. Attribution-based economics is based on the simple idea that we gain more by giving more, not by holding on to things that, truly, we could only create because we, in our turn, received from others. As it turns out, an economic system based on attribution -- where those who give more are more empowered -- is significantly more efficient than capitalism while also being stable and fair (unlike capitalism, on both counts), giving it transformative power to elevate the human condition and address the problems that face us today along with a host of others that have been intractable since the beginning. You can help make this a reality by releasing your work in the same way -- freely into the public domain in the simple hope of providing value. Learn more about attribution-based economics at drym.org, tell your friends, do your part.)

Jan 16 '25 03:01 countvajhula

I have a feeling the outer (lambda args (and maybe deeply-nested (apply values args)) is what's hurting in our case.

Are there major cp0 output differences?

Jan 16 '25 03:01 benknoble

The outer args and deeply nested (apply values args) are present in the standalone benchmark too, though. And that shows the speedup that the Qi nonlocal benchmark doesn't show...

cp0, I created this module:

#lang racket

(require qi)

(define-flow composition
  (~> add1 sqr sub1))

(composition 5)

Then I ran:

$ PLT_LINKLET_SHOW_CP0=1 raco make scratch.rkt

That produced a ton of output with many sections claiming to be "cp0". But I found what appeared to be the right cp0 output.

When using the original compose:

;; cp0 ---------------------
(lambda (instance-variable-reference .get-syntax-literal!1
         .set-transformer!2 compose3 sqr4 print-values5 composition6
         lifted/1.17)
  (variable-set!/define lifted/1.17 (#2%void) 'consistent)
  (call-with-module-prompt
    (lambda ()
      ((if (#2%procedure? compose3)
           compose3
           ($app/no-inline
             (#3%$top-level-value 'slow-extract-procedure)
             compose3
             3))
        #2%sub1
        sqr4
        #2%add1))
    '(composition)
    '(constant)
    composition6)
  (let ([composition (#3%$object-ref
                       'scheme-object
                       composition6
                       9)])
    (call-with-module-prompt
      (lambda ()
        (#%call-with-values
          (lambda ()
            ((if (#2%procedure? composition)
                 composition
                 ($app/no-inline
                   (#3%$top-level-value 'slow-extract-procedure)
                   composition
                   1))
              5))
          print-values5)))
    (#2%void)))

When using compose-with-values:

;; cp0 ---------------------
(lambda (instance-variable-reference .get-syntax-literal!1
         .set-transformer!2 sqr3 print-values4 composition5
         lifted/1.16)
  (letrec ([composition (lambda args_2
                          (#%call-with-values
                            (lambda ()
                              (#%call-with-values
                                (lambda ()
                                  (#%call-with-values
                                    (lambda () (apply #2%values args_2))
                                    #2%add1))
                                sqr3))
                            #2%sub1))])
    (set-consistent-variables!/define
      (#2%vector lifted/1.16 composition5)
      (#2%vector (#2%void) composition))
    (call-with-module-prompt
      (lambda ()
        (#%call-with-values
          (lambda ()
            (let ([args_2 (#3%list 5)])
              (#%call-with-values
                (lambda ()
                  (#%call-with-values
                    (lambda ()
                      (#%call-with-values
                        (lambda () (apply #2%values args_2))
                        #2%add1))
                    sqr3))
                #2%sub1)))
          print-values4)))
    (#2%void)))

That suggest anything to you?

Jan 16 '25 06:01 countvajhula

does that suggest anything to you

No, but there are some apparent differences in the way the composition is handled (see the lets).

@samth might be about to land some improvements in the compiler though. https://github.com/racket/racket/pull/5182/

Jan 16 '25 17:01 benknoble

I verified that cp0 for Sam's original script for the two alternatives -- i.e., call-with-values manually and using compose-with-values (which expands to repeated call-with-values) -- are effectively the same (modulo the extra wrapping lambda and final apply -- but that's just due to how they are written in the source module):

compose-with-values:

	[f5 (lambda (x_133)
		  (let ([args_134 (#2%list x_133)])
			(#%call-with-values
			  (lambda ()
				(#%call-with-values
				  (lambda () (apply #2%values args_134))
				  1/sub1))
			  1/add1)))]

call-with-values:

	[f2 (lambda (x_130)
		  (#%call-with-values
			(lambda ()
			  (#%call-with-values (lambda () x_130) 1/sub1))
			1/add1))]

Jan 23 '25 00:01 countvajhula

Looking again at CP0 once integrated into the compiler, it looks like this bit that composes the functions is repeated twice:

(lambda args_2
  (#%call-with-values
    (lambda ()
      (#%call-with-values
        (lambda ()
          (#%call-with-values
            (lambda () (apply #2%values args_2))
            #2%add1))
        sqr3))
    #2%sub1))

... once in the letrec bindings clause, and then again in the body, where the binding, composition, is just not used! It seems to be doing double the work, which would seem to explain why the 2x improvement becomes nullified.

Looking at the changes in this PR, though, I don't immediately see why it should be duplicated that way. Any ideas?

Jan 23 '25 00:01 countvajhula

Looking again at CP0 once integrated into the compiler, it looks like this bit that composes the functions is repeated twice:
(lambda args_2
  (#%call-with-values
    (lambda ()
      (#%call-with-values
        (lambda ()
          (#%call-with-values
            (lambda () (apply #2%values args_2))
            #2%add1))
        sqr3))
    #2%sub1))
... once in the letrec bindings clause, and then again in the body, where the binding, composition, is just not used! It seems to be doing double the work, which would seem to explain why the 2x improvement becomes nullified.

Looking at the changes in this PR, though, I don't immediately see why it should be duplicated that way. Any ideas?

That stood out to me, too, and I wasn't clear at pointing it out. Not sure why it's happening, but a key factor is probably

    (set-consistent-variables!/define
      (#2%vector lifted/1.16 composition5)
      (#2%vector (#2%void) composition))

("Macro magic" ?)

Jan 23 '25 17:01 benknoble

I think the duplication is just inlining.
set-consistent-variables!/define is part of how linklets/modules are implemented by translation to Chez Scheme. There are two module-level variables there, and they get filled in with those two values

Jan 27 '25 18:01 samth

qi qi copied to clipboard

Use `compose-with-values` instead of `compose`

Summary of Changes

Public Domain Dedication

qi
qi copied to clipboard