inlined-generic-function icon indicating copy to clipboard operation
inlined-generic-function copied to clipboard

Bringing the speed of Static Dispatch to CLOS. Succeeded by https://github.com/marcoheisig/fast-generic-functions

  • Bringing the speed of Static Dispatch to CLOS --- Inlined-Generic-Function [[https://circleci.com/gh/guicho271828/inlined-generic-function][https://circleci.com/gh/guicho271828/inlined-generic-function.svg?style=svg]] [[https://travis-ci.org/guicho271828/inlined-generic-function][https://travis-ci.org/guicho271828/inlined-generic-function.svg?branch=master]] [[http://quickdocs.org/inlined-generic-function/][http://quickdocs.org/badge/inlined-generic-function.svg]]

Generic functions are convenient but slow. During the development we usually want the full dynamic feature of CLOS. However, when we really need a fast binary and do not need the dynamism, the dynamic dispatch in the generic functions should be statically compiled away.

We propose a MOP-based implementation of fast inlined generic functions dispatched in compile-time. The amount of work required to inline your generic function is minimal.

Empirical analysis showed that the resulting code is up to 10 times faster than the standard generic functions.

Tested on SBCL and CCL.

  • News

Thanks to the suggestion from @phmarek, I decided to add a feature called =invalid-branch= which uses impl-specific feature to enable a static type checking. [[./invalid-branch.org][Read the doc!]]

Added an additional usage note.

  • Usage

The example code here is in =t/playground.lisp=.

First, declare the generic function with =inlined-generic-function= metaclass. This metaclass is a subclass of =standard-generic-function=. Therefore, unless you use its special feature, it acts exactly the same as the normal generic functions.

#+BEGIN_SRC lisp (defgeneric plus (a b) (:generic-function-class inlined-generic-function)) #+END_SRC

Define the methods as usual.

#+BEGIN_SRC lisp (defmethod plus :around ((a number) (b number)) (+ a b) ;; not a meaningful operation... (call-next-method))

(defmethod plus ((a fixnum) (b fixnum)) (+ a b)) (defmethod plus ((a float) (b float)) (+ a b)) #+END_SRC

Define a function which uses it.

#+BEGIN_SRC lisp (defun func-using-plus (a b) (plus a b)) #+END_SRC

At this point the gf is not inlined.

#+BEGIN_SRC lisp ; disassembly for FUNC-USING-PLUS ; Size: 24 bytes. Origin: #x100A75A165 ; 65: 488BD1 MOV RDX, RCX ; no-arg-parsing entry point ; 68: 488BFB MOV RDI, RBX ; 6B: 488B059EFFFFFF MOV RAX, [RIP-98] ; #<FDEFINITION for PLUS> ; 72: B904000000 MOV ECX, 4 ; 77: FF7508 PUSH QWORD PTR [RBP+8] ; 7A: FF6009 JMP QWORD PTR [RAX+9] #+END_SRC

Now its time to inline the gf. There's nothing different from inlining a normal function. In order to inline the generic function, just declare it =inline= when you use it.

#+BEGIN_SRC lisp (defun func-using-plus (a b) (declare (inline plus)) (plus a b)) ; disassembly for FUNC-USING-INLINED-PLUS ; Size: 323 bytes. Origin: #x1002C3BD45 ; D45: 8D41F1 LEA EAX, [RCX-15] ; no-arg-parsing entry point ; D48: A80F TEST AL, 15 ; D4A: 755F JNE L2 ; ..... #+END_SRC

To see the actual compiler-macro expansion, use a function =inline-generic-function=.

#+BEGIN_SRC lisp (let ((features (cons :inline-generic-function features))) (print (inline-generic-function '(plus a b))))

;; Inlining a generic function PLUS

(LET ((#:A1734 (1+ A)) (#:B1735 (1- B))) (EMATCH* (#:A1734 #:B1735) (((TYPE FLOAT) (TYPE FLOAT)) (LET ((A #:A1734) (B #:B1735)) (DECLARE (TYPE FLOAT A)) (DECLARE (TYPE FLOAT B)) (+ A B) (LET ((A #:A1734) (B #:B1735)) (DECLARE (TYPE FLOAT A)) (DECLARE (TYPE FLOAT B)) (+ A B)))) (((TYPE FIXNUM) (TYPE FIXNUM)) (LET ((A #:A1734) (B #:B1735)) (DECLARE (TYPE FIXNUM A)) (DECLARE (TYPE FIXNUM B)) (+ A B) (LET ((A #:A1734) (B #:B1735)) (DECLARE (TYPE FIXNUM A)) (DECLARE (TYPE FIXNUM B)) (+ A B)))))) #+END_SRC

Since =ematch= from Trivia pattern matcher expands into thoroughly typed dispatching code, a sufficiently smart compiler would compile =+= into machine assembly, which is the case at least in SBCL.

** Automatic compile-time dispatching

If the code is inlined in a typed environment, smart compilers like sbcl can detect certain branches are not reachable, thus removing the checks and reducing the code size. This is equivalent to compile-time dispatch. In the example below, the code for dispatching FLOAT is removed.

#+BEGIN_SRC lisp (defun func-using-inlined-plus-and-type-added (a b) " ; disassembly for FUNC-USING-INLINED-PLUS-AND-TYPE-ADDED ; Size: 29 bytes. Origin: #x10031E7788 ; 88: 4801F9 ADD RCX, RDI ; no-arg-parsing entry point ; 8B: 488BD1 MOV RDX, RCX ; 8E: 48D1E2 SHL RDX, 1 ; 91: 710C JNO L0 ; 93: 488BD1 MOV RDX, RCX ; 96: 41BB70060020 MOV R11D, 536872560 ; ALLOC-SIGNED-BIGNUM-IN-RDX ; 9C: 41FFD3 CALL R11 ; 9F: L0: 488BE5 MOV RSP, RBP ; A2: F8 CLC ; A3: 5D POP RBP ; A4: C3 RET " (declare (inline plus)) (declare (optimize (speed 3) (safety 0))) (declare (type fixnum a b)) (plus a b)) #+END_SRC

If the types does not match, errors are signalled by =EMATCH=, which is consistent with the behavior of standard generic functions.

** Enabling Inlining Globally

Inlining is not globally enabled by default. This is because the inlined code becomes obsoleted when the generic function definition changes, and therefore you generally do not want to make them inlined during the development.

It can be enabled globally by adding =:inline-generic-function= flag in =features=, which is useful when you build a standalone binary. When this feature is present, all inlinable generic functions are inlined unless it is declared =notinline=.

#+BEGIN_SRC lisp (push :inline-generic-function features) #+END_SRC

** Benchmark Setting

We tested two generic functions, one of which is a standard-generic-function, and another is an inlined-generic-function.

Both generic functions follow the definition below:

#+BEGIN_SRC lisp (defgeneric plus (a b) [(:generic-function-class inlined-generic-function)]) (defmethod plus :around ((a number) (b number)) (+ a b) (call-next-method)) (defmethod plus ((a fixnum) (b fixnum)) (+ a b)) (defmethod plus ((a double-float) (b double-float)) (+ a b)) #+END_SRC

We tested them with and without =inline= declaration, i.e.,

#+BEGIN_SRC lisp (defun func-using-plus (a b) (declare (optimize (speed 3) (safety 0))) (plus a b))

(defun func-using-inlined-plus (a b) (declare (inline plus)) (declare (optimize (speed 3) (safety 0))) (plus a b)) #+END_SRC

Thus, we have 4 configurations in total. The experiment is run under AMD Phenom II X6 processor 2.8GHz with SBCL 1.3.1 (launched by Roswell). The benchmark function is shown below:

#+BEGIN_SRC lisp (defvar input (iter (repeat 1000) (collect (cons (random 100.0d0) (random 100.0d0))) (collect (cons (+ 20 (random 100)) (+ 20 (random 100)))))) (defun benchmark () (time (iter (for (a . b) in input) (func-using-normal-plus a b))) (time (iter (for (a . b) in input) (func-using-normal-inlined-plus a b))) (time (iter (for (a . b) in input) (func-using-plus a b))) (time (iter (for (a . b) in input) (func-using-inlined-plus a b)))) #+END_SRC

We first run the benchmark function 1000 times in order to calibrate the CPU cache. We then run the gc and invoke the benchmark function once more. We use the result of this final run in order to make sure the machine state is stabilized.

** Result

Since the difference in the runtime is relatively small due to the small amount of computation, we consider the processor cycles only. We found that the cost of generic function invocation is considerably low when an =inlined-generic-function= is invoked with =inline= declaration.

| metaclass and inline declaration | processor cycles | consing | |----------------------------------------+------------------+---------| | standard-generic-function, not inlined | 742,285 | 0 | | standard-generic-function, inlined | 726,023 | 0 | | inlined-generic-function, not inlined | 7,865,080 | 523,760 | | inlined-generic-function, inlined | 74,120 | 0 |

Note that the third case, where the =inlined-generic-function= is not inlined, is slower than the normal generic function. This would be because we use the non-standard metaclass for representing the generic function and the normal optimization provided by the implementation is not performed. However, this is not a problem because we consider the third case only takes place during the development.

** Conclusion

We showed that ... well, anyway, this is not a paper. Enjoy!

** Dependencies

This library is at least tested on implementation listed below:

  • SBCL 1.3.1 on X86-64 Linux 3.19.0-39-generic (author's environment)

Also, it depends on the following libraries:

  • trivia by Masataro Asai :: NON-optimized pattern matcher compatible with OPTIMA, with extensible optimizer interface and clean codebase

  • closer-mop by Pascal Costanza :: Closer to MOP is a compatibility layer that rectifies many of the absent or incorrect CLOS MOP features across a broad range of Common Lisp implementations.

  • alexandria by :: Alexandria is a collection of portable public domain utilities.

  • iterate by :: Jonathan Amsterdam's iterator/gatherer/accumulator facility

  • Usage Note

When you use this library as part of your system, make sure that the method definitions are re-evaluated in load-time. This is necessary because the inlining information for the method could be lost after the compilation, i.e., the FASL file does not keep the defmethod form that should be inlined later.

The only thing you need is:

#+BEGIN_SRC lisp (eval-when (:compile-toplevel :load-toplevel :execute) (defmethod ...) ...) #+END_SRC

  • Installation

Quicklisp available.

** Author

  • Copyright

Copyright (c) 2015 Masataro Asai ([email protected])

  • License

Licensed under the LLGPL License.