inlined-generic-function: Benchmark behaviour on defined class is slow.

The benchmarks provided are for methods on the built-in Lisp types number, fixnum and double-float. To test the behaviour on defined classes, we added a simple boxing class and found that performance degraded when the inlined generic functions were inlined. We measured the following numbers of processor cycles for the four methods in playground.lisp, respectively:

Experiment on sbcl 1.3.5.24

See https://github.com/bon/inlined-generic-function/commit/8b6e4d5b10cace47de4343e6dde8455f21dfd579

So my question is: does this indicate that inlined-generic-function only gives a speedup on built-in types and not on defined classes?

Sep 01 '16 14:09 bon

It seems normal-plus is running without boxing, right?

Sep 01 '16 17:09 guicho271828

Correct! Fixed in https://github.com/bon/inlined-generic-function/commit/76d1eb6e77ebc5433465b9afb2cdb84b6c4c3e4d

Processor cycles are now

Sep 01 '16 17:09 bon

phew.

Sep 01 '16 20:09 guicho271828

I just tested your version. On my machine, the result is still in favor of the inlined version.

Evaluation took:
  0.001 seconds of real time
  0.004000 seconds of total run time (0.004000 user, 0.000000 system)
  400.00% CPU
  638,640 processor cycles
  131,024 bytes consed

Evaluation took:
  0.000 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  100.00% CPU
  608,634 processor cycles
  163,808 bytes consed

Evaluation took:
  0.003 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  0.00% CPU
  4,543,020 processor cycles
  655,184 bytes consed

Evaluation took:
  0.000 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  100.00% CPU
  389,169 processor cycles
  163,808 bytes consed

What explains this difference? In your result the inlined generic function performs better, but not by much. I use SBCL 1.3.8 via Roswell on:

$ uname -a
Linux guicho-x61 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/cpuinfo
...
model name  : Intel(R) Core(TM)2 Duo CPU     T7100  @ 1.80GHz
...

Sep 02 '16 22:09 guicho271828

For me the numbers of cycles vary wildly from run to run. Sometimes the igf gets a little quicker, sometimes slower. One example is shown below.

But the more interesting question is why the igf showed a 10x speedup on numbers but hardly any difference on defined classes? Of course I would be very happy to see a 10x speedup on defined classes too!

$ cat /proc/cpuinfo  | ag 'model name' | head -1
model name  : Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
$ uname -a
Linux tie 4.7.2-1-ARCH #1 SMP PREEMPT Sat Aug 20 23:02:56 CEST 2016 x86_64 GNU/Linux
$ ros use sbcl
$ ~/.roswell/impls/x86-64/linux/sbcl/1.3.9/bin/sbcl --version
SBCL 1.3.9
$ ros run
$ rlwrap ros run
* (ql:quickload :inlined-generic-function)

...

* (load "benchmark.lisp")

...

Evaluation took:
  0.000 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  100.00% CPU
  424,334 processor cycles
  131,024 bytes consed

Evaluation took:
  0.000 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  100.00% CPU
  362,358 processor cycles
  163,792 bytes consed

Evaluation took:
  0.001 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  0.00% CPU
  2,060,160 processor cycles
  655,200 bytes consed

Evaluation took:
  0.000 seconds of real time
  0.003333 seconds of total run time (0.003333 user, 0.000000 system)
  100.00% CPU
  493,287 processor cycles
  163,792 bytes consed

Sep 03 '16 15:09 bon

The 10x speedup is not achieved because of missing type information and the cost of slot access.

The contents slot of box is not typed, so the (+ (contents a) b) part always calls the generic +, not the optimized machine instructions. You should check the disassembly output.
The accessor contents is a normal generic function, so the slot access is slow.
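A minimal sketch of the two factors above, assuming a box class with an untyped contents slot. The actual definitions in playground.lisp are not shown in this thread, so the defclass form and the names add-untyped, add-typed, typed-box and add-struct are hypothetical. The struct variant is a different technique from CLOS classes and is included only for comparison.

```lisp
;; Assumed shape of the boxing class; not copied from playground.lisp.
(defclass box ()
  ((contents :initarg :contents :accessor contents)))

;; The compiler knows nothing about the value returned by CONTENTS,
;; so + has to be compiled as a call to the generic arithmetic routine.
(defun add-untyped (a b)
  (+ (contents a) b))

;; THE tells the compiler that the slot value is a double-float, so the
;; addition itself can be compiled to float instructions.
;; CONTENTS is still an ordinary generic function call.
(defun add-typed (a b)
  (declare (type double-float b) (optimize (speed 3)))
  (+ (the double-float (contents a)) b))

;; For comparison: a struct with a typed slot. TYPED-BOX-CONTENTS is a
;; plain function generated by DEFSTRUCT, so no method dispatch happens.
(defstruct typed-box
  (contents 0d0 :type double-float))

(defun add-struct (a b)
  (declare (type typed-box a) (type double-float b) (optimize (speed 3)))
  (+ (typed-box-contents a) b))

;; Compare the generated code; the output is implementation dependent.
;; (disassemble 'add-untyped)
;; (disassemble 'add-typed)
;; (disassemble 'add-struct)
```

All three functions compute the same sum; only the code the compiler can generate for the addition and the slot access differs.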

Imagine the dispatch cost is 10X for a normal GF and X for an IGF. The two factors above each add an overhead, say A and B, so the totals are 10X+A+B vs X+A+B. A 10x speedup is then not achievable, since A+B could be very large.
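The arithmetic can be made concrete with a small helper. This is only an illustration of the 10X+A+B vs X+A+B argument; the function name observed-speedup and the sample values are made up, not measurements.

```lisp
;; OVERHEAD stands for A+B, the cost shared by both versions.
(defun observed-speedup (x overhead)
  "Ratio of normal-GF cost (10X + overhead) to IGF cost (X + overhead)."
  (/ (+ (* 10 x) overhead)
     (+ x overhead)))

(observed-speedup 1 0)   ; => 10       no shared overhead: full 10x
(observed-speedup 1 9)   ; => 19/10    overhead 9X: under 2x
(observed-speedup 1 99)  ; => 109/100  overhead 99X: barely measurable
```

As the shared overhead grows relative to X, the ratio approaches 1, which matches seeing hardly any difference on defined classes.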

Sep 03 '16 19:09 guicho271828

I updated the environment and noticed that the examples in playground.lisp are getting slow. It looks like the function is being prevented from inlining.

Sep 16 '16 21:09 guicho271828

(push :inline-generic-function *features*) still successfully forces the functions to be inlined, but I don't like this solution...

Sep 16 '16 21:09 guicho271828
