Address the top N slow tests
After the speedups of #194, some tests emerged as much slower than the rest. Here are the top ten slowest tests:
t/01-fn-rand.t .................. ok 48915 ms ( 0.00 usr 0.00 sys + 48.45 cusr 0.09 csys = 48.55 CPU)
t/02-fn-randlen.t ............... ok 25045 ms ( 0.00 usr 0.02 sys + 24.98 cusr 0.03 csys = 25.03 CPU)
t/fn-array.t .................... ok 19410 ms ( 0.00 usr 0.00 sys + 19.26 cusr 0.08 csys = 19.34 CPU)
t/01-virfns.t ................... ok 16822 ms ( 0.00 usr 0.00 sys + 16.73 cusr 0.08 csys = 16.81 CPU)
t/fn-cut.t ...................... ok 13710 ms ( 0.00 usr 0.00 sys + 13.56 cusr 0.05 csys = 13.61 CPU)
t/01-mac-nof.t .................. ok 12665 ms ( 0.00 usr 0.00 sys + 12.62 cusr 0.08 csys = 12.70 CPU)
t/01-mac-swap.t ................. ok 11991 ms ( 0.00 usr 0.00 sys + 11.94 cusr 0.06 csys = 12.00 CPU)
t/02-mac-accum.t ................ ok 10851 ms ( 0.00 usr 0.00 sys + 10.81 cusr 0.05 csys = 10.86 CPU)
t/fn-hat-w.t .................... ok 10644 ms ( 0.00 usr 0.00 sys + 10.55 cusr 0.06 csys = 10.61 CPU)
t/01-fn-mod.t ................... ok 8362 ms ( 0.00 usr 0.00 sys + 8.31 cusr 0.09 csys = 8.40 CPU)
t/02-prims.t .................... ok 8348 ms ( 0.00 usr 0.00 sys + 8.28 cusr 0.08 csys = 8.36 CPU)
Let's sort these into (estimated) root causes. This should help guide optimization efforts.
Slow numbers
Implementing fast numbers is tracked in #140.
t/01-fn-rand.t .................. ok 48915 ms ( 0.00 usr 0.00 sys + 48.45 cusr 0.09 csys = 48.55 CPU)
t/02-fn-randlen.t ............... ok 25045 ms ( 0.00 usr 0.02 sys + 24.98 cusr 0.03 csys = 25.03 CPU)
t/fn-hat-w.t .................... ok 10644 ms ( 0.00 usr 0.00 sys + 10.55 cusr 0.06 csys = 10.61 CPU)
t/01-fn-mod.t ................... ok 8362 ms ( 0.00 usr 0.00 sys + 8.31 cusr 0.09 csys = 8.40 CPU)
t/01-fn-rand.t calls rand and <= repeatedly. I split them up, and the time taken by each is 44 s and 2 s respectively. So speeding up rand for numbers would help immensely.
Slow arrays
We don't currently have a separate issue to track making fast arrays. We need to implement fast numbers first, though.
t/fn-array.t .................... ok 19410 ms ( 0.00 usr 0.00 sys + 19.26 cusr 0.08 csys = 19.34 CPU)
t/01-virfns.t ................... ok 16822 ms ( 0.00 usr 0.00 sys + 16.73 cusr 0.08 csys = 16.81 CPU)
Slow cons lists
Issue #144 addresses this for the special case of strings; doing it for all lists might be a good idea. There's now a dedicated issue: #221.
t/fn-cut.t ...................... ok 13710 ms ( 0.00 usr 0.00 sys + 13.56 cusr 0.05 csys = 13.61 CPU)
t/01-mac-nof.t .................. ok 12665 ms ( 0.00 usr 0.00 sys + 12.62 cusr 0.08 csys = 12.70 CPU)
t/02-prims.t .................... ok 8348 ms ( 0.00 usr 0.00 sys + 8.28 cusr 0.08 csys = 8.36 CPU)
That second one there is tricky. We can only speed it up if we can prove that the second argument does not have side effects:
> (set n 0)
0
> (++ n)
1
> (nof 5 (++ n))
(2 3 4 5 6)
> (nof 5 (prn "hi"))
"hi"
"hi"
"hi"
"hi"
"hi"
("hi" "hi" "hi" "hi" "hi")
Speeding up mem and rev would help speed up the last one, 02-prims.t.
Hard-to-speed-up macros
I'm not sure offhand how to speed these up. At some time I believe we'll be able to compile (and specialize) macros; that will probably help.
t/01-mac-swap.t ................. ok 11991 ms ( 0.00 usr 0.00 sys + 11.94 cusr 0.06 csys = 12.00 CPU)
t/02-mac-accum.t ................ ok 10851 ms ( 0.00 usr 0.00 sys + 10.81 cusr 0.05 csys = 10.86 CPU)
Bit of a fuzzy issue as far as close-ability is concerned. But here's the thing: I think four seconds is quite long for any of these .t files to run; so getting all these tests under that would be a good-enough trigger to close this issue.
Just adding the latest four new contenders to the prize:
t/02-prim-rdb.t ................. ok 14377 ms ( 0.00 usr 0.00 sys + 14.30 cusr 0.09 csys = 14.39 CPU)
t/fn-bel-sigerr.t ............... ok 19717 ms ( 0.00 usr 0.00 sys + 19.59 cusr 0.14 csys = 19.73 CPU)
t/01-fn-bel.t ................... ok 81282 ms ( 0.00 usr 0.00 sys + 81.17 cusr 0.12 csys = 81.30 CPU)
t/01-fn-bel-ccc.t ............... ok 91078 ms ( 0.00 usr 0.00 sys + 91.05 cusr 0.05 csys = 91.09 CPU)
Three of these have to do with the slowness of evaluating code in an evaluator running inside an evaluator. 02-prim-rdb.t is slow because nof is slow.
Stating the obvious, we have a number of new slow tests that merit inclusion on this list:
[20:05:04] t/fn-readall.t .................. ok 53877 ms
[19:51:15] t/fn-readas.t ................... ok 51826 ms
[20:07:00] t/fn-read.t ..................... ok 50769 ms
[19:52:42] t/fn-rdex.t ..................... ok 36564 ms
[20:02:50] t/example-bel.t ................. ok 28202 ms
[20:02:13] t/fn-c-star.t ................... ok 17635 ms
[19:50:41] t/fn-rdlist.t ................... ok 16131 ms
[19:52:22] t/fn-rdtarget.t ................. ok 16003 ms
[19:56:11] t/fn-rddot.t .................... ok 15968 ms
[20:06:42] t/fn-parseword.t ................ ok 15088 ms
[19:55:44] t/fn-bel-prim-id.t .............. ok 12818 ms
[19:59:20] t/02-prim-wrb.t ................. ok 11182 ms
[19:52:10] t/fn-bel-apply.t ................ ok 9668 ms
[19:51:00] t/fn-bel-applyclo.t ............. ok 8198 ms
Skimming for causes:
- Parts of the reader (could probably be sped up by fastfuncs)
- Parts of the Bel evaluator
- The
example-belfile (a couple of different things are slow in there)
My new favorite slow code (which should probably be turned into a test):
(set N 13)
(set indent "")
(for row 1 N
(if indent (pr indent))
(for col 1 N
(pr ". "))
(pr \lf)
(push \sp indent))
Here on this machine, it prints a board in little over 186 seconds (which is too long for entries into the Eurovision Song Contest):
$ time perl -Ilib bin/bel print-hex-board.bel
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
real 3m6.202s
user 3m6.000s
sys 0m0.093s
Wanting to know where the slowness comes from, I made two more versions: one with only the for loop structure and no printing, and one with only the printing and no looping:
$ time perl -Ilib bin/bel print-hex-board-only-for-loop.bel
real 3m4.480s
user 3m4.296s
sys 0m0.156s
$ time perl -Ilib bin/bel print-hex-board-only-printing.bel
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
real 0m0.451s
user 0m0.296s
sys 0m0.109s
So yeah, it's the looping that does it. Loops haven't been identified as a bottleneck above (except maybe for nof), but I suspect they are. More to the point, it's probably numbers that are still slow.