bel icon indicating copy to clipboard operation
bel copied to clipboard

Address the top N slow tests

Open masak opened this issue 5 years ago • 4 comments

After the speedups of #194, some tests emerged as much slower than the rest. Here are the top ten slowest tests:

t/01-fn-rand.t .................. ok    48915 ms ( 0.00 usr  0.00 sys + 48.45 cusr  0.09 csys = 48.55 CPU)
t/02-fn-randlen.t ............... ok    25045 ms ( 0.00 usr  0.02 sys + 24.98 cusr  0.03 csys = 25.03 CPU)
t/fn-array.t .................... ok    19410 ms ( 0.00 usr  0.00 sys + 19.26 cusr  0.08 csys = 19.34 CPU)
t/01-virfns.t ................... ok    16822 ms ( 0.00 usr  0.00 sys + 16.73 cusr  0.08 csys = 16.81 CPU)
t/fn-cut.t ...................... ok    13710 ms ( 0.00 usr  0.00 sys + 13.56 cusr  0.05 csys = 13.61 CPU)
t/01-mac-nof.t .................. ok    12665 ms ( 0.00 usr  0.00 sys + 12.62 cusr  0.08 csys = 12.70 CPU)
t/01-mac-swap.t ................. ok    11991 ms ( 0.00 usr  0.00 sys + 11.94 cusr  0.06 csys = 12.00 CPU)
t/02-mac-accum.t ................ ok    10851 ms ( 0.00 usr  0.00 sys + 10.81 cusr  0.05 csys = 10.86 CPU)
t/fn-hat-w.t .................... ok    10644 ms ( 0.00 usr  0.00 sys + 10.55 cusr  0.06 csys = 10.61 CPU)
t/01-fn-mod.t ................... ok     8362 ms ( 0.00 usr  0.00 sys +  8.31 cusr  0.09 csys =  8.40 CPU)
t/02-prims.t .................... ok     8348 ms ( 0.00 usr  0.00 sys +  8.28 cusr  0.08 csys =  8.36 CPU)

Let's sort these into (estimated) root causes. This should help guide optimization efforts.

Slow numbers

Implementing fast numbers is tracked in #140.

t/01-fn-rand.t .................. ok    48915 ms ( 0.00 usr  0.00 sys + 48.45 cusr  0.09 csys = 48.55 CPU)
t/02-fn-randlen.t ............... ok    25045 ms ( 0.00 usr  0.02 sys + 24.98 cusr  0.03 csys = 25.03 CPU)
t/fn-hat-w.t .................... ok    10644 ms ( 0.00 usr  0.00 sys + 10.55 cusr  0.06 csys = 10.61 CPU)
t/01-fn-mod.t ................... ok     8362 ms ( 0.00 usr  0.00 sys +  8.31 cusr  0.09 csys =  8.40 CPU)

t/01-fn-rand.t calls rand and <= repeatedly. I split them up, and the time taken by each is 44 s and 2 s respectively. So speeding up rand for numbers would help immensely.

Slow arrays

We don't currently have a separate issue to track making fast arrays. We need to implement fast numbers first, though.

t/fn-array.t .................... ok    19410 ms ( 0.00 usr  0.00 sys + 19.26 cusr  0.08 csys = 19.34 CPU)
t/01-virfns.t ................... ok    16822 ms ( 0.00 usr  0.00 sys + 16.73 cusr  0.08 csys = 16.81 CPU)

Slow cons lists

Issue #144 addresses this for the special case of strings; doing it for all lists might be a good idea. There's now a dedicated issue: #221.

t/fn-cut.t ...................... ok    13710 ms ( 0.00 usr  0.00 sys + 13.56 cusr  0.05 csys = 13.61 CPU)
t/01-mac-nof.t .................. ok    12665 ms ( 0.00 usr  0.00 sys + 12.62 cusr  0.08 csys = 12.70 CPU)
t/02-prims.t .................... ok     8348 ms ( 0.00 usr  0.00 sys +  8.28 cusr  0.08 csys =  8.36 CPU)

That second one there is tricky. We can only speed it up if we can prove that the second argument does not have side effects:

> (set n 0)
0
> (++ n)
1
> (nof 5 (++ n))
(2 3 4 5 6)
> (nof 5 (prn "hi"))
"hi"
"hi"
"hi"
"hi"
"hi"
("hi" "hi" "hi" "hi" "hi")

Speeding up mem and rev would help speed up the last one, 02-prims.t.

Hard-to-speed-up macros

I'm not sure offhand how to speed these up. At some time I believe we'll be able to compile (and specialize) macros; that will probably help.

t/01-mac-swap.t ................. ok    11991 ms ( 0.00 usr  0.00 sys + 11.94 cusr  0.06 csys = 12.00 CPU)
t/02-mac-accum.t ................ ok    10851 ms ( 0.00 usr  0.00 sys + 10.81 cusr  0.05 csys = 10.86 CPU)

masak avatar Jul 11 '20 09:07 masak

Bit of a fuzzy issue as far as close-ability is concerned. But here's the thing: I think four seconds is quite long for any of these .t files to run; so getting all these tests under that would be a good-enough trigger to close this issue.

masak avatar Jul 11 '20 09:07 masak

Just adding the latest four new contenders to the prize:

t/02-prim-rdb.t ................. ok    14377 ms ( 0.00 usr  0.00 sys + 14.30 cusr  0.09 csys = 14.39 CPU)
t/fn-bel-sigerr.t ............... ok    19717 ms ( 0.00 usr  0.00 sys + 19.59 cusr  0.14 csys = 19.73 CPU)
t/01-fn-bel.t ................... ok    81282 ms ( 0.00 usr  0.00 sys + 81.17 cusr  0.12 csys = 81.30 CPU)
t/01-fn-bel-ccc.t ............... ok    91078 ms ( 0.00 usr  0.00 sys + 91.05 cusr  0.05 csys = 91.09 CPU)

Three of these have to do with the slowness of evaluating code in an evaluator running inside an evaluator. 02-prim-rdb.t is slow because nof is slow.

masak avatar Sep 13 '20 14:09 masak

Stating the obvious, we have a number of new slow tests that merit inclusion on this list:

[20:05:04] t/fn-readall.t .................. ok    53877 ms
[19:51:15] t/fn-readas.t ................... ok    51826 ms
[20:07:00] t/fn-read.t ..................... ok    50769 ms
[19:52:42] t/fn-rdex.t ..................... ok    36564 ms
[20:02:50] t/example-bel.t ................. ok    28202 ms
[20:02:13] t/fn-c-star.t ................... ok    17635 ms
[19:50:41] t/fn-rdlist.t ................... ok    16131 ms
[19:52:22] t/fn-rdtarget.t ................. ok    16003 ms
[19:56:11] t/fn-rddot.t .................... ok    15968 ms
[20:06:42] t/fn-parseword.t ................ ok    15088 ms
[19:55:44] t/fn-bel-prim-id.t .............. ok    12818 ms
[19:59:20] t/02-prim-wrb.t ................. ok    11182 ms
[19:52:10] t/fn-bel-apply.t ................ ok     9668 ms
[19:51:00] t/fn-bel-applyclo.t ............. ok     8198 ms

Skimming for causes:

  • Parts of the reader (could probably be sped up by fastfuncs)
  • Parts of the Bel evaluator
  • The example-bel file (a couple of different things are slow in there)

masak avatar Oct 28 '20 13:10 masak

My new favorite slow code (which should probably be turned into a test):

(set N 13)

(set indent "")

(for row 1 N
  (if indent (pr indent))
  (for col 1 N
    (pr ". "))
  (pr \lf)
  (push \sp indent))

Here on this machine, it prints a board in little over 186 seconds (which is too long for entries into the Eurovision Song Contest):

$ time perl -Ilib bin/bel print-hex-board.bel
. . . . . . . . . . . . .
 . . . . . . . . . . . . .
  . . . . . . . . . . . . .
   . . . . . . . . . . . . .
    . . . . . . . . . . . . .
     . . . . . . . . . . . . .
      . . . . . . . . . . . . .
       . . . . . . . . . . . . .
        . . . . . . . . . . . . .
         . . . . . . . . . . . . .
          . . . . . . . . . . . . .
           . . . . . . . . . . . . .
            . . . . . . . . . . . . .

real    3m6.202s
user    3m6.000s
sys     0m0.093s

Wanting to know where the slowness comes from, I made two more versions: one with only the for loop structure and no printing, and one with only the printing and no looping:

$ time perl -Ilib bin/bel print-hex-board-only-for-loop.bel

real    3m4.480s
user    3m4.296s
sys     0m0.156s

$ time perl -Ilib bin/bel print-hex-board-only-printing.bel
. . . . . . . . . . . . .
 . . . . . . . . . . . . .
  . . . . . . . . . . . . .
   . . . . . . . . . . . . .
    . . . . . . . . . . . . .
     . . . . . . . . . . . . .
      . . . . . . . . . . . . .
       . . . . . . . . . . . . .
        . . . . . . . . . . . . .
         . . . . . . . . . . . . .
          . . . . . . . . . . . . .
           . . . . . . . . . . . . .
            . . . . . . . . . . . . .

real    0m0.451s
user    0m0.296s
sys     0m0.109s

So yeah, it's the looping that does it. Loops haven't been identified as a bottleneck above (except maybe for nof), but I suspect they are. More to the point, it's probably numbers that are still slow.

masak avatar Jun 24 '22 09:06 masak