tinygo icon indicating copy to clipboard operation
tinygo copied to clipboard

runtime (gc_blocks.go): make sweep branchless

Open niaow opened this issue 1 month ago • 6 comments

Instead of looping over each block, we can use bit hacks to operate on an entire state byte. This deinterleaves the state bits in order to enable these tricks.

Performance in the problematic go/format benchmark:

                    │ conservative.txt │    conservative-branchless.txt     │              boehm.txt              │
                    │      sec/op      │   sec/op     vs base               │   sec/op     vs base                │
Format/array1-10000        30.46m ± 2%   28.93m ± 2%  -5.01% (p=0.004 n=20)   22.13m ± 5%  -27.35% (p=0.000 n=20)

                    │ conservative.txt │     conservative-branchless.txt     │              boehm.txt               │
                    │       B/s        │     B/s       vs base               │     B/s       vs base                │
Format/array1-10000       2.027Mi ± 2%   2.136Mi ± 3%  +5.41% (p=0.004 n=20)   2.789Mi ± 5%  +37.65% (p=0.000 n=20)

                    │ conservative.txt │  conservative-branchless.txt   │              boehm.txt               │
                    │       B/op       │     B/op      vs base          │     B/op      vs base                │
Format/array1-10000       4.663Mi ± 0%   4.663Mi ± 0%  ~ (p=1.000 n=20)   6.979Mi ± 0%  +49.68% (p=0.000 n=20)

                    │ conservative.txt │  conservative-branchless.txt  │             boehm.txt              │
                    │    allocs/op     │  allocs/op   vs base          │ allocs/op  vs base                 │
Format/array1-10000        204.3k ± 0%   204.3k ± 0%  ~ (p=1.000 n=20)   0.0k ± 0%  -100.00% (p=0.000 n=20)

niaow avatar Nov 30 '25 20:11 niaow

We could probably squeeze more performance out of this by making the state masks bigger, but we would need popcnt on the target machine for that to really work.

niaow avatar Nov 30 '25 20:11 niaow

Please resolve merge conflicts now @niaow since #5102 was merged.

deadprogram avatar Dec 05 '25 10:12 deadprogram

The AVR tests are broken due to an unrelated issue I will need to fix in a separate PR first #5111

niaow avatar Dec 05 '25 17:12 niaow

@niaow can you please rebase this branch now?

deadprogram avatar Dec 05 '25 20:12 deadprogram

I'll try to review this today.

dgryski avatar Dec 09 '25 22:12 dgryski

This PR is once again ready for rebase @niaow

deadprogram avatar Dec 10 '25 07:12 deadprogram

I am going to move the counters in here now that #5105 is merged. I was going to do that in a separate PR but I think it is simpler to just do it here.

niaow avatar Dec 10 '25 17:12 niaow

Finished moving the counters. Updated perf numbers:

                    │ conservative.txt │        conservative-new.txt         │              boehm.txt              │
                    │      sec/op      │   sec/op     vs base                │   sec/op     vs base                │
Format/array1-10000        23.79m ± 1%   21.36m ± 3%  -10.21% (p=0.000 n=20)   20.80m ± 2%  -12.57% (p=0.000 n=20)

                    │ conservative.txt │         conservative-new.txt         │              boehm.txt               │
                    │       B/s        │     B/s       vs base                │     B/s       vs base                │
Format/array1-10000       2.594Mi ± 1%   2.890Mi ± 4%  +11.40% (p=0.000 n=20)   2.971Mi ± 2%  +14.52% (p=0.000 n=20)

niaow avatar Dec 10 '25 18:12 niaow

Finished moving the counters. Updated perf numbers:

                    │ conservative.txt │        conservative-new.txt         │              boehm.txt              │
                    │      sec/op      │   sec/op     vs base                │   sec/op     vs base                │
Format/array1-10000        23.79m ± 1%   21.36m ± 3%  -10.21% (p=0.000 n=20)   20.80m ± 2%  -12.57% (p=0.000 n=20)

                    │ conservative.txt │         conservative-new.txt         │              boehm.txt               │
                    │       B/s        │     B/s       vs base                │     B/s       vs base                │
Format/array1-10000       2.594Mi ± 1%   2.890Mi ± 4%  +11.40% (p=0.000 n=20)   2.971Mi ± 2%  +14.52% (p=0.000 n=20)

This is looking good!

@dgryski waiting on your feedback...

deadprogram avatar Dec 10 '25 19:12 deadprogram

I tried to make this a bit clearer, see if this looks okay.

niaow avatar Dec 10 '25 22:12 niaow