tinygo runtime (gc_blocks.go): make sweep branchless

Instead of looping over each block, we can use bit hacks to operate on an entire state byte. This deinterleaves the state bits in order to enable these tricks.

Performance in the problematic go/format benchmark:

                    │ conservative.txt │    conservative-branchless.txt     │              boehm.txt              │
                    │      sec/op      │   sec/op     vs base               │   sec/op     vs base                │
Format/array1-10000        30.46m ± 2%   28.93m ± 2%  -5.01% (p=0.004 n=20)   22.13m ± 5%  -27.35% (p=0.000 n=20)

                    │ conservative.txt │     conservative-branchless.txt     │              boehm.txt               │
                    │       B/s        │     B/s       vs base               │     B/s       vs base                │
Format/array1-10000       2.027Mi ± 2%   2.136Mi ± 3%  +5.41% (p=0.004 n=20)   2.789Mi ± 5%  +37.65% (p=0.000 n=20)

                    │ conservative.txt │  conservative-branchless.txt   │              boehm.txt               │
                    │       B/op       │     B/op      vs base          │     B/op      vs base                │
Format/array1-10000       4.663Mi ± 0%   4.663Mi ± 0%  ~ (p=1.000 n=20)   6.979Mi ± 0%  +49.68% (p=0.000 n=20)

                    │ conservative.txt │  conservative-branchless.txt  │             boehm.txt              │
                    │    allocs/op     │  allocs/op   vs base          │ allocs/op  vs base                 │
Format/array1-10000        204.3k ± 0%   204.3k ± 0%  ~ (p=1.000 n=20)   0.0k ± 0%  -100.00% (p=0.000 n=20)

Nov 30 '25 20:11 niaow

We could probably squeeze more performance out of this by making the state masks bigger, but we would need popcnt on the target machine for that to really work.

Nov 30 '25 20:11 niaow

Please resolve merge conflicts now @niaow since #5102 was merged.

Dec 05 '25 10:12 deadprogram

The AVR tests are broken due to an unrelated issue I will need to fix in a separate PR first #5111

Dec 05 '25 17:12 niaow

@niaow can you please rebase this branch now?

Dec 05 '25 20:12 deadprogram

I'll try to review this today.

Dec 09 '25 22:12 dgryski

This PR is once again ready for rebase @niaow

Dec 10 '25 07:12 deadprogram

I am going to move the counters in here now that #5105 is merged. I was going to do that in a separate PR but I think it is simpler to just do it here.

Dec 10 '25 17:12 niaow

Finished moving the counters. Updated perf numbers:

                    │ conservative.txt │        conservative-new.txt         │              boehm.txt              │
                    │      sec/op      │   sec/op     vs base                │   sec/op     vs base                │
Format/array1-10000        23.79m ± 1%   21.36m ± 3%  -10.21% (p=0.000 n=20)   20.80m ± 2%  -12.57% (p=0.000 n=20)

                    │ conservative.txt │         conservative-new.txt         │              boehm.txt               │
                    │       B/s        │     B/s       vs base                │     B/s       vs base                │
Format/array1-10000       2.594Mi ± 1%   2.890Mi ± 4%  +11.40% (p=0.000 n=20)   2.971Mi ± 2%  +14.52% (p=0.000 n=20)

Dec 10 '25 18:12 niaow

Finished moving the counters. Updated perf numbers:

                    │ conservative.txt │        conservative-new.txt         │              boehm.txt              │
                    │      sec/op      │   sec/op     vs base                │   sec/op     vs base                │
Format/array1-10000        23.79m ± 1%   21.36m ± 3%  -10.21% (p=0.000 n=20)   20.80m ± 2%  -12.57% (p=0.000 n=20)

                    │ conservative.txt │         conservative-new.txt         │              boehm.txt               │
                    │       B/s        │     B/s       vs base                │     B/s       vs base                │
Format/array1-10000       2.594Mi ± 1%   2.890Mi ± 4%  +11.40% (p=0.000 n=20)   2.971Mi ± 2%  +14.52% (p=0.000 n=20)

This is looking good!

@dgryski waiting on your feedback...

Dec 10 '25 19:12 deadprogram

I tried to make this a bit clearer, see if this looks okay.

Dec 10 '25 22:12 niaow