Port to Zig 0.14.1
Modernized the entire codebase, including build.zig and the Makefile. The engine is now a little faster thanks to the more modern compiler.
There are also two small changes: I removed the arena allocator and instead use the C allocator directly in the UCI code, so the input lines are now properly freed. I also raised the stack size for the search thread to 256MB, as the previous limit of 64MB was causing crashes when built with the latest compiler.
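Roughly what those two changes look like in Zig 0.14 (a sketch only; `searchRoot` and the line handling are placeholders, not Avalanche's actual code):

```zig
const std = @import("std");

fn searchRoot() void {
    // ... search entry point (placeholder) ...
}

pub fn main() !void {
    // Use the C allocator directly so each UCI line can be freed after
    // it is handled, instead of accumulating in an arena.
    const allocator = std.heap.c_allocator;
    const line = try allocator.dupe(u8, "go depth 10"); // stand-in for a read line
    defer allocator.free(line);

    // Spawn the search thread with a 256 MB stack; the previous 64 MB
    // limit crashed under the newer compiler.
    const thread = try std.Thread.spawn(
        .{ .stack_size = 256 * 1024 * 1024 },
        searchRoot,
        .{},
    );
    thread.join();
}
```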
Hey, thanks again for the effort. I tried compiling with your Makefile using Zig 0.14.1. I am seeing identical output to the old version (great!), but the new one is quite a lot slower. I'm not sure what I'm doing wrong, so I'm attaching what I did. Is this what you used to compile?
snowballsh@SnowballSH:~/code/Avalanche$ zig version
0.14.1
snowballsh@SnowballSH:~/code/Avalanche$ make
zig build --release=fast -Dtarget-name="Avalanche"
snowballsh@SnowballSH:~/code/Avalanche$ ./zig-out/bin/Avalanche
Avalanche Compiled at 2025-06-16-04:04 by Yinuo Huang (SnowballSH)
go depth 10
info depth 1 seldepth 1 nodes 24 time 14 score cp 37 pv d2d4
info string thread 0 nodes 83
info depth 2 seldepth 2 nodes 83 time 56 score cp 29 pv d2d4 g8f6
info string thread 0 nodes 184
info depth 3 seldepth 3 nodes 184 time 93 score cp 35 pv d2d4 g8f6 g1f3
info string thread 0 nodes 363
info depth 4 seldepth 4 nodes 363 time 153 score cp 29 pv d2d4 g8f6 g1f3 d7d5
info string thread 0 nodes 1819
info depth 5 seldepth 7 nodes 1819 time 693 score cp 35 pv e2e4 e7e5 d2d4 e5d4
info string thread 0 nodes 3740
info depth 6 seldepth 7 nodes 3740 time 1376 score cp 26 pv d2d4 d7d5 g1f3 g8f6 c1f4 c8f5
info string thread 0 nodes 4419
info depth 7 seldepth 7 nodes 4419 time 1627 score cp 26 pv d2d4 d7d5 g1f3 g8f6 c1f4 c8f5
info string thread 0 nodes 10924
info depth 8 seldepth 11 nodes 10924 time 4687 score cp 34 pv d2d4 d7d5 g1f3 g8f6 c1f4 c8f5
info string thread 0 nodes 21382
info depth 9 seldepth 12 nodes 21382 time 9334 score cp 39 pv e2e4 e7e5 g1f3 b8c6 d2d4 e5d4 f3d4 g8f6 d4c6 b7c6
info string thread 0 nodes 36029
info depth 10 seldepth 15 nodes 36029 time 18745 score cp 35 pv e2e4 e7e5 g1f3 b8c6 d2d4 e5d4 f3d4 g8f6 d4c6 b7c6 b1c3 f8c5
bestmove e2e4
For comparison, the old binary outputs:
info depth 10 seldepth 15 nodes 36029 time 25 score cp 35 pv e2e4 e7e5 g1f3 b8c6 d2d4 e5d4 f3d4 g8f6 d4c6 b7c6 b1c3 f8c5
so the new binary seems about 750x slower (18745 ms vs 25 ms at depth 10). Am I missing something?
Thanks.
That's very strange, I'll investigate it.
Yeah, it's a performance regression where some array copies are no longer elided. The workaround is to take a reference to arrays before accessing them. I'm applying the changes throughout the codebase.
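For anyone following along, the workaround pattern looks roughly like this (illustrative names and sizes, not the engine's actual tables):

```zig
var table: [64][4096]u64 = undefined;

// Regressed pattern: indexing a nested array by value can materialize
// a copy of the whole 4096-entry row under Zig 0.14.x.
fn lookupByValue(sq: usize, idx: usize) u64 {
    const row = table[sq]; // copy is no longer elided
    return row[idx];
}

// Workaround: take a reference first, then index through the pointer.
fn lookupByRef(sq: usize, idx: usize) u64 {
    const row = &table[sq]; // pointer to the row, no copy
    return row[idx];
}
```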
I'm working on getting the speed back up now.
It's unfortunately still not there, almost 2x slower still.

Edit: I don't know why it's still slower :( Not sure how it was faster before.

Edit 2: It's probably more copy-elision regressions. I think making a lot of params `noalias` pointers will help.
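A sketch of the `noalias` idea (the `Position` struct here is a stand-in for the real board state, not Avalanche's actual type):

```zig
const Position = struct {
    pieces: [64]u8,
    // ... more state ...
};

// Marking a large parameter as a `noalias` const pointer tells the
// compiler it doesn't overlap other pointer arguments, so it can read
// through the pointer without defensive copies.
fn evaluate(noalias pos: *const Position) i32 {
    var score: i32 = 0;
    for (pos.pieces) |p| score += p;
    return score;
}
```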
Thanks. I see, so the issue is that most of the functions pass by copy instead of by pointer? That sounds plausible. I remember making everything pointers a long time ago, but copies turned out to be faster... Perhaps the new compiler does things differently. I can try making more params pointers some time.
I can confirm that after your fixes the engine is now much, much faster: only about 2x slower than with the old compiler, as you mentioned.
Yeah, the biggest issue was that the entire rook and bishop attack arrays were being copied on every single magic lookup.
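The fix there, roughly (table sizes and the `magicIndex` helper are assumptions for illustration; the real magic-bitboard layout may differ):

```zig
var rook_attacks: [64][4096]u64 = undefined;
var magics: [64]u64 = undefined;

// Hypothetical index computation; a real one multiplies the masked
// occupancy by a per-square magic number and shifts.
fn magicIndex(sq: u6, occupied: u64) usize {
    return @intCast((occupied *% magics[sq]) >> 52);
}

fn rookAttacks(sq: u6, occupied: u64) u64 {
    // Take a pointer to the per-square row; `rook_attacks[sq]` by value
    // would copy the whole 32 KB row on every call.
    const row = &rook_attacks[sq];
    return row[magicIndex(sq, occupied)];
}
```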