fishtest
cutechess-cli for Raspberry Pi (request for help)
Occasionally there are requests to run the worker on a Raspberry Pi (RPi). To make this possible we need a cutechess-cli binary for the RPi. Perhaps someone who is familiar with the RPi architecture can look into this?
Instructions for cross-compiling would be most desirable, since then the binary can be produced by developers who do not own an RPi.
I once tried a native compile, but gave up because the build took very long.
I can install it by running sudo apt-get install cutechess
Are Raspberry Pis even fast enough to meet the fishtest minimum nps?
@vondele I browsed around the Raspbian source archive. At first sight, this seems to be the source code:
http://sourcearchive.raspbian.org/main/c/cutechess/
It is from 2013. This is not recent enough.
yeah, it is probably pretty old:
$ cutechess-cli --version
cutechess-cli 0.4.2
Using Qt version 4.8.7
I tried running fishtest on my Pi 4 with a natively built cutechess-cli. It still doesn't work, as the build defaults to x86 instead of aarch64 (?):
Available Makefile architecture targets: ['x86-64-vnni512', 'x86-64-vnni256', 'x86-64-avx512', 'x86-64-avxvnni', 'x86-64-bmi2', 'x86-64-avx2', 'x86-64-sse41-popcnt', 'x86-64-modern', 'x86-64-ssse3', 'x86-64-sse3-popcnt', 'x86-64', 'x86-32-sse41-popcnt', 'x86-32-sse2', 'x86-32', 'ppc-64', 'ppc-32', 'armv7', 'armv7-neon', 'armv8', 'e2k', 'apple-silicon', 'general-64', 'general-32']
Available g++/cpu properties: {'flags': ['neon', 'outline-atomics'], 'arch': 'generic'}
Determined the best architecture to be x86-64
Default net: nn-ad9b42354671.nnue
Already available.
Config:
debug: 'no'
sanitize: 'none'
optimize: 'yes'
arch: 'x86_64'
bits: '64'
kernel: 'Linux'
os: 'GNU/Linux'
prefetch: 'yes'
popcnt: 'no'
pext: 'no'
sse: 'yes'
mmx: 'no'
sse2: 'yes'
ssse3: 'no'
sse41: 'no'
avx2: 'no'
avxvnni: 'no'
avx512: 'no'
vnni256: 'no'
vnni512: 'no'
neon: 'no'
arm_version: '0'
Flags:
CXX: clang++
CXXFLAGS: -DNNUE_EMBEDDING_OFF -Wall -Wcast-qual -fno-exceptions -std=c++17 -pedantic -Wextra -Wshadow -m64 -DUSE_PTHREADS -DNDEBUG -O3 -fexperimental-new-pass-manager -DIS_64BIT -msse -DUSE_SSE2 -msse2 -flto
LDFLAGS: -latomic -m64 -lpthread -DNNUE_EMBEDDING_OFF -Wall -Wcast-qual -fno-exceptions -std=c++17 -pedantic -Wextra -Wshadow -m64 -DUSE_PTHREADS -DNDEBUG -O3 -fexperimental-new-pass-manager -DIS_64BIT -msse -DUSE_SSE2 -msse2 -flto
Testing config sanity. If this fails, try 'make help' ...
Step 1/4. Building instrumented executable ...
make ARCH=x86-64 COMP=clang clang-profile-make
It might be helpful to post the output of
g++ -Q -march=native --help=target
and
clang++ -E - -march=native -###
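For reference, here is a minimal Python sketch of how the output of the g++ command above could be reduced to the kind of {'flags': ..., 'arch': ...} summary the worker prints as "Available g++/cpu properties". It is an illustration under assumptions, not the actual games.py code: it simply collects every -m option reported as [enabled], plus the value of -march=. The helper names are hypothetical.

```python
import re
import subprocess
from typing import Dict, List, Optional, Union

# Matches e.g. "-marm    [enabled]" and captures "arm".
ENABLED_RE = re.compile(r"^-m([\w-]+)\s+\[enabled\]$")
# Matches e.g. "-march=    armv8-a+crc+simd" and captures the value.
MARCH_RE = re.compile(r"^-march=\s+(\S+)$")

Props = Dict[str, Union[List[str], Optional[str]]]


def parse_gcc_target_help(text: str) -> Props:
    """Hypothetical helper: summarize `g++ -Q -march=native --help=target` output."""
    flags: List[str] = []
    arch: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        m = ENABLED_RE.match(line)
        if m:
            flags.append(m.group(1))
            continue
        m = MARCH_RE.match(line)
        if m:
            arch = m.group(1)
    return {"flags": flags, "arch": arch}


def gcc_cpu_properties(compiler: str = "g++") -> Props:
    """Run the compiler on this machine and parse its target options."""
    out = subprocess.run(
        [compiler, "-Q", "-march=native", "--help=target"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gcc_target_help(out)
```

Applied to the Raspberry Pi output posted further down in this thread, this sketch collects the same nine flags (arm, be32, glibc, ...) and the arch value armv8-a+crc+simd that the worker later reports.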
Hey guys, I'm new around here so nice to meet you all!
I was trying to run fishtest on my Raspberry pi 4 and I had the same problems with cutechess-cli.
After reading your comments, I built cutechess-cli on that Raspberry Pi, and it appears to work after moving the binary to fishtest/worker/testing
:)
The compressed binary is attached here if you want to try it out as well. It was built from the latest source code available, and here's the output of cutechess-cli --version:
cutechess-cli 1.3.0-beta2
Using Qt version 5.15.6
Running on Arch Linux ARM/arm
As @vdbergh suggested, here's the output of g++ -Q -march=native --help=target on that Raspberry Pi:
The following options are target specific:
-mabi= aapcs-linux
-mabort-on-noreturn [disabled]
-mandroid [disabled]
-mapcs [disabled]
-mapcs-frame [disabled]
-mapcs-reentrant [disabled]
-mapcs-stack-check [disabled]
-march= armv8-a+crc+simd
-marm [enabled]
-masm-syntax-unified [disabled]
-mbe32 [enabled]
-mbe8 [disabled]
-mbig-endian [disabled]
-mbionic [disabled]
-mbranch-cost= -1
-mcallee-super-interworking [disabled]
-mcaller-super-interworking [disabled]
-mcmse [disabled]
-mcpu=
-mfdpic [disabled]
-mfix-cmse-cve-2021-35465 [disabled]
-mfix-cortex-a57-aes-1742098 [disabled]
-mfix-cortex-a72-aes-1655431 -mfix-cortex-a57-aes-1742098
-mfix-cortex-m3-ldrd [disabled]
-mflip-thumb [disabled]
-mfloat-abi= hard
-mfp16-format= none
-mfpu= neon
-mgeneral-regs-only [disabled]
-mglibc [enabled]
-mhard-float -mfloat-abi=hard
-mlibarch= armv8-a+crc+simd
-mlittle-endian [enabled]
-mlong-calls [disabled]
-mmusl [disabled]
-mneon-for-64bits [disabled]
-mpic-data-is-text-relative [enabled]
-mpic-register=
-mpoke-function-name [disabled]
-mprint-tune-info [disabled]
-mpure-code [disabled]
-mrestrict-it [disabled]
-msched-prolog [enabled]
-msingle-pic-base [disabled]
-mslow-flash-data [disabled]
-msoft-float -mfloat-abi=soft
-mstack-protector-guard-offset=
-mstack-protector-guard= global
-mstructure-size-boundary= 8
-mthumb [disabled]
-mthumb-interwork [disabled]
-mtls-dialect= gnu
-mtp= cp15
-mtpcs-frame [disabled]
-mtpcs-leaf-frame [disabled]
-mtune=
-muclibc [disabled]
-munaligned-access [enabled]
-mvectorize-with-neon-double [disabled]
-mvectorize-with-neon-quad [enabled]
-mword-relocations [enabled]
Known ARM ABIs (for use with the -mabi= option):
aapcs aapcs-linux apcs-gnu atpcs iwmmxt
Known __fp16 formats (for use with the -mfp16-format= option):
alternative ieee none
Known ARM FPUs (for use with the -mfpu= option):
auto crypto-neon-fp-armv8 fp-armv8 fpv4-sp-d16 fpv5-d16 fpv5-sp-d16 neon neon-fp-armv8 neon-fp16 neon-vfpv3 neon-vfpv4 vfp vfp3 vfpv2 vfpv3 vfpv3-d16
vfpv3-d16-fp16 vfpv3-fp16 vfpv3xd vfpv3xd-fp16 vfpv4 vfpv4-d16
Valid arguments to -mtp=:
auto cp15 soft
Known floating-point ABIs (for use with the -mfloat-abi= option):
hard soft softfp
Valid arguments to -mstack-protector-guard=:
global tls
TLS dialect to use:
gnu gnu2
And the output of clang++ -E - -march=native -###
clang version 14.0.6
Target: armv7l-unknown-linux-gnueabihf
Thread model: posix
InstalledDir: /usr/bin
(in-process)
"/usr/bin/clang-14" "-cc1" "-triple" "armv8-unknown-linux-gnueabihf" "-E" "-disable-free" "-clear-ast-before-backend" "-disable-llvm-verifier" "-discard-value-names" "-main-file-name" "-" "-mrelocation-model" "pic" "-pic-level" "2" "-pic-is-pie" "-mframe-pointer=all" "-fmath-errno" "-ffp-contract=on" "-fno-rounding-math" "-mconstructor-aliases" "-target-cpu" "generic" "-target-feature" "+vfp2" "-target-feature" "+vfp2sp" "-target-feature" "+vfp3" "-target-feature" "+vfp3d16" "-target-feature" "+vfp3d16sp" "-target-feature" "+vfp3sp" "-target-feature" "+fp16" "-target-feature" "+vfp4" "-target-feature" "+vfp4d16" "-target-feature" "+vfp4d16sp" "-target-feature" "+vfp4sp" "-target-feature" "+fp-armv8" "-target-feature" "+fp-armv8d16" "-target-feature" "+fp-armv8d16sp" "-target-feature" "+fp-armv8sp" "-target-feature" "-fullfp16" "-target-feature" "+fp64" "-target-feature" "+d32" "-target-feature" "+neon" "-target-feature" "+sha2" "-target-feature" "+aes" "-target-feature" "-fp16fml" "-target-abi" "aapcs-linux" "-mfloat-abi" "hard" "-fallow-half-arguments-and-returns" "-debugger-tuning=gdb" "-fcoverage-compilation-dir=/home/mammoth/cutechess/build" "-resource-dir" "/usr/lib/clang/14.0.6" "-internal-isystem" "/usr/lib/clang/14.0.6/include" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/bin/../lib/gcc/armv7l-unknown-linux-gnueabihf/12.1.0/../../../../armv7l-unknown-linux-gnueabihf/include" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" "-fdebug-compilation-dir=/home/mammoth/cutechess/build" "-ferror-limit" "19" "-stack-protector" "2" "-fno-signed-char" "-fgnuc-version=4.2.1" "-fcolor-diagnostics" "-faddrsig" "-o" "-" "-x" "c" "-"
Hope it helps! :)
Now I'm facing a different problem on my Raspberry Pi 4, the same one that @nimnananuk reported.
Apparently games.py is not selecting the proper architecture, so the compilation fails later on. It selects x86-32 by default; here's a partial output after starting the worker (using the cutechess-cli binary provided above):
Available Makefile architecture targets: ['x86-64-vnni512', 'x86-64-vnni256', 'x86-64-avx512', 'x86-64-avxvnni', 'x86-64-bmi2', 'x86-64-avx2', 'x86-64-sse41-popcnt', 'x86-64-modern', 'x86-64-ssse3', 'x86-64-sse3-popcnt', 'x86-64', 'x86-32-sse41-popcnt', 'x86-32-sse2', 'x86-32', 'ppc-64', 'ppc-32', 'armv7', 'armv7-neon', 'armv8', 'e2k', 'apple-silicon', 'general-64', 'general-32', 'riscv64']
Available g++/cpu properties: {'flags': ['arm', 'be32', 'glibc', 'little-endian', 'pic-data-is-text-relative', 'sched-prolog', 'unaligned-access', 'vectorize-with-neon-quad', 'word-relocations'], 'arch': 'armv8-a+crc+simd'}
Determined the best architecture to be x86-32
Default net: nn-ad9b42354671.nnue
I'll try to get it working and post updates here soon; meanwhile, any tips or advice are more than welcome!
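One way to avoid the x86 fallback on a Pi would be to pick an ARM target from the machine name and the pointer width of the running Python (which reflects whether the userland is 32- or 64-bit). The sketch below is hypothetical and is not how games.py currently decides; mapping 64-bit to armv8 and 32-bit to armv7-neon is an assumption based on the targets that appear in the Makefile list above.

```python
import platform
import struct
from typing import List, Optional

ARM64_NAMES = ("aarch64", "arm64")


def pick_arm_target(machine: str, pointer_bits: int, available: List[str]) -> Optional[str]:
    """Hypothetical: map an ARM host to a Stockfish Makefile ARCH, else None."""
    m = machine.lower()
    if pointer_bits == 64 and m in ARM64_NAMES:
        # 64-bit userland on an ARMv8 CPU.
        return "armv8" if "armv8" in available else None
    if pointer_bits == 32 and (m.startswith(("armv7", "armv8")) or m in ARM64_NAMES):
        # 32-bit userland; assumes the core has NEON (true for the Pi 4's
        # Cortex-A72, but not guaranteed for every ARMv7 chip).
        return "armv7-neon" if "armv7-neon" in available else None
    # Not an ARM host (or an older ARM core): leave the decision to existing logic.
    return None


if __name__ == "__main__":
    targets = ["x86-64", "x86-32", "armv7", "armv7-neon", "armv8", "general-32"]
    print(pick_arm_target(platform.machine(), struct.calcsize("P") * 8, targets))
```

Using struct.calcsize("P") rather than platform.machine() alone matters on a Pi, because a 32-bit userland can run on a 64-bit kernel that reports aarch64.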
I was able to run the worker by using the armv7-neon architecture. The build succeeded (though it was quite slow), but apparently the Raspberry Pi 4 is not powerful enough for fishtest :cry:
Exception running games:
This machine is too slow (189264.0 nps / thread) to run fishtest effectively - sorry!
Informing the server
Heartbeat stopped
Post request https://tests.stockfishchess.org:443/api/failed_task handled in 988.10ms (server: 2.36ms)
Task exited
Waiting for the heartbeat thread to finish...
Deleting lock file /home/mammoth/fishtest/worker/worker.lock
Edit: after overclocking the Raspberry Pi 4 to 2 GHz (up from the default 1.5 GHz), performance improved, but it was still not enough...
Exception running games:
This machine is too slow (238685.0 nps / thread) to run fishtest effectively - sorry!
Informing the server
Heartbeat stopped
Post request https://tests.stockfishchess.org:443/api/failed_task handled in 995.33ms (server: 2.00ms)
Task exited
Waiting for the heartbeat thread to finish...
Deleting lock file /home/mammoth/fishtest/worker/worker.lock
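To put the overclocking result in numbers: going from 1.5 GHz to 2 GHz is a 33% clock increase, while the reported nps per thread rose by about 26%. A small Python check that uses only the two values from the worker logs above:

```python
def ratio(before: float, after: float) -> float:
    """Relative change expressed as after / before."""
    return after / before


nps_stock = 189264.0  # armv7-neon build at 1.5 GHz (from the worker log)
nps_oc = 238685.0     # same build overclocked to 2 GHz (from the worker log)

nps_gain = ratio(nps_stock, nps_oc)
clock_gain = ratio(1.5, 2.0)
print(f"nps x{nps_gain:.3f}, clock x{clock_gain:.3f}, "
      f"scaling efficiency {nps_gain / clock_gain:.0%}")
```

In other words, nps scaled almost, but not quite, linearly with the clock speed (roughly 95% efficiency).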
Hey guys, sorry for over-posting, but one final thought occurred to me: running the Raspberry Pi's Broadcom BCM2711 (Cortex-A72) processor in 64-bit mode (armv8) instead of the default 32-bit mode (armv7), with which some people have seen performance improvements.
Here's the locally compiled version of cutechess-cli for armv8:
cutechess-cli 1.3.0-beta2
Using Qt version 5.15.6
Running on Arch Linux ARM/arm64
But still too slow for Fishtest :cry:
Here's the partial output of worker.py when running at the regular clock speed (1.5 GHz):
Exception running games:
This machine is too slow (219427.0 nps / thread) to run fishtest effectively - sorry!
Informing the server
Heartbeat stopped
Post request https://tests.stockfishchess.org:443/api/failed_task handled in 1016.36ms (server: 13.88ms)
Task exited
Waiting for the heartbeat thread to finish...
Deleting lock file /home/mammoth/fishtest/worker/worker.lock
And here's the same output when running overclocked (2GHz):
Exception running games:
This machine is too slow (249853.0 nps / thread) to run fishtest effectively - sorry!
Informing the server
Post request https://tests.stockfishchess.org:443/api/failed_task handled in 880.70ms (server: 3.37ms)
Task exited
Waiting for the heartbeat thread to finish...
Heartbeat stopped
Deleting lock file /home/mammoth/fishtest/worker/worker.lock
Once again, hope it helps somehow!
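Collecting the four nps-per-thread figures reported so far on the same Pi 4 makes the 32-bit vs 64-bit comparison easier to see. The Python snippet below only re-uses numbers from the logs in this thread: the armv8 build comes out about 16% faster at 1.5 GHz and about 5% faster at 2 GHz.

```python
# nps per thread reported by the worker, keyed by (build, clock in GHz).
results = {
    ("armv7-neon", 1.5): 189264.0,
    ("armv7-neon", 2.0): 238685.0,
    ("armv8", 1.5): 219427.0,
    ("armv8", 2.0): 249853.0,
}

for clock in (1.5, 2.0):
    v7 = results[("armv7-neon", clock)]
    v8 = results[("armv8", clock)]
    print(f"{clock} GHz: armv7-neon {v7:>9.0f}  armv8 {v8:>9.0f}  "
          f"armv8/armv7 = {v8 / v7:.3f}")
```

The 64-bit advantage shrinks when overclocked, so neither build closes the gap on its own.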
Edit: one curious thing: performance improved when running with the --concurrency 1 -m MAX parameters (in previous runs, concurrency was set to 3)! Still, it was not enough.
Exception running games:
This machine is too slow (370488.0 nps / thread) to run fishtest effectively - sorry!
Informing the server
Heartbeat stopped
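When comparing many runs like these, it can help to pull the nps figure out of the worker's error message automatically. A minimal Python sketch, assuming only the message format shown in the logs above (it does not use any fishtest API). It compares the concurrency-1 result with the last run before this edit (armv8, 2 GHz, concurrency 3); the clock speed of the concurrency-1 run is not stated, so the ratio is only indicative.

```python
import re
from typing import List

# Matches the worker message "This machine is too slow (370488.0 nps / thread) ..."
TOO_SLOW_RE = re.compile(r"too slow \(([\d.]+) nps / thread\)")


def nps_from_log(text: str) -> List[float]:
    """Return every nps-per-thread value found in a worker log, in order."""
    return [float(v) for v in TOO_SLOW_RE.findall(text)]


log = """This machine is too slow (249853.0 nps / thread) to run fishtest effectively - sorry!
This machine is too slow (370488.0 nps / thread) to run fishtest effectively - sorry!"""

concurrency3, concurrency1 = nps_from_log(log)
print(f"--concurrency 1 vs 3: x{concurrency1 / concurrency3:.2f} nps per thread")
```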
Hi @ocaio, thanks for your posts and for sharing the cutechess-cli binary! It is a bit sad that the RPi seems to be too slow for fishtest (although it comes close). I don't have an RPi, but I tried your binary in qemu-arm. It did not work, as it depends on an ARM build of Qt. Perhaps you can post that as well?
Sure, @vdbergh! I've installed the Qt5 dependencies (qt5-base and qt5-svg) from the official Arch Linux ARM package repository, and they are attached here as well (for aarch64).
Do you need the 32-bit version too?
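Since a cutechess-cli binary may be either a 32-bit ARM or an aarch64 build (which decides between qemu-arm and qemu-aarch64, and which Qt libraries are needed), a quick way to check a downloaded binary is to read its ELF header. A minimal Python sketch based on the standard ELF layout (EI_CLASS at byte 4, EI_DATA at byte 5, e_machine at offset 18); running `file cutechess-cli` gives the same information if that tool is installed. The helper names are hypothetical.

```python
import sys

# e_machine values from the ELF specification (subset).
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def elf_arch(header: bytes) -> str:
    """Describe the class and machine of an ELF file from its first 20 bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(header[4])            # EI_CLASS
    order = {1: "little", 2: "big"}.get(header[5])  # EI_DATA
    if bits is None or order is None:
        raise ValueError("unknown ELF class or data encoding")
    machine = int.from_bytes(header[18:20], order)  # e_machine
    return f"{bits}-bit {MACHINES.get(machine, f'machine {machine}')}"


def elf_arch_of_file(path: str) -> str:
    """Read the header of a file on disk and describe it."""
    with open(path, "rb") as f:
        return elf_arch(f.read(20))


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(p, "->", elf_arch_of_file(p))
```

For example, `python3 elfarch.py cutechess-cli` (script name assumed) would report whether the binary is 32-bit ARM or 64-bit AArch64.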