netatalk
netatalk2: Intermittent unit test failures
Describe the bug
make[4]: Entering directory '/home/arvid/src/aur/netatalk2/src/build/test/afpd'
PASS: test.sh
FAIL: test
============================================================================
Testsuite summary for netatalk 2.4.4
============================================================================
# TOTAL: 2
# PASS: 1
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
============================================================================
See test/afpd/test-suite.log for debugging.
============================================================================
The log mentioned is quite cryptic to me:
==============================================
netatalk 2.4.4: test/afpd/test-suite.log
==============================================
# TOTAL: 2
# PASS: 1
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
System information (uname -a): Linux 6.9.10-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 18 Jul 2024 18:05:52 +0000 x86_64
Distribution information (/etc/os-release):
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
.. contents:: :depth: 2
FAIL: test
==========
fopen: No such file or directory
Jul 22 21:44:42.330220 [727438] {afp_config.c:247} (E:AFPDaemon): main: atp_open: Cannot assign requested address
Jul 22 21:44:42.330900 [727438] {dsi_tcp.c:349} (E:DSI): dsi_tcp_init: no suitable network config for TCP socket
Jul 22 21:44:42.330938 [727438] {afp_config.c:351} (E:AFPDaemon): main: dsi_init: Permission denied
Initializing
============
Testing: setuplog("default log_note /dev/tty") ... [ok]
Testing: afp_options_init(&default_options) ... [ok]
Testing: afp_options_parse( ARGNUM, args, &default_options) ... [ok]
Testing: configs = configinit(&default_options) ... [error]
FAIL test (exit status: 1)
The system I'm building on does not have netatalk2 running (or even installed) so I don't know what this could be about. I am building with -j30 and distributed with distcc, so maybe there is a race condition between tests?
To Reproduce Steps to reproduce the behavior.
Expected behavior A clear and concise description of what you expected to happen.
Environment
- Server OS: Arch Linux
- Client OS N/A
- Netatalk Version 2.4.4
Logs Um, not sure which ones would be relevant here.
Additional context If it is a crash, please attach a stacktrace.
Hm:
- If it is a race condition, it is awfully reproducible (100%).
- This only happens when building with makepkg to build a package though. That in itself doesn't add sandboxing, it is just the executor for the package build instructions that Arch Linux uses.
- It happens even if I don't sandbox the build (which is done with extra layers on top of makepkg).
- Outside makepkg I can't reproduce it even with high -j.
- 2.4.0 didn't have this issue (I haven't tried versions in between, have been too busy to keep up with this). Going back and rebuilding 2.4.0, it now also fails. So something external to netatalk2 changed and caused this. Fun.
- This is still autotools, I haven't had time to convert to meson yet (oops should have mentioned that in the original post).
I don't know the code of this project at all, so I need some help with what suggestions to try next.
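One possible next step is to rerun only the failing binary under strace, to see which path the failing fopen is looking for and which socket call is rejected. This is a minimal sketch, assuming strace is installed and that the test binary is the `test` program in the build directory named in the make output above. TEST_DIR and TRACE_FILE are placeholder names chosen for illustration, not something the project defines.

```shell
#!/bin/sh
# Assumption: TEST_DIR is the directory from the "Entering directory" line of
# the make output, relative to where this is run. Adjust to your build tree.
TEST_DIR="${TEST_DIR:-src/build/test/afpd}"
# Hypothetical name for the trace output file.
TRACE_FILE="${TRACE_FILE:-afpd-test.strace}"

if [ -x "$TEST_DIR/test" ] && command -v strace >/dev/null 2>&1; then
    # Follow child processes and record only file-open and socket setup calls.
    (cd "$TEST_DIR" && strace -f -o "$TRACE_FILE" -e trace=openat,socket,bind ./test)
    # List the calls that failed with the errno values seen in the log:
    # ENOENT (No such file or directory), EADDRNOTAVAIL (Cannot assign
    # requested address), EACCES (Permission denied).
    grep -E 'ENOENT|EADDRNOTAVAIL|EACCES' "$TEST_DIR/$TRACE_FILE" || true
else
    echo "skipped: $TEST_DIR/test or strace not available"
fi
```

Running the same script once inside the makepkg build and once outside it, then comparing the two traces, might show which file or address differs between the two environments.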
The tests on both 2.x and 3.x have gotten unreliable on Linux specifically over the last few weeks, and I've not been able to figure out the root cause yet since the error messages and logs are not helpful. Like you say, it's almost as if something external to netatalk has changed and is interfering with the tests.
Two examples: https://github.com/Netatalk/netatalk/issues/1196 (Debian) https://github.com/Netatalk/netatalk/issues/1273 (Arch)
On my desktop (AMD Zen 3) it managed to build in makepkg, on my laptop (Intel Skylake) it fails.
Might be random though, haven't run it enough times to know.
Maybe time to run valgrind / ASAN / UBSAN / TSAN if you haven't already done so. A newer compiler or similar (especially likely if it affects only rolling release distros) could easily expose latent issues.
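For anyone wanting to try the sanitizer suggestion, a rough sketch of an ASAN + UBSAN build with the autotools build system could look like the following. It assumes GCC or Clang and is meant to be run from the top of the source tree. The variable names are just for illustration, and whether the netatalk configure script needs further options is not covered here.

```shell
#!/bin/sh
# Assumption: run from the top of an autotools source tree (where ./configure lives).
SAN_FLAGS="-fsanitize=address,undefined"
# Keep debug info and frame pointers so sanitizer reports have usable stack traces.
DEBUG_FLAGS="-g -O1 -fno-omit-frame-pointer"

if [ -x ./configure ]; then
    # Pass the flags as configure arguments so they end up in the generated Makefiles.
    ./configure CFLAGS="$DEBUG_FLAGS $SAN_FLAGS" LDFLAGS="$SAN_FLAGS" &&
        make &&
        make check
else
    echo "skipped: no ./configure in $(pwd)"
fi
```

TSAN (`-fsanitize=thread`) cannot be combined with ASAN in the same build, so it would need a separate configure and build pass.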
That's a good idea. TBH I've never used C at this level but it's a good learning experience.
The odd thing, though, is that my main dev machine is affected which is a very stable Debian Bookworm system, so it seems unlikely that a new compiler version would have been pushed out... While a VM on my MacBook running the exact same Debian Bookworm version is not... Both x86_64 architecture. There is some minute environmental difference here that I haven't figured out yet.
It's worth noting that the tests are passing with 2.4.x code in the Arch job in our GitHub CI workflow:
https://github.com/Netatalk/netatalk/actions/runs/10053455499/job/27786261358
> It's worth noting that the tests are passing with 2.4.x code in the Arch job in our GitHub CI workflow
It seems to be very flaky for sure.
I'd be very curious to see if the same issue happens if you get around to setting up the Meson build system in the same environment.
> I'd be very curious to see if the same issue happens if you get around to setting up the Meson build system in the same environment.
I want to, I just have limited time and energy at the moment.
@VorpalBlade Random idea: What happens if you build the entire package with -j1 (apart from taking a long long time)? With Meson, I was able to work around this by forcing sequential and single-threaded execution of the tests.
And yes, the tests should be rewritten and modernized. Something for a rainy day. :)
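The -j1 idea above could be tried roughly like this. It is a sketch only: BUILD_DIR is an assumed name for a Meson build directory, and the exact invocation used for the Meson workaround in the project is not shown in this thread.

```shell
#!/bin/sh
# Force a single job so tests cannot run concurrently.
JOBS=1
# Assumption: a Meson build directory, if any, is called "build".
BUILD_DIR="${BUILD_DIR:-build}"

# Autotools tree: -j1 overrides any parallel job count for this make run.
if [ -f Makefile ]; then
    make -j"$JOBS" check
fi

# Meson tree: limit the test runner to one process at a time.
if [ -f "$BUILD_DIR/build.ninja" ] && command -v meson >/dev/null 2>&1; then
    meson test -C "$BUILD_DIR" --num-processes "$JOBS"
fi
```

If the failure disappears with a single job, that points towards the tests interfering with each other. If it still fails, the cause is more likely something in the build environment.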
@VorpalBlade Would you have the opportunity to look at this again any time soon?
Just a heads-up that we're working towards a netatalk4 release now, which will obsolete both netatalk2 and netatalk3. So there's a chance that this bug is moot. :)
I have not seen it with meson. That said, it wasn't 100% reproducible with autoconf, but I seem to remember it happened more often than not there. It is probably fixed when using meson.
Closing this as won't-fix for now. Please reopen if you encounter the same issue with netatalk 4.0 or later!