Build-time tests make the machine hang
Hello. This was reported to Debian here:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1108309
When building (the Debian package of) mdanalysis, version 2.9.0, on AWS machines of type m7a.large or r7a.large, which incidentally have 2 vCPUs, the build-time tests time out and the build fails. I've put several failed build logs here as a sample:
https://people.debian.org/~sanvila/build-logs/202507/
The failure rate on the above machine types is easily around 90%, and of course the expected behavior is that the tests finish in about 10 minutes or less, as that's what they take on similar AWS machines having a single CPU.
Is there a minimum number of CPUs required to run the tests, or is this a bug in the tests (some kind of deadlock, for example)? If it were the former, we could simply skip the tests when we know for sure that they will fail or hang the machine, but in that case it would be nice to have such a requirement documented.
Note: I'm trying the build on Debian testing (which will soon become Debian 13), which has Python 3.13.3 and Linux 6.12.33.
In case you have any difficulty reproducing the problem, I can offer a VM to test (please contact me privately for details; I'm easily reachable at debian.org).
Cc to @drew-parsons as the usual Debian maintainer so that he's aware of how this develops.
Thanks.
Hi @sanvila,
Apologies for the delayed response.
Unfortunately we don't have any insight into how things are built for Debian, so please forgive the naive questions:
- What are your build-time tests? Are you running `pytest -n auto`?
- When you say "expected behaviour.. finish in about 10 minutes or less.. on similar AWS machines with a single CPU", do you mean that the MDAnalysis test suite finishes in 10 minutes when you don't use multiple cores? (This is particularly unclear to me, since you then ask if there's a minimum number of CPUs required to run the tests.)
Is there a minimum number of CPUs required to run the tests, or is this a bug in the tests?
Locally I can't reproduce this behaviour. We do get diminishing returns at higher core counts, but that's mostly because we're becoming I/O limited.
- 1 core: 4 min 49 s
- 2 cores: 2 min 43 s
- 4 cores: 1 min 50 s
- 8 cores: 1 min 29 s
Our GitHub CI runners also don't seem to show a performance loss at ~4 cores (the number of cores fluctuates on the runner).
some kind of deadlock, for example
Insufficient RAM?
Is it possible that you're running out of RAM? The MDAnalysis tests use large files that end up in memory. m7a.large gets 8 GB of RAM, so after system resources and whatever else you're running, you might be running out of memory. However, we generally find that things run fine with up to 4-way parallelism on GitHub Actions runners, which don't have a lot of RAM either...
Slow I/O
Depending on what disk you are using on your AWS instance (note: EBS storage can get really slow with the wrong options), you might be hitting an I/O limit of some kind. Using a local disk on the instance might be a lot faster.
OpenMP (et al.) thread contention
The only other thing that comes to mind is that numpy/scipy use BLAS and LAPACK, which do their own threading (e.g. via OpenMP). This can sometimes cause a bit of contention on machines with limited hardware, but nothing like what you've reported has been seen before. Manually setting OMP_NUM_THREADS to 1 can help things a little (e.g. my 8-core case goes down to 1 min 16 s).
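For reference, these thread caps can be applied from the environment before numpy/scipy are first imported; a minimal sketch (which extra variables matter depends on the BLAS backend actually installed):

```python
import os

# Cap BLAS/OpenMP thread pools to one thread each. These must be set
# before numpy/scipy (and hence the BLAS backend) are first imported,
# since the pools are sized at library load time.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

print(os.environ["OMP_NUM_THREADS"])  # -> 1
```

In a packaging context the same effect is usually achieved by exporting the variables in the build environment rather than in Python code.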
Hi. Sorry for the late reply. I'll try to be brief.
The way tests are run during the build can be inferred from the build logs by looking at the dh_auto_test calls. In this case it's like this:
for py in 3.13; do \
echo "=== testing with python$py ==="; \
pydir=`pybuild -p $py --system=distutils --print {build_dir}`; \
MPLBACKEND=agg PYTHONPATH=$pydir python$py -mpytest -v -k "not ( parallel or multiprocess or openmp or gsd or GSD or test_distances or test_all_import[.analysis.hole2] or journal.pcbi.1004568 )" --disable-pytest-warnings testsuite; \
rm -rf $pydir/MDAnalysis/.hypothesis; \
rm -rf $pydir/MDAnalysis/.duecredit.p; \
done
So, no "pytest -n auto", which I guess means the tests do not run in parallel. Regardless of whether the tests run in parallel or not, I always expect the time it takes to build with 2 CPUs to be <= the time it takes with 1 CPU, as I believe that's a reasonable expectation.
Regarding the "minimum number of CPUs required to run the tests": I expect the tests to work with any number of CPUs, including 1, but I'm aware that not every project has such a policy (i.e. some people either do not consider this a bug, or do not consider it a bug worth fixing), so I just wanted to be sure that running the tests with 1 or 2 CPUs is a configuration you are willing to support.
Insufficient RAM: unlikely, because I monitor Committed_AS in /proc/meminfo during builds to gather statistics about how much memory each package needs, and I never build a package on a machine with less memory than required.
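As an illustration of the kind of monitoring described above, Committed_AS can be read straight out of /proc/meminfo; a minimal sketch (the helper name is mine, not from any tooling mentioned in this thread):

```python
def committed_as_kib(meminfo_text: str):
    """Return the Committed_AS value (in KiB) from /proc/meminfo text,
    or None if the field is missing."""
    for line in meminfo_text.splitlines():
        if line.startswith("Committed_AS:"):
            # Format: "Committed_AS:    1234567 kB"
            return int(line.split()[1])
    return None

# On a real Linux system: committed_as_kib(open("/proc/meminfo").read())
sample = "MemTotal:        8000000 kB\nCommitted_AS:    1234567 kB\n"
print(committed_as_kib(sample))  # -> 1234567
```

Polling this value at intervals during a build gives a peak-commit estimate per package.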
Slow I/O: unlikely as well. If the disk were slow, the build would take longer, but it would not hang in some kind of deadlock.
OpenMP (et al.) thread contention: I see that Drew (the usual maintainer) has uploaded a version to Debian experimental using your OMP_NUM_THREADS=1 suggestion, so my next step (as time permits) will be to test the same package in my environment to see whether it fixes the issue. I'll let you know when I have some data to share. Thanks.
There is this contradiction when using OMP_NUM_THREADS=1 on armhf, which seems to indicate that the tests expect 2 CPUs (or 2 threads, anyway):
___________________________ test_thread_limit_apply ____________________________
u = <Universe with 3341 atoms>
def test_thread_limit_apply(u):
default_thread_info = threadpool_info()
default_num_thread_limit_list = [
thread_info["num_threads"] for thread_info in default_thread_info
]
new_trans = CustomTransformation(max_threads=2)
_ = new_trans(u.trajectory.ts)
for thread_info in new_trans.runtime_info:
> assert thread_info["num_threads"] == 2
E assert 1 == 2
/build/reproducible-path/mdanalysis-2.9.0/testsuite/MDAnalysisTests/transformations/test_base.py:102: AssertionError
=========================== short test summary info ============================
FAILED testsuite/MDAnalysisTests/transformations/test_base.py::test_thread_limit_apply
= 1 failed, 16613 passed, 565 skipped, 2704 deselected, 7 xfailed, 2 xpassed, 170182 warnings in 729.79s (0:12:09) =
It's strange because the same test passes on other systems: https://buildd.debian.org/status/package.php?p=mdanalysis&suite=experimental
I guess this is a different issue to the timeouts though.
Perhaps the test_thread_limit_apply error is related, though. On a second build attempt, armhf passes with OMP_NUM_THREADS=1, but amd64 fails there. There seems to be generally random failure, even just in test_thread_limit_apply with OMP_NUM_THREADS=1: s390x passed the first time and failed the second time.
There are a large number of numerical failures on loong64 (tests getting numerical solutions with the wrong value, distinct from the test_thread_limit_apply problem of the wrong number of threads). What's strange in this case is that loong64 was previously generally passing (without timeouts) when OMP_NUM_THREADS=1 was not set.
riscv64 gave the timeout error with OMP_NUM_THREADS=1
https://buildd.debian.org/status/fetch.php?pkg=mdanalysis&arch=riscv64&ver=2.9.0-13%7Eexp2&stamp=1753482859&raw=0
Admittedly riscv64 is a temperamental architecture that routinely runs slower than other architectures.
The random timeout has shown up on arm64 with OMP_NUM_THREADS=1 at https://ci.debian.net/packages/m/mdanalysis/unstable/arm64/62419699/ (analysis/test_gnm.py::test_gnm[client_GNMAnalysis0] timed out).
It was seen in a 4th run after 3 successful tests, testing version 2.9.0-13~exp2 at https://ci.debian.net/packages/m/mdanalysis/unstable/arm64/
Would it help in any way to skip tests such as test_thread_limit_apply https://github.com/MDAnalysis/mdanalysis/blob/519ac568252857b10fb37c8f1b37c3c157c0df7d/testsuite/MDAnalysisTests/transformations/test_base.py#L93 in environments where only 1 CPU is available?
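For concreteness, one way such a skip could look, counting the CPUs actually available to the process rather than the machine total (a sketch only; the helper and test body are illustrative, not the real test code):

```python
import os
import unittest

def available_cpus() -> int:
    # sched_getaffinity reflects CPU-affinity masks (relevant in
    # containers and restricted builders), unlike os.cpu_count(),
    # which reports the machine total.
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:  # not available on all platforms
        return os.cpu_count() or 1

class ThreadLimitTests(unittest.TestCase):
    @unittest.skipIf(available_cpus() < 2,
                     "thread-limit test needs >= 2 CPUs")
    def test_thread_limit_apply(self):
        # Placeholder body; the real test asserts num_threads == 2
        # after applying a CustomTransformation(max_threads=2).
        self.assertGreaterEqual(available_cpus(), 2)
```

The same condition works as a `pytest.mark.skipif` marker in the actual pytest-based suite.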
If it were the case that the tests always fail on systems with 1 CPU and never on systems with 2 CPUs, then yes, it would help, but so far the problem seems more complex than that. In Debian we went ahead and disabled all tests for the time being:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1108309
By skipping the tests on systems with only 1 CPU, I feel that we would be deviating from the "right" fix.
Thanks.
The problem we're having here is that none of our local testing can reproduce this issue: it's not happening on GitHub CI runners, it's not happening on local workstations, nor is it happening on the AWS runners I've tried.
If you can somehow provide us with either (a) a box where this is reproducibly happening, or (b) the exact AWS config (ideally whatever Debian image you're using), then we can try to debug this.
P.S. If we go down the AWS route, I'll have to work out credits for access (I'm out of free credits), which might take longer, so option (a) would be preferable.