
Build-time tests make the machine hang

Open sanvila opened this issue 8 months ago • 9 comments

Hello. This was reported to Debian here:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1108309

When building (the Debian package of) mdanalysis, version 2.9.0, on AWS machines of types m7a.large or r7a.large, which incidentally have 2 vCPUs, the build-time tests time out and the build results in failure. I've put several failed build logs here as a sample:

https://people.debian.org/~sanvila/build-logs/202507/

The failure rate on the above machine types is easily around 90%, and of course the expected behavior is that the tests finish in about 10 minutes or less, as that's what it takes on similar AWS machines having a single CPU only.

Is there a minimum number of CPUs required to run the tests, or is this a bug in the tests (some kind of deadlock, for example)? If it's the former, we could just skip the tests when we know for sure that they will fail or hang the machine, but in that case it would be nice to have the requirement documented.

Note: I'm trying the build on Debian testing (which will become Debian 13 soon), and it has python 3.13.3 and Linux 6.12.33.

In case you have any difficulty in reproducing the problem, I could offer a VM to test (please contact me privately for details. I'm easily reachable at debian.org).

Cc to @drew-parsons as the usual Debian maintainer so that he's aware of how this develops.

Thanks.

sanvila avatar Jul 08 '25 01:07 sanvila

Hi @sanvila,

Apologies for the delayed response.

Unfortunately we don't have any insights into how things are being built for Debian, so please forgive the naive questions:

  1. What are your build-time tests? Are you running pytest -n auto?
  2. When you say "expected behaviour … finish in about 10 minutes or less … on similar AWS machines with a single CPU", do you mean that the MDAnalysis test suite finishes in 10 minutes when you don't use multiple cores? (This is particularly unclear to me since you then ask if there's a minimum number of CPUs required to run the tests.)

Is there a minimum number of CPUs required to run the tests, or is this a bug in the tests?

Locally I can't reproduce this behaviour. We do get diminishing returns at higher core counts, but that's mostly because we're becoming I/O limited.

1 core: 4 min 49 s
2 cores: 2 min 43 s
4 cores: 1 min 50 s
8 cores: 1 min 29 s

Our GitHub CI runners also don't seem to show a performance loss at ~4 cores (the number of cores fluctuates on the runner).

some kind of deadlock, for example

Insufficient RAM?

Is it possible that you're running out of RAM? The MDAnalysis tests use large files that end up in memory. An m7a.large gets 8 GB of RAM, so after system resources and whatever else you're running, you might be running out of memory. However, we generally find that things run fine with up to 4-way parallelism on GitHub Actions runners, which don't have a lot of RAM either...

Slow I/O

Depending on what disk you are using on your AWS runner (note: EBS storage can get really slow with the wrong options), it might be that you're hitting an I/O limit of some kind. Using a local disk on the instance might be a lot better.

OpenMP (et al.) thread contention

The only other thing that comes to mind is that numpy / scipy use BLAS and LAPACK, which do their own threading (e.g. via OpenMP). This can sometimes cause contention on machines with limited hardware, but nothing like what you've reported has been seen before. Manually setting OMP_NUM_THREADS to 1 can help a little (e.g. my 8-core case goes down to 1 min 16 s).
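For reference, the pinning can also be done from inside Python, as long as it happens before numpy/scipy are imported, since the BLAS/OpenMP back ends read these variables once at load time. A minimal sketch (the three variable names cover the common back ends; which one is active depends on how your numpy was built):

```python
import os

# Pin the threading libraries to one thread *before* numpy/scipy load;
# the back ends read these variables at import time, not afterwards.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

# import numpy as np  # safe to import now: thread pools are capped at 1
```

Setting the variables in the shell before invoking pytest achieves the same thing, and is usually easier in a packaging rules file.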

IAlibay avatar Jul 19 '25 05:07 IAlibay

Hi. Sorry for the late reply. I'll try to be brief.

The way tests are run during build may be inferred from the build logs, by looking at dh_auto_test calls. In this case it's like this:

for py in 3.13; do \
  echo "=== testing with python$py ==="; \
  pydir=`pybuild -p $py --system=distutils --print {build_dir}`; \
  MPLBACKEND=agg PYTHONPATH=$pydir python$py -mpytest -v -k "not ( parallel or multiprocess or openmp or gsd or GSD or test_distances or test_all_import[.analysis.hole2] or journal.pcbi.1004568 )" --disable-pytest-warnings testsuite; \
  rm -rf $pydir/MDAnalysis/.hypothesis; \
  rm -rf $pydir/MDAnalysis/.duecredit.p; \
done

So, no "pytest -n auto", which I guess means the tests do not run in parallel. Regardless of whether the tests run in parallel, I always expect the time it takes to build with 2 CPUs to be <= the time it takes with 1 CPU, which I believe is a reasonable expectation.

Regarding the "minimum number of CPUs required to run the tests": I expect the tests to work with any number of CPUs, including 1, but I'm aware that not every project has such a policy (i.e. some people either do not consider it a bug, or do not consider it a bug worth fixing), so I just wanted to be sure that running the tests with 1 or 2 CPUs is a configuration you are willing to support.

Insufficient RAM: Unlikely, because I monitor Committed_AS in /proc/meminfo during build to get statistics about how much memory each package needs, and I never build a package on a machine having less memory than required.
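For what it's worth, the Committed_AS check boils down to parsing one line of /proc/meminfo. A minimal sketch (the helper name is mine, not part of any existing tooling):

```python
def committed_as_kib(path="/proc/meminfo"):
    """Return the Committed_AS value (in kB) from a meminfo-style file."""
    with open(path) as fh:
        for line in fh:
            if line.startswith("Committed_AS:"):
                # Line looks like: "Committed_AS:     123456 kB"
                return int(line.split()[1])
    raise ValueError("Committed_AS not found")
```

Sampling this value periodically during a build gives a rough upper bound on the memory the package needs.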

Slow I/O: Unlikely as well. If the disk were slow, the build would take longer, but it would not hang in some kind of deadlock.

OpenMP (et al.) thread contention: I see that Drew (the usual maintainer) has uploaded a version to Debian experimental using your OMP_NUM_THREADS=1 suggestion, so my next step (as time permits) will be to test the same package in my environment to see whether it fixes the issue. I'll let you know when I have some data to share. Thanks.

sanvila avatar Jul 24 '25 16:07 sanvila

There is this contradiction when using OMP_NUM_THREADS=1 on armhf, which seems to indicate the tests expect 2 CPUs (or 2 threads, anyway):

___________________________ test_thread_limit_apply ____________________________

u = <Universe with 3341 atoms>

    def test_thread_limit_apply(u):
        default_thread_info = threadpool_info()
        default_num_thread_limit_list = [
            thread_info["num_threads"] for thread_info in default_thread_info
        ]
    
        new_trans = CustomTransformation(max_threads=2)
        _ = new_trans(u.trajectory.ts)
        for thread_info in new_trans.runtime_info:
>           assert thread_info["num_threads"] == 2
E           assert 1 == 2

/build/reproducible-path/mdanalysis-2.9.0/testsuite/MDAnalysisTests/transformations/test_base.py:102: AssertionError
=========================== short test summary info ============================
FAILED testsuite/MDAnalysisTests/transformations/test_base.py::test_thread_limit_apply
= 1 failed, 16613 passed, 565 skipped, 2704 deselected, 7 xfailed, 2 xpassed, 170182 warnings in 729.79s (0:12:09) =
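One plausible reading of this failure (an assumption on my part, not something confirmed in the code): thread-pool limits can only lower a pool's size, so if OMP_NUM_THREADS=1 already caps the default at one thread, requesting max_threads=2 still yields 1. A stdlib-only sketch of that interaction:

```python
import os

def effective_threads(requested_max, env=os.environ):
    """Sketch of the thread count a BLAS/OpenMP pool ends up with.

    A limit such as threadpool_limits(max_threads=N) can only lower the
    pool size; it cannot raise it above the environment's default.
    """
    default = int(env.get("OMP_NUM_THREADS", os.cpu_count() or 1))
    return min(requested_max, default)

# With OMP_NUM_THREADS=1, requesting 2 threads still yields 1 -- exactly
# the `assert 1 == 2` seen in the log above.
```

Under this reading the test implicitly assumes the default thread count is at least 2, which OMP_NUM_THREADS=1 (or a single-CPU machine) violates.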

It's strange because the same test is passing on other systems, https://buildd.debian.org/status/package.php?p=mdanalysis&suite=experimental

I guess this is a different issue to the timeouts though.

drew-parsons avatar Jul 24 '25 17:07 drew-parsons

Perhaps the test_thread_limit_apply error is related, though. On a second build attempt, armhf passes with OMP_NUM_THREADS=1, but amd64 fails here. There seems to be generally random failure, even just in test_thread_limit_apply with OMP_NUM_THREADS=1: s390x passed the first time and failed the second time.

There are a large number of numerical failures on loong64 (the tests get numerical solutions but with the wrong values, distinct from the test_thread_limit_apply problem of the wrong number of threads). What's strange in this case is that loong64 was previously generally passing (without timeouts) when OMP_NUM_THREADS=1 was not set.

drew-parsons avatar Jul 25 '25 11:07 drew-parsons

riscv64 gave the timeout error with OMP_NUM_THREADS=1 https://buildd.debian.org/status/fetch.php?pkg=mdanalysis&arch=riscv64&ver=2.9.0-13%7Eexp2&stamp=1753482859&raw=0

Admittedly riscv64 is a temperamental architecture that routinely runs slower than other architectures.

drew-parsons avatar Jul 26 '25 10:07 drew-parsons

The random timeout has shown up on arm64 with OMP_NUM_THREADS=1 at https://ci.debian.net/packages/m/mdanalysis/unstable/arm64/62419699/ (analysis/test_gnm.py::test_gnm[client_GNMAnalysis0] timed out)

It was seen in a 4th run after 3 successful tests, testing version 2.9.0-13~exp2 at https://ci.debian.net/packages/m/mdanalysis/unstable/arm64/

drew-parsons avatar Jul 27 '25 07:07 drew-parsons

Would it help in any way to skip tests such as test_thread_limit_apply https://github.com/MDAnalysis/mdanalysis/blob/519ac568252857b10fb37c8f1b37c3c157c0df7d/testsuite/MDAnalysisTests/transformations/test_base.py#L93 in environments where only 1 CPU is available?
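If that route were taken, how the CPUs are counted matters: os.cpu_count() ignores cgroup and affinity restrictions, while os.sched_getaffinity(0) respects them on Linux. A sketch of the detection such a skip could use (the names here are illustrative, not existing MDAnalysis code):

```python
import os

# CPUs actually usable by this process; sched_getaffinity respects
# taskset/cgroup restrictions, unlike os.cpu_count().
try:
    available_cpus = len(os.sched_getaffinity(0))
except AttributeError:  # e.g. macOS, where sched_getaffinity is missing
    available_cpus = os.cpu_count() or 1

# Condition a skip marker could use, e.g.:
# pytest.mark.skipif(available_cpus < 2, reason="needs >= 2 CPUs")
needs_skip = available_cpus < 2
```

This would only skip tests that genuinely require a second CPU, rather than disabling the suite wholesale on constrained builders.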

orbeckst avatar Sep 30 '25 22:09 orbeckst

If it were the case that the tests always failed on systems with 1 CPU and never on systems with 2 CPUs, then yes, it would help, but so far the problem seems more complex than that. In Debian we went ahead and disabled all tests for the time being:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1108309

By skipping the tests on systems with only 1 CPU, I feel that we would be deviating from the "right" fix.

Thanks.

sanvila avatar Sep 30 '25 22:09 sanvila

The problem we're having here is that none of our local testing can reproduce this issue.

I.e. it's not happening on GitHub CI runners, on local workstations, or on the AWS runners I've tried.

If you can somehow provide us with either a) a box where this is reproducibly happening, or b) the exact AWS config (ideally whatever debian image you're using) then we can try to debug this.

P.S. If we go down the AWS route, I'll have to work out credits for access (I'm out of free credits); that might take longer, so option a would be preferable.

IAlibay avatar Sep 30 '25 23:09 IAlibay