Reproducing CI
Currently (as of Feb 5, 2025), there are several flaky CI tests.
Let's have a look at one of them: https://github.com/grimme-lab/xtb/actions/runs/13165363016/job/36744070869?pr=1180.
After opening it, you will find something like this:
which is related to the gfnff tests, according to the first lines:
So, our goal is to reproduce this error. Let's build the binary. The build instructions corresponding to the failed job are here: https://github.com/grimme-lab/xtb/blob/5f7a2e245de45f5d09db445a35ab929d34228be7/.github/workflows/fortran-build.yml#L40-L52
I'm using a self-built gfortran-14 on RHEL 8 on x86_64 with MKL, while the CI image has Ubuntu 24.04, gfortran-12, and OpenBLAS. Anyway:
meson setup reproduce_CI --buildtype=debug --warnlevel=0 -Db_coverage=true -Dlapack=mkl
meson compile -C reproduce_CI
You will see a lot of compilation warnings, as usual, and at the end you should have a new build of xtb. Now it is time to run the tests:
meson test -C reproduce_CI --print-errorlogs --no-rebuild -t 120 --suite xtb
And then you can see:
Ok: 32
Expected Fail: 1
Fail: 0
Unexpected Pass: 0
Skipped: 0
Timeout: 0
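OK! It works, you may say. However, that is not the whole story. During testing, meson sets some environment variables to random values. For us, the most important one is MALLOC_PERTURB_. On glibc, a nonzero MALLOC_PERTURB_ fills freshly allocated and freed memory with a byte pattern derived from the value, so a test that reads uninitialized memory can pass or fail depending on it. Please have a look at which value it had in the failed build. You should find the value 255.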
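If you have saved the job log as a text file, a small helper like the one below can pull the value out for you. This is only a sketch: the function name `find_perturb` and the log file name are made up for illustration, and it assumes the variable shows up in the log as a plain `MALLOC_PERTURB_=<number>` assignment, for example on the `command:` line of a failed test.

```shell
# Print every distinct MALLOC_PERTURB_ assignment found in a saved log.
# The log path is given by the caller; nothing here is specific to xtb.
find_perturb() {
    grep -o 'MALLOC_PERTURB_=[0-9]*' "$1" | sort -u
}
```

Usage, with a hypothetical file name: `find_perturb ci_job.log`.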
Now, let's rerun only the failed test with this variable set:
MALLOC_PERTURB_=255 reproduce_CI/test/unit/tester gfnff
Wait a little bit... And see:
Error termination. Backtrace:
#0 0xb626fe in __testdrive_MOD_escalate_error
at ../subprojects/test-drive/src/testdrive.F90:1913
#1 0xb628f1 in __testdrive_MOD___final_testdrive_Error_type
at ../subprojects/test-drive/src/testdrive.F90:1964
#2 0x4f8c0e in test_gfnff_pbc
at ../test/unit/test_gfnff.f90:751
#3 0x40a394 in run_unittest
at ../test/unit/main.f90:169
#4 0x40a394 in run_testsuite
at ../test/unit/main.f90:149
#5 0x40b63e in tester
at ../test/unit/main.f90:103
#6 0x4080f7 in main
at ../test/unit/main.f90:20
Hooray! We reproduced CI!
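If the value from the CI log is not available, or you want to know which other values trigger the failure, you could sweep over a range of values. The helper below is a minimal sketch, not part of the xtb tooling: the name `sweep_perturb` is hypothetical, and it treats any nonzero exit status of the command as a failure.

```shell
# Run a command once per MALLOC_PERTURB_ value and report the values that fail.
# Usage: sweep_perturb FIRST LAST COMMAND [ARGS...]
# The body runs in a subshell, so its helper variables do not leak out.
sweep_perturb() (
    first=$1
    last=$2
    shift 2
    value=$first
    while [ "$value" -le "$last" ]; do
        # The variable is set for this single invocation only.
        if ! MALLOC_PERTURB_=$value "$@" >/dev/null 2>&1; then
            echo "fails with MALLOC_PERTURB_=$value"
        fi
        value=$((value + 1))
    done
)
```

For example, `sweep_perturb 1 255 reproduce_CI/test/unit/tester gfnff` would rerun the gfnff tests for every value. Since a single run already takes a while, a narrower range is probably more practical.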
That's neat, thank you for taking the time to explain this :)
It is a tip, not a bug :(
Haha, sorry, it's just for us internally to keep this issue on the to-do list before the next release. I’ve changed it to a task :)
You can pin issues :)
@grimme-lab/xtb, I think this should be our current priority so that stacked PRs can be merged before v6.7.2.
Since we are drastically changing our codebase in v7.0.0, it is important to have a stable version before making such changes.
With https://github.com/tblite/tblite/pull/230 and #1204, CI should no longer fail from time to time :)
Ah.. We still have a couple of bugs in the cpcm-x lib :(