[PETSc] - Problem with _jll binaries after 3.15.2
Hi all,
I am a developer for GridapPETSc.jl, a package bridging PETSc_jll for the Gridap ecosystem.
While doing some unrelated maintenance, we noticed our tests stopped working for PETSc_jll versions above v3.15.2. We also run our tests with manually compiled versions of PETSc, which run fine for all versions up to v3.19.4. This would therefore point to a bug in the build of the _jll package. Since I am not an expert on artifact building in Julia, I would like some help on the matter.
Here is the issue and the PR where I am exploring this matter.
As you can see in the PR, tests run fine for manually compiled PETSc (i.e. the CI_EXTRA jobs) but fail for the _jll tests (i.e. the CI jobs), which use the latest PETSc_jll v3.18.6. The failure goes away when using PETSc_jll v3.13.x or v3.15.x instead, and reappears with newer versions.
@boriskaus I understand you have been taking care of most of the releases for PETSc_jll (thank you, btw). Would you be able to have a look at this?
ai, that is going to be a tricky one to find. It seems to crash in the middle of a computation.
One thing I found with our PETSc-based codes is that the multithreading of the Julia BLAS libraries causes problems/crashes (and makes the calculations very slow). This is activated by default; you can switch it off by setting the environment variable OMP_NUM_THREADS=1, as done here and here.
How is that dealt with in GridapPETSc.jl?
I believe it just isn't. I may look into it, although we compile our own PETSc libraries for all our important runs. However, I've done a couple of tests by manually setting the environment variable and it does not seem to be what causes the crash...
I doubt that you use multithreaded BLAS for your local PETSc build. Both the fact that your tests hang for several hours and the fact that this only occurs in parallel are consistent with this.
You probably want to use BLAS.set_num_threads(1) to disable OpenBLAS threading, and avoid its threading issues.
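For reference, the two suggestions in this thread (OMP_NUM_THREADS=1 and BLAS.set_num_threads(1)) can be combined in a short sketch like the one below. This is only an illustration, not what GridapPETSc.jl actually does. It assumes the settings are applied at the top of the test script, before PETSc_jll or any package that loads it is imported. Whether setting the variable from inside Julia is early enough for the PETSc binaries is also an assumption; exporting it in the shell before starting Julia is the safer route.

```julia
# Sketch: limit BLAS/OpenMP threading before running PETSc-based code.
using LinearAlgebra

# Assumption: this runs before PETSc_jll (or any package that loads it) is
# imported, so that OpenMP-enabled libraries see the value when they start up.
ENV["OMP_NUM_THREADS"] = "1"

# Restrict Julia's own BLAS backend (OpenBLAS by default) to a single thread.
BLAS.set_num_threads(1)

# Print the resulting thread count as a sanity check.
@show BLAS.get_num_threads()
```

If the crash persists with both settings applied, as reported above for the environment variable, BLAS threading is less likely to be the cause.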
Hi! Any update on this?
Not from my side; I'm more than happy to receive help in compiling new versions of PETSc_jll, though...
There are various new versions of PETSc_jll now. Please reopen if still an issue.
Something definitely changed between older and newer PETSc versions (I'm not sure whether it was the jll's fault or just something in PETSc). In the end we just made some changes to our library so that we could support newer versions of the jll, at the cost of not supporting older versions (which is fine). But yeah, this was kinda resolved. Thanks everyone for the help!