Yggdrasil

[PETSc] - Problem with _jll binaries after 3.15.2

Open • JordiManyer opened this issue 2 years ago • 6 comments

Hi all,

I am a developer of GridapPETSc.jl, a package that bridges PETSc_jll and the Gridap ecosystem.

While doing some unrelated maintenance, we noticed that our tests stopped working for PETSc_jll versions above v3.15.2. We also run our tests against manually compiled versions of PETSc, and those pass for all versions up to v3.19.4. This suggests a problem in the build of the _jll package. Since I am not an expert on building artifacts for Julia, I would appreciate some help with this.

Here are the issue and the PR where I am investigating this. As you can see in the PR, the tests pass with manually compiled PETSc (i.e., the CI_EXTRA jobs) but fail with the _jll build (i.e., the CI jobs), which uses the latest PETSc_jll, v3.18.6. The failures go away with PETSc_jll v3.13.x or v3.15.x and reappear with newer versions.

@boriskaus I understand you have been taking care of most of the releases for PETSc_jll (thank you, btw). Would you be able to have a look at this?

JordiManyer, Aug 23 '23

Ouch, that is going to be a tricky one to find. It seems to crash in the middle of a computation. One thing I found with our PETSc-based codes is that the multithreading of Julia's BLAS library (OpenBLAS) causes problems and crashes, and makes the calculations very slow. Multithreading is activated by default; you can switch it off by setting the environment variable OMP_NUM_THREADS=1, as done here and here. How is that dealt with in GridapPETSc.jl?
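
A minimal Julia sketch of the environment-variable approach, assuming the tests are launched as separate Julia processes (for example by an MPI test driver). Threaded libraries typically read OMP_NUM_THREADS when they initialize, so the sketch attaches it to the child's environment with `addenv` before the child starts, rather than changing `ENV` inside a process that has already loaded BLAS. The child command is a hypothetical stand-in that only prints the value it sees.

```julia
# Sketch: start a child Julia process with OMP_NUM_THREADS=1 in its environment.
# The variable must be present before the child starts, because threaded
# libraries typically read it when they are first initialized.
child = addenv(`$(Base.julia_cmd()) --startup-file=no -e 'print(get(ENV, "OMP_NUM_THREADS", "unset"))'`,
               "OMP_NUM_THREADS" => "1")

# Run the child and capture its standard output as a String.
seen = read(child, String)
println("OMP_NUM_THREADS in child: ", seen)
```

In a real test setup, the stand-in `-e` command would be replaced by whatever launches the actual test script.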

boriskaus, Aug 23 '23

> How is that dealt with in GridapPETSc.jl?

I believe it just isn't. I may look into it, although we compile our own PETSc libraries for all our important runs. However, I've done a couple of tests with the environment variable set manually, and it does not seem to be the cause of the crash...

JordiManyer, Aug 24 '23

I doubt that your local PETSc build uses a multithreaded BLAS. The facts that your tests hang for several hours and that this only happens in parallel runs are both consistent with a BLAS threading problem.

boriskaus, Aug 24 '23

You probably want to call BLAS.set_num_threads(1) to disable OpenBLAS threading and avoid its threading issues.
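
A minimal sketch of that suggestion, assuming it runs near the top of the test script (runtests.jl is a hypothetical placement) before any PETSc solve starts. Under MPI every rank is its own process, so each rank has to execute the call.

```julia
using LinearAlgebra

# Pin the BLAS library (OpenBLAS by default in Julia) to a single thread
# in this process.
BLAS.set_num_threads(1)

# Confirm that the setting took effect.
println("BLAS threads: ", BLAS.get_num_threads())
```

Unlike OMP_NUM_THREADS, this call works from inside an already-running Julia session, which makes it convenient when you do not control how the processes are launched.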

ViralBShah, Oct 06 '23

Hi! Any update on this?

amartinhuertas, Mar 21 '24

Not from my side; I'm more than happy to receive help in compiling new versions of PETSc_jll, though...

boriskaus, Mar 21 '24

There are various new versions of PETSc_jll now. Please reopen if still an issue.

ViralBShah, Sep 29 '25

Something definitely changed between older and newer PETSc versions (I'm not sure whether it was the jll's fault or just something in PETSc itself). In the end we made some changes to our library so that it supports newer versions of the jll, at the cost of dropping support for older versions (which is fine). So yes, this is more or less resolved. Thanks everyone for the help!

JordiManyer, Sep 29 '25