Problem of ParCSR solvers with OpenMP on
The ParCSR solvers solve the linear system when OpenMP is off, but fail when OpenMP is on. The error message is as follows:
hypre error in file "pcg.c", line 709, error code = 256 - Subnormal gamma value in PCG
The system matrix is in the attached file (A_2.zip), and all entries of the rhs vector are one.
Does switching OpenMP on or off change the solver's behavior?
Did you use AMG as a preconditioner? If so, the smoother changes when turning OpenMP on, so yes you could get different behavior.
From: Yiwei Shao. Sent: Wednesday, August 3, 2022 6:52 PM. To: hypre-space/hypre. Subject: [hypre-space/hypre] Problem of ParCSR solvers with OpenMP on (Issue #709)
@ulrikeyang Yes, I am. Can I avoid this problem? Here are my AMG settings:
HYPRE_BoomerAMGSetCoarsenType(amg_precond, 10);
HYPRE_BoomerAMGSetAggNumLevels(amg_precond, 1);
HYPRE_BoomerAMGSetRelaxType(amg_precond, 8);
HYPRE_BoomerAMGSetNumSweeps(amg_precond, 4);
HYPRE_BoomerAMGSetStrongThreshold(amg_precond, 0.25);
HYPRE_BoomerAMGSetInterpType(amg_precond, 6);
HYPRE_BoomerAMGSetPMaxElmts(amg_precond, 4);
HYPRE_BoomerAMGSetMaxLevels(amg_precond, 25);
HYPRE_BoomerAMGSetMaxIter(amg_precond, 1);
HYPRE_BoomerAMGSetTol(amg_precond, 0.0);
HYPRE_BoomerAMGSetNumFunctions(amg_precond, 1);
HYPRE_BoomerAMGSetAggNumLevels(amg_precond, 0);
HYPRE_BoomerAMGSetStrongThreshold(amg_precond, 0.5);
HYPRE_BoomerAMGSetNodal(amg_precond, 4);
HYPRE_BoomerAMGSetNodalDiag(amg_precond, 1);
HYPRE_BoomerAMGSetCycleRelaxType(amg_precond, 8, 3);
HYPRE_BoomerAMGSetInterpVecVariant(amg_precond, 2);
HYPRE_BoomerAMGSetInterpVecQMax(amg_precond, 4);
What kind of problem are you solving? If you set num_functions to 1, you are solving this as a scalar problem, but your other settings indicate that you are trying to solve something like an elasticity problem, where you want to set num_functions to 3. Also you are setting the strength threshold first to 0.25 and then to 0.5. Which one do you want?
Sorry for the confusion. I mixed up the scalar settings and the elasticity settings. The problem occurs in the scalar part. Here are the scalar settings:
HYPRE_BoomerAMGSetCoarsenType(amg_precond, 10);
HYPRE_BoomerAMGSetAggNumLevels(amg_precond, 1);
HYPRE_BoomerAMGSetRelaxType(amg_precond, 8);
HYPRE_BoomerAMGSetNumSweeps(amg_precond, 4);
HYPRE_BoomerAMGSetStrongThreshold(amg_precond, 0.25);
HYPRE_BoomerAMGSetInterpType(amg_precond, 6);
HYPRE_BoomerAMGSetPMaxElmts(amg_precond, 4);
HYPRE_BoomerAMGSetMaxLevels(amg_precond, 25);
HYPRE_BoomerAMGSetMaxIter(amg_precond, 1);
HYPRE_BoomerAMGSetTol(amg_precond, 0.0);
HYPRE_BoomerAMGSetNumFunctions(amg_precond, 1);
Are you sure the problem is symmetric positive definite? Have you tried GMRES or BiCGSTAB with your problem?
Yes, it is. Cholesky decomposition works fine on the matrix. The conjugate gradient solvers in Eigen and SciPy also work.
@njsyw1997, I cannot reproduce the behavior you reported. Here's the output that I get with BoomerAMG-PCG:
$ OMP_NUM_THREADS=12 mpirun -np 1 ./ij -fromfile A.ij -solver 1 -rhsisone
Using HYPRE_DEVELOP_STRING: v2.25.0-444-g98ab3445b (main development branch master)
Running with these driver parameters:
solver ID = 1
=============================================
Hypre init times:
=============================================
Hypre init:
wall clock time = 0.000001 seconds
wall MFLOPS = 0.000000
cpu clock time = 0.000000 seconds
cpu MFLOPS = 0.000000
=============================================
Generate Matrix:
=============================================
Spatial Operator:
wall clock time = 0.011350 seconds
wall MFLOPS = 0.000000
cpu clock time = 0.011351 seconds
cpu MFLOPS = 0.000000
Number of vector components: 1
RHS vector has unit coefficients
Initial guess is 0
=============================================
IJ Vector Setup:
=============================================
RHS and Initial Guess:
wall clock time = 0.000056 seconds
wall MFLOPS = 0.000000
cpu clock time = 0.000056 seconds
cpu MFLOPS = 0.000000
Solver: AMG-PCG
HYPRE_ParCSRPCGGetPrecond got good precond
Num MPI tasks = 1
Num OpenMP threads = 12
BoomerAMG SETUP PARAMETERS:
Max levels = 25
Num levels = 4
Strength Threshold = 0.250000
Interpolation Truncation Factor = 0.000000
Maximum Row Sum Threshold for Dependency Weakening = 0.900000
Coarsening Type = HMIS
No. of levels of aggressive coarsening: 1
Interpolation on agg. levels= multipass interpolation
measures are determined locally
No global partition option chosen.
Interpolation = extended+i interpolation
Operator Matrix Information:
nonzero entries/row row sums
lev rows entries sparse min max avg min max
======================================================================
0 1077 14085 0.012 1 25 13.1 -3.068e-01 1.409e+04
1 79 677 0.108 3 15 8.6 -1.066e-01 5.098e+00
2 26 354 0.524 8 22 13.6 -2.437e-03 5.667e+00
3 7 49 1.000 7 7 7.0 2.509e+00 7.085e+00
Interpolation Matrix Information:
entries/row min max row sums
lev rows x cols min max avgW weight weight min max
================================================================================
0 1077 x 79 0 5 1.3 2.890e-02 1.100e+00 0.000e+00 1.102e+00
1 79 x 26 1 4 3.5 1.932e-02 7.237e-01 2.348e-01 1.018e+00
2 26 x 7 1 4 3.1 7.138e-03 5.048e-01 1.247e-01 1.000e+00
Complexity: grid = 1.103993
operator = 1.076677
memory = 1.191693
BoomerAMG SOLVER PARAMETERS:
Maximum number of cycles: 1
Stopping Tolerance: 0.000000e+00
Cycle type (1 = V, 2 = W, etc.): 1
Relaxation Parameters:
Visiting Grid: down up coarse
Number of sweeps: 4 4 1
Type 0=Jac, 3=hGS, 6=hSGS, 9=GE: 8 8 9
Point types, partial sweeps (1=C, -1=F):
Pre-CG relaxation (down): 0 0 0 0
Post-CG relaxation (up): 0 0 0 0
Coarsest grid: 0
=============================================
Setup phase times:
=============================================
PCG Setup:
wall clock time = 0.001503 seconds
wall MFLOPS = 0.000000
cpu clock time = 0.016368 seconds
cpu MFLOPS = 0.000000
<b,b>: 1.077000e+03
Iters ||r||_2 conv.rate ||r||_2/||b||_2
----- ------------ --------- ------------
1 3.784225e+01 1.153106 1.153106e+00
2 1.078191e+01 0.284917 3.285398e-01
3 3.308574e+00 0.306863 1.008168e-01
4 7.421133e-01 0.224300 2.261322e-02
5 1.673504e-01 0.225505 5.099396e-03
6 4.621077e-02 0.276132 1.408106e-03
7 9.445583e-03 0.204402 2.878200e-04
8 2.021879e-03 0.214056 6.160946e-05
9 5.547049e-04 0.274351 1.690262e-05
10 1.074381e-04 0.193685 3.273788e-06
11 2.012216e-05 0.187291 6.131502e-07
12 3.551917e-06 0.176518 1.082318e-07
13 7.788839e-07 0.219285 2.373367e-08
14 1.616554e-07 0.207548 4.925864e-09
=============================================
Solve phase times:
=============================================
PCG Solve:
wall clock time = 0.003845 seconds
wall MFLOPS = 0.000000
cpu clock time = 0.043123 seconds
cpu MFLOPS = 0.000000
Iterations = 14
Final Relative Residual Norm = 4.925864e-09
Some questions:
- Which version of hypre did you use? I've used the most recent one https://github.com/hypre-space/hypre/commit/98ab3445b7f2542718aa8dfffb3120b78d1b3328
- How many threads did you use? I've used 12
- Did you build your matrix in hypre using zero-based index for rows and columns?
@victorapm Thanks a lot for your reference output. I think I found my bug. The problem is the number of threads. If I do not explicitly set OMP_NUM_THREADS, the thread count defaults to the maximum available on the machine, which is 128 on my server. In my tests, the solve fails with 64 threads and converges with 63. Does that mean hypre cannot currently support 64 or more threads? I am using the hypre release v2.25.0. Here are the outputs in case they are helpful for you.
Num MPI tasks = 1
Num OpenMP threads = 64
BoomerAMG SETUP PARAMETERS:
Max levels = 25
Num levels = 4
Strength Threshold = 0.250000
Interpolation Truncation Factor = 0.000000
Maximum Row Sum Threshold for Dependency Weakening = 0.900000
Coarsening Type = HMIS
No. of levels of aggressive coarsening: 1
Interpolation on agg. levels= multipass interpolation
measures are determined locally
No global partition option chosen.
Interpolation = extended+i interpolation
Operator Matrix Information:
nonzero entries/row row sums
lev rows entries sparse min max avg min max
======================================================================
0 1077 14085 0.012 1 25 13.1 -3.068e-01 3.738e+00
1 78 682 0.112 3 15 8.7 -1.030e-01 5.098e+00
2 24 310 0.538 7 21 12.9 -1.815e-03 5.430e+00
3 7 47 0.959 6 7 6.7 3.016e+00 7.722e+00
Interpolation Matrix Information:
entries/row min max row sums
lev rows x cols min max avgW weight weight min max
================================================================================
0 1077 x 78 0 5 1.3 2.890e-02 1.100e+00 0.000e+00 1.102e+00
1 78 x 24 1 4 3.4 1.932e-02 7.795e-01 2.251e-01 1.016e+00
2 24 x 7 1 4 3.2 2.249e-02 5.058e-01 1.244e-01 1.000e+00
Complexity: grid = 1.101207
operator = 1.073766
memory = 1.187859
BoomerAMG SOLVER PARAMETERS:
Maximum number of cycles: 1
Stopping Tolerance: 0.000000e+00
Cycle type (1 = V, 2 = W, etc.): 1
Relaxation Parameters:
Visiting Grid: down up coarse
Number of sweeps: 1 1 1
Type 0=Jac, 3=hGS, 6=hSGS, 9=GE: 8 8 9
Point types, partial sweeps (1=C, -1=F):
Pre-CG relaxation (down): 0
Post-CG relaxation (up): 0
Coarsest grid: 0
<b,b>: 1.077000e+03
Iters ||r||_2 conv.rate ||r||_2/||b||_2
----- ------------ --------- ------------
1 1.008538e+02 3.073153 3.073153e+00
2 5.452640e+01 0.540648 1.661495e+00
3 7.002747e+01 1.284286 2.133834e+00
4 8.688783e+01 1.240768 2.647592e+00
hypre error in file "hypre/src/krylov/pcg.c", line 709, error code = 256 - Subnormal gamma value in PCG
Num MPI tasks = 1
Num OpenMP threads = 63
BoomerAMG SETUP PARAMETERS:
Max levels = 25
Num levels = 4
Strength Threshold = 0.250000
Interpolation Truncation Factor = 0.000000
Maximum Row Sum Threshold for Dependency Weakening = 0.900000
Coarsening Type = HMIS
No. of levels of aggressive coarsening: 1
Interpolation on agg. levels= multipass interpolation
measures are determined locally
No global partition option chosen.
Interpolation = extended+i interpolation
Operator Matrix Information:
nonzero entries/row row sums
lev rows entries sparse min max avg min max
======================================================================
0 1077 14085 0.012 1 25 13.1 -3.068e-01 3.738e+00
1 78 682 0.112 3 15 8.7 -1.030e-01 5.098e+00
2 24 310 0.538 7 21 12.9 -1.815e-03 5.430e+00
3 7 47 0.959 6 7 6.7 3.016e+00 7.722e+00
Interpolation Matrix Information:
entries/row min max row sums
lev rows x cols min max avgW weight weight min max
================================================================================
0 1077 x 78 0 5 1.3 2.890e-02 1.100e+00 0.000e+00 1.102e+00
1 78 x 24 1 4 3.4 1.932e-02 7.795e-01 2.251e-01 1.016e+00
2 24 x 7 1 4 3.2 2.249e-02 5.058e-01 1.244e-01 1.000e+00
Complexity: grid = 1.101207
operator = 1.073766
memory = 1.187859
BoomerAMG SOLVER PARAMETERS:
Maximum number of cycles: 1
Stopping Tolerance: 0.000000e+00
Cycle type (1 = V, 2 = W, etc.): 1
Relaxation Parameters:
Visiting Grid: down up coarse
Number of sweeps: 1 1 1
Type 0=Jac, 3=hGS, 6=hSGS, 9=GE: 8 8 9
Point types, partial sweeps (1=C, -1=F):
Pre-CG relaxation (down): 0
Post-CG relaxation (up): 0
Coarsest grid: 0
<b,b>: 1.077000e+03
Iters ||r||_2 conv.rate ||r||_2/||b||_2
----- ------------ --------- ------------
1 1.023999e+02 3.120266 3.120266e+00
2 4.713368e+01 0.460290 1.436228e+00
3 2.030724e+01 0.430843 6.187897e-01
4 1.029228e+01 0.506828 3.136201e-01
5 5.639706e+00 0.547955 1.718496e-01
6 3.034951e+00 0.538140 9.247916e-02
7 1.536586e+00 0.506297 4.682189e-02
8 7.252043e-01 0.471958 2.209798e-02
9 3.086561e-01 0.425613 9.405177e-03
10 1.326608e-01 0.429801 4.042358e-03
11 5.976345e-02 0.450498 1.821075e-03
12 2.878362e-02 0.481626 8.770766e-04
13 1.354919e-02 0.470726 4.128625e-04
14 6.019329e-03 0.444257 1.834173e-04
15 2.137133e-03 0.355045 6.512140e-05
16 7.655582e-04 0.358217 2.332762e-05
17 2.357928e-04 0.308001 7.184932e-06
18 8.526268e-05 0.361600 2.598072e-06
19 2.900952e-05 0.340237 8.839601e-07
20 1.060076e-05 0.365424 3.230199e-07
21 4.034142e-06 0.380552 1.229259e-07
22 1.785592e-06 0.442620 5.440946e-08
23 7.381106e-07 0.413370 2.249125e-08
24 2.675737e-07 0.362512 8.153340e-09
25 8.445827e-08 0.315645 2.573560e-09
26 2.718023e-08 0.321819 8.282193e-10
27 9.884733e-09 0.363674 3.012015e-10
28 3.446221e-09 0.348641 1.050111e-10
29 1.263154e-09 0.366533 3.849006e-11
@njsyw1997, there's no limit on the number of threads that you can use in hypre. However, the relaxation method that you are using (L1-Gauss-Seidel) may not converge with a large number of threads. Hypre generally gives better performance when you partition your problem across several processes, i.e., when you use MPI instead of OpenMP for parallel runs. A mix of MPI and OpenMP is also fine as long as the number of OpenMP threads is not large. If you are unable to use MPI and want to use a large number of threads while maintaining good convergence, you can use a relaxation method such as Jacobi (7), L1-Jacobi (18), or Chebyshev (16).
@njsyw1997, I'm closing this issue. Please let us know if you have any other questions.