Problem of ParCSR solvers with OpenMP on

Open njsyw1997 opened this issue 3 years ago • 9 comments

The ParCSR solvers can solve the linear system with OpenMP off, but fail with OpenMP on. The error message is:

hypre error in file "pcg.c", line 709, error code = 256 - Subnormal gamma value in PCG

The system matrix is in the attached file (A_2.zip), and all entries of the right-hand-side vector are ones.

Is there any difference in behavior when OpenMP is switched on?

njsyw1997, Aug 04 '22 01:08

Did you use AMG as a preconditioner? If so, the smoother changes when OpenMP is turned on, so yes, you could get different behavior.
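A minimal sketch of why the thread count can change the smoother, assuming the threaded smoother behaves like a hybrid Gauss-Seidel: each thread runs Gauss-Seidel on its own contiguous block of rows and treats values owned by other threads as frozen for the sweep (Jacobi-style). This is an illustrative emulation, not hypre's code: `hybrid_gs_sweep` is a hypothetical name, storage is dense, and the l1 scaling and backward sweep of relaxation type 8 are omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Emulates one forward sweep of a thread-blocked ("hybrid") Gauss-Seidel
 * smoother on a dense row-major n x n matrix. Rows are split into nblocks
 * contiguous chunks, one per hypothetical thread. Inside a chunk the update
 * is Gauss-Seidel (it uses the newest values of that chunk); values owned by
 * other chunks are read from a copy taken before the sweep (Jacobi-style). */
void hybrid_gs_sweep(int n, const double *A, const double *b, double *x,
                     int nblocks)
{
    assert(n > 0 && nblocks > 0 && nblocks <= n);
    double *x_old = malloc((size_t)n * sizeof *x_old);
    assert(x_old != NULL);
    memcpy(x_old, x, (size_t)n * sizeof *x_old);

    for (int t = 0; t < nblocks; t++) {          /* "thread" t */
        int lo = (int)((long)n * t / nblocks);
        int hi = (int)((long)n * (t + 1) / nblocks);
        for (int i = lo; i < hi; i++) {
            double s = b[i];
            for (int j = 0; j < n; j++) {
                if (j == i)
                    continue;
                /* newest value inside own chunk, stale value outside it */
                double xj = (j >= lo && j < hi) ? x[j] : x_old[j];
                s -= A[i * n + j] * xj;
            }
            x[i] = s / A[i * n + i];
        }
    }
    free(x_old);
}
```

With `nblocks = 1` this is ordinary Gauss-Seidel; with one block per row it degenerates to Jacobi. For the 3 x 3 matrix tridiag(-1, 4, -1) with b = (1, 1, 1) and x = 0, one sweep gives (0.25, 0.3125, 0.328125) with one block and (0.25, 0.25, 0.25) with three, so the smoother really is a different operator for each thread count.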

From: Yiwei Shao
Sent: Wednesday, August 3, 2022 6:52 PM
To: hypre-space/hypre
Cc: Subscribed
Subject: [hypre-space/hypre] Problem of ParCSR solvers with OpenMP on (Issue #709)

ulrikeyang, Aug 04 '22 14:08

@ulrikeyang Yes, I am. Can I avoid this problem? Here are my AMG settings:

HYPRE_BoomerAMGSetCoarsenType(amg_precond, 10); 
HYPRE_BoomerAMGSetAggNumLevels(amg_precond, 1);
HYPRE_BoomerAMGSetRelaxType(amg_precond, 8);
HYPRE_BoomerAMGSetNumSweeps(amg_precond, 4);
HYPRE_BoomerAMGSetStrongThreshold(amg_precond, 0.25);  
HYPRE_BoomerAMGSetInterpType(amg_precond, 6);
HYPRE_BoomerAMGSetPMaxElmts(amg_precond, 4);  
HYPRE_BoomerAMGSetMaxLevels(amg_precond, 25);
HYPRE_BoomerAMGSetMaxIter(amg_precond, 1);  
HYPRE_BoomerAMGSetTol(amg_precond, 0.0);
HYPRE_BoomerAMGSetNumFunctions(amg_precond, 1); 
HYPRE_BoomerAMGSetAggNumLevels(amg_precond, 0); 
HYPRE_BoomerAMGSetStrongThreshold(amg_precond, 0.5); 
HYPRE_BoomerAMGSetNodal(amg_precond, 4);
HYPRE_BoomerAMGSetNodalDiag(amg_precond, 1);
HYPRE_BoomerAMGSetCycleRelaxType(amg_precond, 8, 3);
HYPRE_BoomerAMGSetInterpVecVariant(amg_precond, 2); 
HYPRE_BoomerAMGSetInterpVecQMax(amg_precond, 4);  

njsyw1997, Aug 04 '22 14:08

What kind of problem are you solving? If you set num_functions to 1, you are solving this as a scalar problem, but your other settings indicate that you are trying to solve something like an elasticity problem, where you want to set num_functions to 3. Also you are setting the strength threshold first to 0.25 and then to 0.5. Which one do you want?
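A minimal sketch of the classical strength-of-connection test that the strength threshold controls, to show why 0.25 and 0.5 are not interchangeable: off-diagonal entry a_ij counts as strong when -a_ij >= theta * max_{k != i}(-a_ik). The function `strong_connections` is hypothetical, and BoomerAMG's actual criterion may differ in details (for example the max row sum weakening). Since each Set call stores a single value, the later call with 0.5 is presumably the one in effect.

```c
#include <assert.h>

/* Classical (Ruge-Stueben style) strength test for one dense matrix row.
 * row[0..n-1] holds the row, with the diagonal at index i.
 * Entry j is strong if -row[j] >= theta * max_{k != i} (-row[k]).
 * Writes a 0/1 flag per column into strong[] and returns the count. */
int strong_connections(int n, const double *row, int i, double theta,
                       int *strong)
{
    assert(n > 0 && i >= 0 && i < n);
    double max_neg = 0.0;                 /* largest negative coupling */
    for (int k = 0; k < n; k++)
        if (k != i && -row[k] > max_neg)
            max_neg = -row[k];

    int count = 0;
    for (int j = 0; j < n; j++) {
        strong[j] = (j != i && max_neg > 0.0 && -row[j] >= theta * max_neg);
        count += strong[j];
    }
    return count;
}
```

For the row (-1, 4, -0.4, -0.3) with the diagonal at index 1, theta = 0.25 marks all three off-diagonal entries as strong, while theta = 0.5 keeps only the -1 entry, which in turn changes the coarse grid that coarsening produces.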

ulrikeyang, Aug 04 '22 15:08

Sorry for the confusion. I mixed up the scalar settings and the elasticity settings. The problem happens in the scalar part. Here are the scalar settings:

HYPRE_BoomerAMGSetCoarsenType(amg_precond, 10); 
HYPRE_BoomerAMGSetAggNumLevels(amg_precond, 1);
HYPRE_BoomerAMGSetRelaxType(amg_precond, 8);
HYPRE_BoomerAMGSetNumSweeps(amg_precond, 4);
HYPRE_BoomerAMGSetStrongThreshold(amg_precond, 0.25);  
HYPRE_BoomerAMGSetInterpType(amg_precond, 6);
HYPRE_BoomerAMGSetPMaxElmts(amg_precond, 4);  
HYPRE_BoomerAMGSetMaxLevels(amg_precond, 25);
HYPRE_BoomerAMGSetMaxIter(amg_precond, 1);  
HYPRE_BoomerAMGSetTol(amg_precond, 0.0);
HYPRE_BoomerAMGSetNumFunctions(amg_precond, 1); 

njsyw1997, Aug 04 '22 15:08

Are you sure the problem is symmetric positive definite? Have you tried GMRES or BiCGSTAB with your problem?
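A minimal sketch of why PCG breaks down this way, assuming the reported gamma is the inner product gamma = <r, z> of the residual r and the preconditioned residual z = M^{-1} r (hypre's exact test in pcg.c is not reproduced here). For an SPD matrix and an SPD preconditioner gamma stays positive until convergence; a preconditioner that is not SPD can drive it to zero or below. The sketch uses a Jacobi preconditioner on a dense matrix, and `pcg_jacobi` is a hypothetical name.

```c
#include <assert.h>
#include <math.h>   /* isfinite */
#include <stdlib.h>

/* y = A x for a dense row-major n x n matrix. */
static void matvec(int n, const double *A, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int j = 0; j < n; j++)
            s += A[i * n + j] * x[j];
        y[i] = s;
    }
}

static double dot(int n, const double *a, const double *b)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Jacobi-preconditioned CG (M = diag(A)).
 * Returns the number of iterations taken (max_iter if not converged),
 * or -1 if gamma = <r, z> is not positive and finite before convergence. */
int pcg_jacobi(int n, const double *A, const double *b, double *x,
               double tol, int max_iter)
{
    assert(n > 0 && max_iter >= 0);
    double *r = malloc(4 * (size_t)n * sizeof *r);
    assert(r != NULL);
    double *z = r + n, *p = r + 2 * n, *q = r + 3 * n;

    matvec(n, A, x, q);
    for (int i = 0; i < n; i++) {
        r[i] = b[i] - q[i];           /* r = b - A x  */
        z[i] = r[i] / A[i * n + i];   /* z = M^{-1} r */
        p[i] = z[i];
    }
    double gamma = dot(n, r, z);
    double stop = tol * tol * dot(n, b, b);   /* compare squared norms */

    int it = 0;
    for (; it < max_iter; it++) {
        if (dot(n, r, r) <= stop)
            break;                            /* converged */
        if (!(gamma > 0.0) || !isfinite(gamma)) {
            free(r);
            return -1;                        /* breakdown: bad gamma */
        }
        matvec(n, A, p, q);                   /* q = A p */
        double alpha = gamma / dot(n, p, q);
        for (int i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * q[i];
            z[i] = r[i] / A[i * n + i];
        }
        double gamma_new = dot(n, r, z);
        double beta = gamma_new / gamma;
        gamma = gamma_new;
        for (int i = 0; i < n; i++)
            p[i] = z[i] + beta * p[i];
    }
    free(r);
    return it;
}
```

On an SPD matrix the loop converges normally; with a diagonal that has a negative entry (an indefinite preconditioner) gamma is negative at the first step and the function reports a breakdown, which mirrors the kind of failure in the error message.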

ulrikeyang, Aug 04 '22 17:08

Yes, it is. Cholesky decomposition works fine on the matrix, and the conjugate gradient solvers in Eigen and SciPy also work.
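For reference, a minimal sketch of the kind of SPD check described here, using the square-root-free LDL^T form of Cholesky: a symmetric matrix is positive definite exactly when LDL^T without pivoting completes with all pivots d_j > 0. Dense storage and the name `is_spd` are illustrative assumptions; for the 1077-row matrix in this issue a dense check is still manageable.

```c
#include <assert.h>
#include <stdlib.h>

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Returns 1 if the dense row-major n x n matrix A is symmetric (to a
 * relative tolerance) and positive definite, 0 otherwise. Positive
 * definiteness is tested with an LDL^T factorization without pivoting. */
int is_spd(int n, const double *A, double rel_tol)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            double a = A[i * n + j], b = A[j * n + i];
            double scale = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
            if (abs_d(a - b) > rel_tol * scale)
                return 0;                            /* not symmetric */
        }

    double *L = calloc((size_t)n * n, sizeof *L);    /* unit lower factor */
    double *d = malloc((size_t)n * sizeof *d);       /* pivots */
    assert(L != NULL && d != NULL);
    int ok = 1;
    for (int j = 0; j < n && ok; j++) {
        double dj = A[j * n + j];
        for (int k = 0; k < j; k++)
            dj -= L[j * n + k] * L[j * n + k] * d[k];
        if (!(dj > 0.0)) {
            ok = 0;                                  /* non-positive pivot */
            break;
        }
        d[j] = dj;
        for (int i = j + 1; i < n; i++) {
            double s = A[i * n + j];
            for (int k = 0; k < j; k++)
                s -= L[i * n + k] * L[j * n + k] * d[k];
            L[i * n + j] = s / dj;
        }
    }
    free(L);
    free(d);
    return ok;
}
```

A symmetric but indefinite matrix such as [[1, 2], [2, 1]] fails at the second pivot (1 - 4 = -3), and a nonsymmetric matrix is rejected before factorization starts.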

njsyw1997, Aug 04 '22 17:08

@njsyw1997, I cannot reproduce the behavior you reported. Here's the output that I get with BoomerAMG-PCG:

$ OMP_NUM_THREADS=12 mpirun -np 1 ./ij -fromfile A.ij -solver 1 -rhsisone

Using HYPRE_DEVELOP_STRING: v2.25.0-444-g98ab3445b (main development branch master)

Running with these driver parameters:
  solver ID    = 1

=============================================
Hypre init times:
=============================================
Hypre init:
  wall clock time = 0.000001 seconds
  wall MFLOPS     = 0.000000
  cpu clock time  = 0.000000 seconds
  cpu MFLOPS      = 0.000000

=============================================
Generate Matrix:
=============================================
Spatial Operator:
  wall clock time = 0.011350 seconds
  wall MFLOPS     = 0.000000
  cpu clock time  = 0.011351 seconds
  cpu MFLOPS      = 0.000000

  Number of vector components: 1
  RHS vector has unit coefficients
  Initial guess is 0
=============================================
IJ Vector Setup:
=============================================
RHS and Initial Guess:
  wall clock time = 0.000056 seconds
  wall MFLOPS     = 0.000000
  cpu clock time  = 0.000056 seconds
  cpu MFLOPS      = 0.000000

Solver: AMG-PCG
HYPRE_ParCSRPCGGetPrecond got good precond


 Num MPI tasks = 1

 Num OpenMP threads = 12


BoomerAMG SETUP PARAMETERS:

 Max levels = 25
 Num levels = 4

 Strength Threshold = 0.250000
 Interpolation Truncation Factor = 0.000000
 Maximum Row Sum Threshold for Dependency Weakening = 0.900000

 Coarsening Type = HMIS

 No. of levels of aggressive coarsening: 1

 Interpolation on agg. levels= multipass interpolation
 measures are determined locally


 No global partition option chosen.

 Interpolation = extended+i interpolation

Operator Matrix Information:

             nonzero            entries/row          row sums
lev    rows  entries sparse   min  max     avg      min         max
======================================================================
  0    1077    14085  0.012     1   25    13.1  -3.068e-01   1.409e+04
  1      79      677  0.108     3   15     8.6  -1.066e-01   5.098e+00
  2      26      354  0.524     8   22    13.6  -2.437e-03   5.667e+00
  3       7       49  1.000     7    7     7.0   2.509e+00   7.085e+00


Interpolation Matrix Information:
                    entries/row        min        max            row sums
lev  rows x cols  min  max  avgW     weight      weight       min         max
================================================================================
  0  1077 x 79      0    5   1.3   2.890e-02   1.100e+00   0.000e+00   1.102e+00
  1    79 x 26      1    4   3.5   1.932e-02   7.237e-01   2.348e-01   1.018e+00
  2    26 x 7       1    4   3.1   7.138e-03   5.048e-01   1.247e-01   1.000e+00


     Complexity:    grid = 1.103993
                operator = 1.076677
                memory = 1.191693




BoomerAMG SOLVER PARAMETERS:

  Maximum number of cycles:         1
  Stopping Tolerance:               0.000000e+00
  Cycle type (1 = V, 2 = W, etc.):  1

  Relaxation Parameters:
   Visiting Grid:                     down   up  coarse
            Number of sweeps:            4    4     1
   Type 0=Jac, 3=hGS, 6=hSGS, 9=GE:      8    8     9
   Point types, partial sweeps (1=C, -1=F):
                  Pre-CG relaxation (down):   0   0   0   0
                   Post-CG relaxation (up):   0   0   0   0
                             Coarsest grid:   0

=============================================
Setup phase times:
=============================================
PCG Setup:
  wall clock time = 0.001503 seconds
  wall MFLOPS     = 0.000000
  cpu clock time  = 0.016368 seconds
  cpu MFLOPS      = 0.000000

<b,b>: 1.077000e+03


Iters       ||r||_2     conv.rate  ||r||_2/||b||_2
-----    ------------   ---------  ------------
    1    3.784225e+01    1.153106    1.153106e+00
    2    1.078191e+01    0.284917    3.285398e-01
    3    3.308574e+00    0.306863    1.008168e-01
    4    7.421133e-01    0.224300    2.261322e-02
    5    1.673504e-01    0.225505    5.099396e-03
    6    4.621077e-02    0.276132    1.408106e-03
    7    9.445583e-03    0.204402    2.878200e-04
    8    2.021879e-03    0.214056    6.160946e-05
    9    5.547049e-04    0.274351    1.690262e-05
   10    1.074381e-04    0.193685    3.273788e-06
   11    2.012216e-05    0.187291    6.131502e-07
   12    3.551917e-06    0.176518    1.082318e-07
   13    7.788839e-07    0.219285    2.373367e-08
   14    1.616554e-07    0.207548    4.925864e-09


=============================================
Solve phase times:
=============================================
PCG Solve:
  wall clock time = 0.003845 seconds
  wall MFLOPS     = 0.000000
  cpu clock time  = 0.043123 seconds
  cpu MFLOPS      = 0.000000


Iterations = 14
Final Relative Residual Norm = 4.925864e-09

Some questions:

  1. Which version of hypre did you use? I've used the most recent one https://github.com/hypre-space/hypre/commit/98ab3445b7f2542718aa8dfffb3120b78d1b3328
  2. How many threads did you use? I've used 12
  3. Did you build your matrix in hypre using zero-based index for rows and columns?

victorapm, Aug 06 '22 17:08

@victorapm Thanks a lot for your reference output. I think I found my bug: the problem is the thread count. If I do not set OMP_NUM_THREADS explicitly, the number of threads defaults to the maximum available on the machine, which is 128 on my server. According to my tests, the solve fails with 64 threads but succeeds with 63, so the thread count apparently has to stay below 64. Does that mean hypre currently cannot support 64 or more threads? I am using the hypre release v2.25.0. Here are the outputs in case they are helpful.

 Num MPI tasks = 1

 Num OpenMP threads = 64


BoomerAMG SETUP PARAMETERS:

 Max levels = 25
 Num levels = 4

 Strength Threshold = 0.250000
 Interpolation Truncation Factor = 0.000000
 Maximum Row Sum Threshold for Dependency Weakening = 0.900000

 Coarsening Type = HMIS 

 No. of levels of aggressive coarsening: 1

 Interpolation on agg. levels= multipass interpolation
 measures are determined locally


 No global partition option chosen.

 Interpolation = extended+i interpolation

Operator Matrix Information:

             nonzero            entries/row          row sums
lev    rows  entries sparse   min  max     avg      min         max
======================================================================
  0    1077    14085  0.012     1   25    13.1  -3.068e-01   3.738e+00
  1      78      682  0.112     3   15     8.7  -1.030e-01   5.098e+00
  2      24      310  0.538     7   21    12.9  -1.815e-03   5.430e+00
  3       7       47  0.959     6    7     6.7   3.016e+00   7.722e+00


Interpolation Matrix Information:
                    entries/row        min        max            row sums
lev  rows x cols  min  max  avgW     weight      weight       min         max
================================================================================
  0  1077 x 78      0    5   1.3   2.890e-02   1.100e+00   0.000e+00   1.102e+00
  1    78 x 24      1    4   3.4   1.932e-02   7.795e-01   2.251e-01   1.016e+00
  2    24 x 7       1    4   3.2   2.249e-02   5.058e-01   1.244e-01   1.000e+00


     Complexity:    grid = 1.101207
                operator = 1.073766
                memory = 1.187859




BoomerAMG SOLVER PARAMETERS:

  Maximum number of cycles:         1 
  Stopping Tolerance:               0.000000e+00 
  Cycle type (1 = V, 2 = W, etc.):  1

  Relaxation Parameters:
   Visiting Grid:                     down   up  coarse
            Number of sweeps:            1    1     1 
   Type 0=Jac, 3=hGS, 6=hSGS, 9=GE:      8    8     9 
   Point types, partial sweeps (1=C, -1=F):
                  Pre-CG relaxation (down):   0
                   Post-CG relaxation (up):   0
                             Coarsest grid:   0

<b,b>: 1.077000e+03


Iters       ||r||_2     conv.rate  ||r||_2/||b||_2
-----    ------------   ---------  ------------ 
    1    1.008538e+02    3.073153    3.073153e+00
    2    5.452640e+01    0.540648    1.661495e+00
    3    7.002747e+01    1.284286    2.133834e+00
    4    8.688783e+01    1.240768    2.647592e+00
hypre error in file "hypre/src/krylov/pcg.c", line 709, error code = 256 - Subnormal gamma value in PCG

 Num MPI tasks = 1

 Num OpenMP threads = 63


BoomerAMG SETUP PARAMETERS:

 Max levels = 25
 Num levels = 4

 Strength Threshold = 0.250000
 Interpolation Truncation Factor = 0.000000
 Maximum Row Sum Threshold for Dependency Weakening = 0.900000

 Coarsening Type = HMIS 

 No. of levels of aggressive coarsening: 1

 Interpolation on agg. levels= multipass interpolation
 measures are determined locally


 No global partition option chosen.

 Interpolation = extended+i interpolation

Operator Matrix Information:

             nonzero            entries/row          row sums
lev    rows  entries sparse   min  max     avg      min         max
======================================================================
  0    1077    14085  0.012     1   25    13.1  -3.068e-01   3.738e+00
  1      78      682  0.112     3   15     8.7  -1.030e-01   5.098e+00
  2      24      310  0.538     7   21    12.9  -1.815e-03   5.430e+00
  3       7       47  0.959     6    7     6.7   3.016e+00   7.722e+00


Interpolation Matrix Information:
                    entries/row        min        max            row sums
lev  rows x cols  min  max  avgW     weight      weight       min         max
================================================================================
  0  1077 x 78      0    5   1.3   2.890e-02   1.100e+00   0.000e+00   1.102e+00
  1    78 x 24      1    4   3.4   1.932e-02   7.795e-01   2.251e-01   1.016e+00
  2    24 x 7       1    4   3.2   2.249e-02   5.058e-01   1.244e-01   1.000e+00


     Complexity:    grid = 1.101207
                operator = 1.073766
                memory = 1.187859




BoomerAMG SOLVER PARAMETERS:

  Maximum number of cycles:         1 
  Stopping Tolerance:               0.000000e+00 
  Cycle type (1 = V, 2 = W, etc.):  1

  Relaxation Parameters:
   Visiting Grid:                     down   up  coarse
            Number of sweeps:            1    1     1 
   Type 0=Jac, 3=hGS, 6=hSGS, 9=GE:      8    8     9 
   Point types, partial sweeps (1=C, -1=F):
                  Pre-CG relaxation (down):   0
                   Post-CG relaxation (up):   0
                             Coarsest grid:   0

<b,b>: 1.077000e+03


Iters       ||r||_2     conv.rate  ||r||_2/||b||_2
-----    ------------   ---------  ------------ 
    1    1.023999e+02    3.120266    3.120266e+00
    2    4.713368e+01    0.460290    1.436228e+00
    3    2.030724e+01    0.430843    6.187897e-01
    4    1.029228e+01    0.506828    3.136201e-01
    5    5.639706e+00    0.547955    1.718496e-01
    6    3.034951e+00    0.538140    9.247916e-02
    7    1.536586e+00    0.506297    4.682189e-02
    8    7.252043e-01    0.471958    2.209798e-02
    9    3.086561e-01    0.425613    9.405177e-03
   10    1.326608e-01    0.429801    4.042358e-03
   11    5.976345e-02    0.450498    1.821075e-03
   12    2.878362e-02    0.481626    8.770766e-04
   13    1.354919e-02    0.470726    4.128625e-04
   14    6.019329e-03    0.444257    1.834173e-04
   15    2.137133e-03    0.355045    6.512140e-05
   16    7.655582e-04    0.358217    2.332762e-05
   17    2.357928e-04    0.308001    7.184932e-06
   18    8.526268e-05    0.361600    2.598072e-06
   19    2.900952e-05    0.340237    8.839601e-07
   20    1.060076e-05    0.365424    3.230199e-07
   21    4.034142e-06    0.380552    1.229259e-07
   22    1.785592e-06    0.442620    5.440946e-08
   23    7.381106e-07    0.413370    2.249125e-08
   24    2.675737e-07    0.362512    8.153340e-09
   25    8.445827e-08    0.315645    2.573560e-09
   26    2.718023e-08    0.321819    8.282193e-10
   27    9.884733e-09    0.363674    3.012015e-10
   28    3.446221e-09    0.348641    1.050111e-10
   29    1.263154e-09    0.366533    3.849006e-11
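Since the default thread count here was the whole machine (128), one workaround is to cap the OpenMP thread count explicitly before the solver runs, either with OMP_NUM_THREADS as in the ij command earlier in this thread or from code. A minimal sketch using the standard OpenMP runtime calls `omp_get_max_threads` and `omp_set_num_threads`; the helper names and the cap value are hypothetical, and this assumes hypre picks up the thread count from the OpenMP runtime. A cap below 64 happened to work for this matrix but is not a general guarantee.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Pure helper: number of threads to use given what is available and a cap. */
int choose_num_threads(int available, int cap)
{
    assert(available >= 1 && cap >= 1);
    return available < cap ? available : cap;
}

/* Caps the OpenMP thread count for subsequent parallel regions.
 * Returns the number of threads requested (1 when built without OpenMP). */
int cap_omp_threads(int cap)
{
#ifdef _OPENMP
    int n = choose_num_threads(omp_get_max_threads(), cap);
    omp_set_num_threads(n);
    return n;
#else
    (void)cap;
    return 1;
#endif
}
```

Calling `cap_omp_threads(32)` once, before the solver setup, would keep a 128-core node at 32 threads; running several MPI ranks with fewer threads each is the alternative suggested later in this thread.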

njsyw1997, Aug 07 '22 23:08

@njsyw1997, there's no limit on the number of threads that you can use in hypre, however the relaxation method that you are using (L1-Gauss-Seidel) may not converge with a large number of threads. Hypre generally gives better performance when partitioning your problem into several processes, i.e., using MPI instead of OpenMP for parallel runs. A mix between MPI and OpenMP is also fine as long as the number of OpenMP threads is not large. If you are unable to use MPI and want to use a large number of threads while maintaining good convergence, you can use a relaxation method such as Jacobi (7), L1-Jacobi (18), or Chebyshev (16).
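A minimal sketch of one L1-Jacobi sweep, to show why Jacobi-type smoothers are insensitive to the thread count: every row update reads only the previous iterate, so any split of rows among threads yields the same result. It scales by the full row l1 norm l1_i = sum_j |a_ij|; hypre's relaxation type 18 may compute its l1 norms differently, and `l1_jacobi_sweep` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One L1-Jacobi sweep on a dense row-major n x n matrix:
 *   x_new[i] = x_old[i] + (b[i] - (A x_old)[i]) / l1[i],
 *   l1[i]    = sum_j |a_ij|   (assumes no all-zero rows).
 * Each row reads only x_old, so splitting the rows among any number of
 * threads gives the same result as a serial sweep. */
void l1_jacobi_sweep(int n, const double *A, const double *b, double *x)
{
    assert(n > 0);
    double *x_old = malloc((size_t)n * sizeof *x_old);
    assert(x_old != NULL);
    memcpy(x_old, x, (size_t)n * sizeof *x_old);

#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int i = 0; i < n; i++) {
        double r = b[i], l1 = 0.0;
        for (int j = 0; j < n; j++) {
            double a = A[i * n + j];
            r -= a * x_old[j];            /* residual of row i */
            l1 += a < 0.0 ? -a : a;       /* l1 norm of row i  */
        }
        x[i] = x_old[i] + r / l1;
    }
    free(x_old);
}
```

The trade-off is slower smoothing per sweep than Gauss-Seidel, which is usually offset by the convergence behavior no longer depending on how many threads are used.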

victorapm, Aug 09 '22 14:08

@njsyw1997, I'm closing this issue. Please, let us know if there are any other questions.

victorapm, Aug 22 '22 16:08