[Issue] Solution depends on the number of levels of the AMG grid
I am encountering a convergence issue when using AMGX to solve a system of equations. With a smaller array, for which AMGX builds an AMG hierarchy with only 1 level, the solver converges successfully:
Process 0 selecting device 0
AMGX version 2.2.0.132-opensource
Built on Nov 23 2023, 14:14:00
Compiled with CUDA Runtime 11.8, using CUDA driver 12.2
Cannot read file as JSON object, trying as AMGX config
Converting config string to current config version
Parsing configuration string: tolerance=1e-16 ;
Cannot read file as JSON object, trying as AMGX config
Converting config string to current config version
Parsing configuration string: exception_handling=1 ;
idx = 0, nnz = 0
Using Normal MPI (Hostbuffer) communicator...
AMG Grid:
Number of Levels: 1
LVL ROWS NNZ SPRSTY Mem (GB)
--------------------------------------------------------------
0(D) 227 901 0.0175 1.35e-05
--------------------------------------------------------------
Grid Complexity: 1
Operator Complexity: 1
Total Memory Usage: 1.34669e-05 GB
--------------------------------------------------------------
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 1.84131 1.002957e-02
0 1.84131 3.473817e-17 0.0000
--------------------------------------------------------------
Total Iterations: 1
Avg Convergence Rate: 0.0000
Final Residual: 3.473817e-17
Total Reduction in Residual: 3.463574e-15
Maximum Memory Usage: 1.841 GB
--------------------------------------------------------------
Total Time: 0.00351504
setup: 0.00237533 s
solve: 0.00113971 s
solve(per iteration): 0.00113971 s
However, as soon as I slightly increase the size of the array, so that AMGX builds 2 AMG levels, the solver fails to converge:
AMGX version 2.2.0.132-opensource
Built on Nov 23 2023, 14:14:00
Compiled with CUDA Runtime 11.8, using CUDA driver 12.2
Cannot read file as JSON object, trying as AMGX config
Converting config string to current config version
Parsing configuration string: tolerance=1e-16 ;
Cannot read file as JSON object, trying as AMGX config
Converting config string to current config version
Parsing configuration string: exception_handling=1 ;
idx = 0, nnz = 0
Using Normal MPI (Hostbuffer) communicator...
AMG Grid:
Number of Levels: 2
LVL ROWS NNZ SPRSTY Mem (GB)
--------------------------------------------------------------
0(D) 229 909 0.0173 1.72e-05
1(D) 128 412 0.0251 1.3e-05
--------------------------------------------------------------
Grid Complexity: 1.55895
Operator Complexity: 1.45325
Total Memory Usage: 3.02382e-05 GB
--------------------------------------------------------------
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 1.84131 1.002984e-02
0 1.84131 1.002984e-02 1.0000
1 1.8413 1.002984e-02 1.0000
2 1.8413 1.002984e-02 1.0000
3 1.8413 1.002984e-02 1.0000
4 1.8413 1.002984e-02 1.0000
5 1.8413 1.002984e-02 1.0000
6 1.8413 1.002984e-02 1.0000
7 1.8413 1.002984e-02 1.0000
8 1.8413 1.002984e-02 1.0000
9 1.8413 1.002984e-02 1.0000
10 1.8413 1.002984e-02 1.0000
11 1.8413 1.002984e-02 1.0000
12 1.8413 1.002984e-02 1.0000
13 1.8413 1.002984e-02 1.0000
14 1.8413 1.002984e-02 1.0000
15 1.8413 1.002984e-02 1.0000
16 1.8413 1.002984e-02 1.0000
17 1.8413 1.002984e-02 1.0000
18 1.8413 1.002984e-02 1.0000
19 1.8413 1.002984e-02 1.0000
20 1.8413 1.002984e-02 1.0000
21 1.8413 1.002984e-02 1.0000
22 1.8413 1.002984e-02 1.0000
23 1.8413 1.002984e-02 1.0000
24 1.8413 1.002984e-02 1.0000
25 1.8413 1.002984e-02 1.0000
26 1.8413 1.002984e-02 1.0000
27 1.8413 1.002984e-02 1.0000
28 1.8413 1.002984e-02 1.0000
29 1.8413 1.002984e-02 1.0000
30 1.8413 1.002984e-02 1.0000
31 1.8413 1.002984e-02 1.0000
32 1.8413 1.002984e-02 1.0000
33 1.8413 1.002984e-02 1.0000
34 1.8413 1.002984e-02 1.0000
35 1.8413 1.002984e-02 1.0000
36 1.8413 1.002984e-02 1.0000
37 1.8413 1.002984e-02 1.0000
38 1.8413 1.002984e-02 1.0000
39 1.8413 1.002984e-02 1.0000
40 1.8413 1.002984e-02 1.0000
41 1.8413 1.002984e-02 1.0000
42 1.8413 1.002984e-02 1.0000
43 1.8413 1.002984e-02 1.0000
44 1.8413 1.002984e-02 1.0000
45 1.8413 1.002984e-02 1.0000
46 1.8413 1.002984e-02 1.0000
47 1.8413 1.002984e-02 1.0000
48 1.8413 1.002984e-02 1.0000
49 1.8413 1.002984e-02 1.0000
50 1.8413 1.002984e-02 1.0000
51 1.8413 1.002984e-02 1.0000
52 1.8413 1.002984e-02 1.0000
53 1.8413 1.002984e-02 1.0000
54 1.8413 1.002984e-02 1.0000
55 1.8413 1.002984e-02 1.0000
56 1.8413 1.002984e-02 1.0000
57 1.8413 1.002984e-02 1.0000
58 1.8413 1.002984e-02 1.0000
59 1.8413 1.002984e-02 1.0000
60 1.8413 1.002984e-02 1.0000
61 1.8413 1.002984e-02 1.0000
62 1.8413 1.002984e-02 1.0000
63 1.8413 1.002984e-02 1.0000
64 1.8413 1.002984e-02 1.0000
65 1.8413 1.002984e-02 1.0000
66 1.8413 1.002984e-02 1.0000
67 1.8413 1.002984e-02 1.0000
68 1.8413 1.002984e-02 1.0000
69 1.8413 1.002984e-02 1.0000
70 1.8413 1.002984e-02 1.0000
71 1.8413 1.002984e-02 1.0000
72 1.8413 1.002984e-02 1.0000
73 1.8413 1.002984e-02 1.0000
74 1.8413 1.002984e-02 1.0000
75 1.8413 1.002984e-02 1.0000
76 1.8413 1.002984e-02 1.0000
77 1.8413 1.002984e-02 1.0000
78 1.8413 1.002984e-02 1.0000
79 1.8413 1.002984e-02 1.0000
80 1.8413 1.002984e-02 1.0000
81 1.8413 1.002984e-02 1.0000
82 1.8413 1.002984e-02 1.0000
83 1.8413 1.002984e-02 1.0000
84 1.8413 1.002984e-02 1.0000
85 1.8413 1.002984e-02 1.0000
86 1.8413 1.002984e-02 1.0000
87 1.8413 1.002984e-02 1.0000
88 1.8413 1.002984e-02 1.0000
89 1.8413 1.002984e-02 1.0000
90 1.8413 1.002984e-02 1.0000
91 1.8413 1.002984e-02 1.0000
92 1.8413 1.002984e-02 1.0000
93 1.8413 1.002984e-02 1.0000
94 1.8413 1.002984e-02 1.0000
95 1.8413 1.002984e-02 1.0000
96 1.8413 1.002984e-02 1.0000
97 1.8413 1.002984e-02 1.0000
98 1.8413 1.002984e-02 1.0000
99 1.8413 1.002984e-02 1.0000
--------------------------------------------------------------
Total Iterations: 100
Avg Convergence Rate: 1.0000
Final Residual: 1.002984e-02
Total Reduction in Residual: 1.000000e+00
Maximum Memory Usage: 1.841 GB
--------------------------------------------------------------
Total Time: 0.139064
setup: 0.00440851 s
solve: 0.134655 s
solve(per iteration): 0.00134655 s
Environment information:
- OS: Windows 11
- CUDA runtime: CUDA 11.
- AMGX version: v2.3.0
- NVIDIA driver: 12.2
- NVIDIA GPU: GeForce RTX 3060 Ti
AMGX solver configuration
{ "config_version": 2, "solver": { "preconditioner": { "print_grid_stats": 1, "print_vis_data": 0, "solver": "AMG", "print_solve_stats": 0, "interpolator": "D2", "presweeps": 1, "max_iters": 1, "monitor_residual": 0, "store_res_history": 0, "scope": "amg", "cycle": "V", "postsweeps": 1 }, "solver": "FGMRES", "print_solve_stats": 1, "obtain_timings": 1, "max_iters": 100, "monitor_residual": 1, "gmres_n_restart": 20, "convergence": "RELATIVE_INI_CORE", "scope": "main", "tolerance": 0.0001, "norm": "L2" } }
Matrix Data
col_idxs.txt, rhs.txt, row_ptrs.txt, values.txt
Additionally, if you prefer, you can use the following code to generate the matrix in CSR form and solve it with AMGX.
Reproduction steps
Below is the code snippet. You can copy the entire code into a .c file and run it; you only need to adjust the value of nx. When nx = 113, AMGX automatically builds only one level and the result converges; when nx = 114, the array is only slightly larger, AMGX happens to split the grid into 2 levels, and the result does not converge. Furthermore, as nx increases, it does not converge regardless of the number of levels.
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>
#include "cuda_runtime.h"
#include <stdint.h>
#define M_PI 3.14159265358979323846
/* CUDA error macro */
#define CUDA_SAFE_CALL(call) do { \
cudaError_t err = call; \
if(cudaSuccess != err) { \
fprintf(stderr, "Cuda error in file '%s' in line %i : %s.\n", \
__FILE__, __LINE__, cudaGetErrorString( err) ); \
exit(EXIT_FAILURE); \
} } while (0)
//#define AMGX_DYNAMIC_LOADING
//#undef AMGX_DYNAMIC_LOADING
#define MAX_MSG_LEN 4096
/* standard or dynamically load library */
#ifdef AMGX_DYNAMIC_LOADING
#include "amgx_capi.h"
#else
#include "amgx_c.h"
#endif
/* print error message and exit */
void errAndExit(const char* err)
{
printf("%s\n", err);
fflush(stdout);
MPI_Abort(MPI_COMM_WORLD, 1);
exit(1);
}
/* print callback (could be customized) */
void print_callback(const char* msg, int length)
{
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) { printf("%s", msg); }
}
/* print usage and exit */
void printUsageAndExit()
{
char msg[MAX_MSG_LEN] = "Usage: mpirun [-n nranks] ./example [-mode [dDDI | dDFI | dFFI]] [-p nx ny] [-c config_file] [-amg \"variable1=value1 ... variable3=value3\"] [-gpu] [-it k]\n";
strcat(msg, " -mode: select the solver mode\n");
strcat(msg, " -p nx ny: select x- and y-dimensions of the 2D (5-points) local discretization of the Poisson operator (the global problem size will be nranks*nx*ny)\n");
strcat(msg, " -c: set the amg solver options from the config file\n");
strcat(msg, " -amg: set the amg solver options from the command line\n");
print_callback(msg, MAX_MSG_LEN);
MPI_Finalize();
exit(0);
}
/* parse parameters */
int findParamIndex(char** argv, int argc, const char* parm)
{
int count = 0;
int index = -1;
for (int i = 0; i < argc; i++)
{
if (strncmp(argv[i], parm, 100) == 0)
{
index = i;
count++;
}
}
if (count == 0 || count == 1)
{
return index;
}
else
{
char msg[MAX_MSG_LEN];
sprintf(msg, "ERROR: parameter %s has been specified more than once, exiting\n", parm);
print_callback(msg, MAX_MSG_LEN);
exit(1);
}
return -1;
}
double dhudx(double hw, double he, double uw, double ue, double dx)
{
return (he * ue - hw * uw) / dx;
}
double ududx(double uw, double up, double ue, double dx)
{
// up>=0 then return up*(up-uw)/dx, otherwise return up*(ue-up)/dx
// use fmax and fmin to avoid branching
return fmax(0., up) * (up - uw) / dx + fmin(0., up) * (ue - up) / dx;
}
double gdetadx(double gra, double etaw, double etae, double dx)
{
return gra * (etae - etaw) / dx;
}
double friction(double gra, double mann, double h, double u)
{
return gra * mann * mann / pow(h, 4.0 / 3.0) * fabs(u) * u;
}
void continuityXdirKernel(
int n,
int nx,
int ny,
double dx,
int* ndd,
double* eta,
double* u,
double* v,
double* h,
double* detadt,
void* values,
void* rhs,
int* row_ptrs) {
/* -------------------- continuity equation ---------------------------
-----------------------detadt = dhu/dx + dhv/dy------------------------
detadt is constant, so it is not included in the matrix. In other words,
the pivot element is 0 for the continuity equation.
-------------------------------------------------------------------- */
for (int i = 0; i < n; i++) {
int row = i / nx; // row index in the 2D work array
int col = i % nx; // col index in the 2D work array
int idx = 2 * col + 1 + (2 * nx + 1) * row; // pivot index in the row of sparse matrix
int nnz = row_ptrs[idx]; // index of the first non-zero element in the row
// open boundary condition for water level
if (ndd[i] == 2) {
((double*)values)[nnz] = 1.0;
((double*)rhs)[idx] = - eta[i];
}
// inner points
else {
int iu = col + row * (nx + 1); // index of u located at the left edge of the cell
double dhudx0 = dhudx(h[iu], h[iu + 1], u[iu], u[iu + 1], dx);
// left u coefficient
((double*)values)[nnz] = -h[iu] / dx;
// middle eta coefficient
((double*)values)[nnz + 1] = 2. * fabs(dhudx0 - detadt[i]);
// right u coefficient
((double*)values)[nnz + 2] = h[iu + 1] / dx;
//rhs values
((double*)rhs)[idx] = - dhudx0 + detadt[i];
}
}
}
void momentumXdirKernel(
int n,
int nx,
int ny,
double dx,
double gra,
double mann,
double Du,
int* ndd,
double* eta,
double* u,
double* v,
double* h,
void* values,
void* rhs,
int* row_ptrs) {
/* -------------------- momentum equation u ---------------------------
* 0 = g * mann^2 / h^(4/3) * |uv| * u // friction term
+ udu/dx + vdu/dy // advection term
g * deta/dx // gravity term
- d/dx (Du * du/dx) // diffusion term
-------------------------------------------------------------------- */
double epsu = 1e-7;
double epse = 1e-9;
//int i = blockIdx.x * blockDim.x + threadIdx.x;
for (int i = 0; i < n + ny; i++) {
int row = i / (nx + 1); // row index in the 2D work array
int col = i % (nx + 1); // col index in the 1D work array
int idx = 2 * col + (2 * nx + 1) * row; // pivot index in the row of sparse matrix
int nnz = row_ptrs[idx]; // index of the first non-zero element in the row
int ie = col + row * nx; // index of eta located at the center of the cell
// inner points
if (col > 0 && col < nx) {
double adv = ududx(u[i - 1], u[i], u[i + 1], dx);
double slp = gdetadx(gra, eta[ie - 1], eta[ie], dx);
double frc = friction(gra, mann, h[i], u[i]);
double advw = ududx(u[i - 1] + epsu, u[i], u[i + 1], dx);
double advp = ududx(u[i - 1], u[i] + epsu, u[i + 1], dx);
double adve = ududx(u[i - 1], u[i], u[i + 1] + epsu, dx);
double slpw = gdetadx(gra, eta[ie - 1] + epse, eta[ie], dx);
double slpe = gdetadx(gra, eta[ie - 1], eta[ie] + epse, dx);
double frcp = friction(gra, mann, h[i], u[i] + epsu);
// left U
((double*)values)[nnz] = 1. / epsu * (advw - adv);
// left eta
((double*)values)[nnz + 1] = 1. / epse * (slpw - slp);
// center U
((double*)values)[nnz + 2] = 1. / epsu * (advp - adv)
+ 1. / epsu * (frcp - frc)
+ 2. * fabs(adv + slp + frc);
// right eta
((double*)values)[nnz + 3] = 1. / epse * (slpe - slp);
// right U
((double*)values)[nnz + 4] = 1. / epsu * (adve - adv);
//rhs values
((double*)rhs)[idx] = - adv - slp - frc;
}
// left boundary
else if (col == 0) {
// open boundary condition
if (ndd[ie] == 2) {
// center U
((double*)values)[nnz] = 1.;
// right U
((double*)values)[nnz + 1] = - 2.;
// right right U
((double*)values)[nnz + 2] = 1.;
//rhs values
((double*)rhs)[idx] = 2. * u[i + 1] - u[i] - u[i + 2];
}
// closed boundary condition
else {
// center U
((double*)values)[nnz] = 1.;
//rhs values
((double*)rhs)[idx] = - u[i];
}
}
// right boundary
else if (col == nx) {
// open boundary condition
if (ndd[ie - 1] == 2) {
// left left U
((double*)values)[nnz] = 1.;
// left U
((double*)values)[nnz + 1] = -2.;
// center U
((double*)values)[nnz + 2] = 1.;
//rhs values
((double*)rhs)[idx] = 2. * u[i - 1] - u[i] - u[i - 2];
}
// closed boundary condition
else {
// center U
((double*)values)[nnz] = 1.;
//rhs values
((double*)rhs)[idx] = - u[i];
}
}
}
}
//
int main(int argc, char** argv)
{
//parameter parsing
int pidx = 0;
int pidy = 0;
//MPI (with CUDA GPUs)
int rank = 0;
int lrank = 0;
int nranks = 0;
int n;
int nx, ny;
int gpu_count = 0;
MPI_Comm amgx_mpi_comm = MPI_COMM_WORLD;
//versions
int major, minor;
char* ver, * date, * time;
//input matrix and rhs/solution
int* partition_sizes = NULL;
int* partition_vector = NULL;
int partition_vector_size = 0;
//library handles
AMGX_Mode mode;
AMGX_config_handle cfg;
AMGX_resources_handle rsrc;
AMGX_matrix_handle A;
AMGX_vector_handle b, x;
AMGX_solver_handle solver;
//status handling
AMGX_SOLVE_STATUS status;
/* MPI init (with CUDA GPUs) */
//MPI
MPI_Init(&argc, &argv);
MPI_Comm_size(amgx_mpi_comm, &nranks);
MPI_Comm_rank(amgx_mpi_comm, &rank);
//CUDA GPUs
CUDA_SAFE_CALL(cudaGetDeviceCount(&gpu_count));
lrank = rank % gpu_count;
CUDA_SAFE_CALL(cudaSetDevice(lrank));
printf("Process %d selecting device %d\n", rank, lrank);
/* load the library (if it was dynamically loaded) */
#ifdef AMGX_DYNAMIC_LOADING
void* lib_handle = NULL;
#ifdef _WIN32
lib_handle = amgx_libopen("amgxsh.dll");
#else
lib_handle = amgx_libopen("libamgxsh.so");
#endif
if (lib_handle == NULL)
{
errAndExit("ERROR: can not load the library");
}
//load all the routines
if (amgx_liblink_all(lib_handle) == 0)
{
amgx_libclose(lib_handle);
errAndExit("ERROR: corrupted library loaded\n");
}
#endif
/* init */
AMGX_SAFE_CALL(AMGX_initialize());
AMGX_SAFE_CALL(AMGX_initialize_plugins());
/* system */
AMGX_SAFE_CALL(AMGX_register_print_callback(&print_callback));
AMGX_SAFE_CALL(AMGX_install_signal_handler());
/* get api and build info */
if ((pidx = findParamIndex(argv, argc, "--version")) != -1)
{
AMGX_get_api_version(&major, &minor);
printf("amgx api version: %d.%d\n", major, minor);
AMGX_get_build_info_strings(&ver, &date, &time);
printf("amgx build version: %s\nBuild date and time: %s %s\n", ver, date, time);
AMGX_SAFE_CALL(AMGX_finalize_plugins());
AMGX_SAFE_CALL(AMGX_finalize());
/* close the library (if it was dynamically loaded) */
#ifdef AMGX_DYNAMIC_LOADING
amgx_libclose(lib_handle);
#endif
MPI_Finalize();
exit(0);
}
mode = AMGX_mode_dDDI;
int sizeof_m_val = ((AMGX_GET_MODE_VAL(AMGX_MatPrecision, mode) == AMGX_matDouble)) ? sizeof(double) : sizeof(float);
int sizeof_v_val = ((AMGX_GET_MODE_VAL(AMGX_VecPrecision, mode) == AMGX_vecDouble)) ? sizeof(double) : sizeof(float);
AMGX_SAFE_CALL(AMGX_config_create_from_file(&cfg, "FGMRES.json"));
AMGX_SAFE_CALL(AMGX_config_add_parameters(&cfg, "tolerance=1e-16"));
/* example of how to handle errors */
//char msg[MAX_MSG_LEN];
//AMGX_RC err_code = AMGX_resources_create(NULL, cfg, &amgx_mpi_comm, 1, &lrank);
//AMGX_SAFE_CALL(AMGX_get_error_string(err_code, msg, MAX_MSG_LEN));
//printf("ERROR: %s\n",msg);
/* switch on internal error handling (no need to use AMGX_SAFE_CALL after this point) */
AMGX_SAFE_CALL(AMGX_config_add_parameters(&cfg, "exception_handling=1"));
/* create resources, matrix, vector and solver */
AMGX_resources_create(&rsrc, cfg, &amgx_mpi_comm, 1, &lrank);
AMGX_matrix_create(&A, rsrc, mode);
AMGX_vector_create(&x, rsrc, mode);
AMGX_vector_create(&b, rsrc, mode);
AMGX_solver_create(&solver, rsrc, mode, cfg);
//generate 3D Poisson matrix, [and rhs & solution]
//WARNING: use 1 ring for aggregation and 2 rings for classical path
int nrings; //=1; //=2;
AMGX_config_get_default_number_of_rings(cfg, &nrings);
//printf("nrings=%d\n",nrings);
int nglobal = 0;
nx = 114;
ny = 1;
n = nx * ny;
nglobal = n * nranks;
/* generate the matrix
In more detail, this routine will create a 2D (5-point) discretization of the
Poisson operator. The discretization is performed on the 2D domain consisting
of nx and ny points in the x- and y-dimensions respectively. Each rank processes its
own part of the discretization points. Finally, the rhs and solution will be set to
a vector of ones and zeros, respectively. */
int* row_ptrs = (int*)malloc((2 * n + ny + 1) * sizeof(int));
int64_t* col_idxs = (int64_t*)malloc(6 * (2 * n + ny) * sizeof(int64_t));
void* values = malloc(6 * (2 * n + ny) * sizeof(double)); // maximum nnz
int nnz = 0;
int64_t count = 0;
int64_t start_idx = rank * n;
///////////////////////////////////////////////////////
///////////////////////////////////////////////////////
double dx = .5;
double gra = 9.81;
double mann = 0.012;
double Du = 0.5;
double Trange = 1.0;
double Tperiod = 43200.0;
int* ndd;
double* u;
double* v;
double* eta;
double* h;
double* detadt;
void* rhs;
void* h_x;
// Allocate memory on host
ndd = (int*)malloc(n * sizeof(int));
u = (double*)malloc((n + ny) * sizeof(double));
v = (double*)malloc((n + nx) * sizeof(double));
eta = (double*)malloc(n * sizeof(double));
h = (double*)malloc((n + ny)* sizeof(double));
detadt = (double*)malloc(n * sizeof(double));
rhs = malloc((2 * n + ny) * sizeof(double));
h_x = malloc((2 * n + ny) * sizeof(double));
memset(h_x, 0, (2 * n + ny) * sizeof(double));
//for (int i = 0; i < 2 * n + ny; i++) {
// if (i % 2 == 0) {
// ((double*)h_x)[i] = 0.0;
// }
// else {
// ((double*)h_x)[i] = 0.0;
// }
// }
// Initialize host memory
for (int i = 0; i < n; i++) {
int row = i / nx; // row index in the 2D work array
int col = i % nx; // col index in the 1D work array
if (col == 0) {
ndd[i] = 2;
}
else {
ndd[i] = 0;
}
eta[i] = 0.0;
detadt[i] = M_PI * Trange / Tperiod;
}
for (int i = 0; i < n + ny; i++) {
h[i] = 2.0;
}
for (int i = 0; i < n + nx; i++) {
v[i] = 0.0;
}
// set random value to u array
for (int i = 0; i < n + ny; i++) {
u[i] = 0.;
u[i] = 0.01;
//u[i] = (double)rand() / RAND_MAX * 2. - 1.;
}
// initialize sparse matrix (determine row_ptrs and col_idxs)
nnz = 0;
for (int row = 0; row < ny; row++) {
for (int i = 0; i < 2 * nx + 1; i++) {
// continuity equation
if (i % 2 == 1) {
int col = (i - 1) / 2;
int idx = i + (2 * nx + 1) * row;
int i0 = col + row * nx;
row_ptrs[idx] = nnz;
// open boundary
if (ndd[i0] == 2) {
col_idxs[nnz++] = idx;
}
// inner points
else {
col_idxs[nnz++] = idx - 1;
col_idxs[nnz++] = idx;
col_idxs[nnz++] = idx + 1;
}
}
// momentum equation
if (i % 2 == 0) {
int col = i / 2;
int idx = i + (2 * nx + 1) * row;
int i0 = col + row * nx;
row_ptrs[idx] = nnz;
if (col > 0 && col < nx) {
col_idxs[nnz++] = idx - 2;
col_idxs[nnz++] = idx - 1;
col_idxs[nnz++] = idx;
col_idxs[nnz++] = idx + 1;
col_idxs[nnz++] = idx + 2;
}
else if (col == 0) {
if (ndd[i0] == 2) {
printf("idx = %d, nnz = %d\n", idx, nnz);
col_idxs[nnz++] = idx;
col_idxs[nnz++] = idx + 2;
col_idxs[nnz++] = idx + 4;
}
else {
col_idxs[nnz++] = idx;
}
}
else if (col == nx) {
if (ndd[i0 - 1] == 2) {
col_idxs[nnz++] = idx - 4;
col_idxs[nnz++] = idx - 2;
col_idxs[nnz++] = idx;
}
else {
col_idxs[nnz++] = idx;
}
}
}
}
}
row_ptrs[2 * n + ny] = nnz;
FILE* fp;
// loop over 100
for (int ik = 0; ik < 1; ik++) {
continuityXdirKernel(n, nx, ny, dx, ndd, eta, u, v, h, detadt, (double*)values, rhs, row_ptrs);
momentumXdirKernel(n, nx, ny, dx, gra, mann, Du, ndd, eta, u, v, h, (double*)values, rhs, row_ptrs);
///////////////////////////////////////////////////////
///////////////////////////////////////////////////////
fp = fopen("values.bin", "wb");
fwrite(values, sizeof(double), 6 * (2 * n + ny), fp);
fclose(fp);
fp = fopen("rhs.bin", "wb");
fwrite(rhs, sizeof(double), (2 * n + ny), fp);
fclose(fp);
fp = fopen("row_ptrs.bin", "wb");
fwrite(row_ptrs, sizeof(int), (2 * n + ny + 1), fp);
fclose(fp);
fp = fopen("col_idxs.bin", "wb");
fwrite(col_idxs, sizeof(int64_t), 6 * (2 * n + ny), fp);
fclose(fp);
FILE* fp2;
// write the matrix to a text file
fp2 = fopen("values.txt", "w");
for (int i = 0; i < 6 * (2 * n + ny); i++) {
fprintf(fp2, "%f\n", ((double*)values)[i]);
}
fclose(fp2);
fp2 = fopen("rhs.txt", "w");
for (int i = 0; i < 2 * n + ny; i++) {
fprintf(fp2, "%f\n", ((double*)rhs)[i]);
}
fclose(fp2);
fp2 = fopen("row_ptrs.txt", "w");
for (int i = 0; i < 2 * n + ny + 1; i++) {
fprintf(fp2, "%d\n", row_ptrs[i]);
}
fclose(fp2);
fp2 = fopen("col_idxs.txt", "w");
for (int i = 0; i < 6 * (2 * n + ny); i++) {
fprintf(fp2, "%d\n", col_idxs[i]);
}
fclose(fp2);
AMGX_matrix_upload_all_global(A,
2 * n + ny, 2 * n + ny, nnz, 1, 1,
row_ptrs, col_idxs, values, NULL,
nrings, nrings, NULL);
/* generate the rhs and solution */
//for (int i = 0; i < 2 * n + ny; i++) {
// if (i % 2 == 0) {
// ((double*)h_x)[i] = 0.0;
// }
// else {
// ((double*)h_x)[i] = 0.0;
// }
//}
/* set the connectivity information (for the vector) */
AMGX_vector_bind(x, A);
AMGX_vector_bind(b, A);
/* upload the vector (and the connectivity information) */
AMGX_vector_upload(x, 2 * n + ny, 1, h_x);
AMGX_vector_upload(b, 2 * n + ny, 1, rhs);
/* solver setup */
//MPI barrier for stability (should be removed in practice to maximize performance)
MPI_Barrier(amgx_mpi_comm);
AMGX_solver_setup(solver, A);
/* solver solve */
//MPI barrier for stability (should be removed in practice to maximize performance)
MPI_Barrier(amgx_mpi_comm);
AMGX_solver_solve(solver, b, x);
/* example of how to change parameters between non-linear iterations */
//AMGX_config_add_parameters(&cfg, "config_version=2, default:tolerance=1e-12");
//AMGX_solver_solve(solver, b, x);
/* example of how to replace coefficients between non-linear iterations */
//AMGX_matrix_replace_coefficients(A, n, nnz, values, diag);
//AMGX_solver_setup(solver, A);
//AMGX_solver_solve(solver, b, x);
AMGX_solver_get_status(solver, &status);
/* example of how to get (the local part of) the solution */
//int sizeof_v_val;
//sizeof_v_val = ((NVAMG_GET_MODE_VAL(NVAMG_VecPrecision, mode) == NVAMG_vecDouble))? sizeof(double): sizeof(float);
//
void* SLN = malloc((2 * n + ny) * sizeof(double));
AMGX_vector_download(x, SLN);
// update SLN to eta
for (int i = 0; i < n; i++) {
int row = i / nx; // row index in the 2D work array
int col = i % nx; // col index in the 2D work array
int idx = 2 * col + 1 + (2 * nx + 1) * row; // pivot index in the row of sparse matrix
eta[i] = eta[i] + ((double*)SLN)[idx];
}
// update SLN to u
for (int i = 0; i < n + ny; i++) {
int row = i / (nx + 1); // row index in the 2D work array
int col = i % (nx + 1); // col index in the 2D work array
int idx = 2 * col + (2 * nx + 1) * row; // pivot index in the row of sparse matrix
u[i] = u[i] + ((double*)SLN)[idx];
}
}
//write u and eta to binary file
fp = fopen("u.bin", "wb");
fwrite(u, sizeof(double), (n + ny), fp);
fclose(fp);
// write eta to binary file
fp = fopen("eta.bin", "wb");
fwrite(eta, sizeof(double), n, fp);
fclose(fp);
free(values);
free(row_ptrs);
free(col_idxs);
//free(result_host);
/* destroy resources, matrix, vector and solver */
AMGX_solver_destroy(solver);
AMGX_vector_destroy(x);
AMGX_vector_destroy(b);
AMGX_matrix_destroy(A);
AMGX_resources_destroy(rsrc);
/* destroy config (need to use AMGX_SAFE_CALL after this point) */
AMGX_SAFE_CALL(AMGX_config_destroy(cfg))
/* shutdown and exit */
AMGX_SAFE_CALL(AMGX_finalize_plugins())
AMGX_SAFE_CALL(AMGX_finalize())
/* close the library (if it was dynamically loaded) */
#ifdef AMGX_DYNAMIC_LOADING
amgx_libclose(lib_handle);
#endif
MPI_Finalize();
CUDA_SAFE_CALL(cudaDeviceReset());
return status;
}
Hello @fxuecnu ,
To explain the difference between those two runs: in the first case, with 1 AMG level, AMGX uses a direct solver for the coarsest level (which is the only level), so the system is solved directly. In the second case, however, the multigrid operators do not seem to perform well; this might be related to the fact that the off-diagonal elements are significantly larger than the diagonal ones. Do you see similar behaviour with larger matrices? If you only need to solve such small matrices, I would recommend a direct solve approach (i.e. factorize + solve).
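One way to check this hypothesis on the matrix produced by the reproduction code is to scan the CSR arrays and compare each diagonal entry against the sum of the off-diagonal magnitudes in its row. A minimal C sketch using the row_ptrs, col_idxs and values arrays from the code above; the 0.1 threshold is an arbitrary illustration, not an AMGX criterion:

/* Sketch: count rows that are far from diagonally dominant in the CSR
   matrix built by the reproduction code (row_ptrs, col_idxs, values).
   The 0.1 threshold is arbitrary and only meant for illustration. */
int weak_rows = 0;
for (int r = 0; r < 2 * n + ny; r++) {
    double diag = 0.0, offdiag = 0.0;
    for (int k = row_ptrs[r]; k < row_ptrs[r + 1]; k++) {
        double a = fabs(((double*)values)[k]);
        if (col_idxs[k] == r) diag += a;
        else                  offdiag += a;
    }
    if (diag < 0.1 * offdiag) weak_rows++;
}
printf("rows with |diag| < 0.1 * sum|offdiag|: %d of %d\n",
       weak_rows, 2 * n + ny);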
Hello @marsaev
Thank you for your response. Indeed, my matrices exhibit the scenario you described, with very small diagonal elements, resulting in convergence issues even with larger matrices. I need to solve extremely large matrices up to the size of 4 million by 4 million. I have tried direct solving, but it is too slow. Currently, I upload the matrices to AMGX with a block size of 1x1. I could upload the elements in 3x3 blocks to ensure diagonal dominance at the block level. Does AMGX have a suitable algorithm to handle this approach? Additionally, are there any other methods within AMGX that could leverage its high performance capabilities for my situation?
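For the block idea, the same upload routine that the reproduction code already calls accepts block dimensions, so the system could be passed to AMGX in 3x3 block-CSR form once the unknowns of each cell are grouped together. A minimal sketch, assuming row_ptrs/col_idxs have been reassembled to index blocks and values stores 9 doubles per block (the exact in-block ordering should be checked in the AMGX documentation); nblockrows and nnzb are hypothetical names, not from the original code:

/* Sketch: upload the same system as 3x3 block-CSR instead of scalar CSR.
   Assumes row_ptrs/col_idxs now index blocks (nblockrows block rows,
   nnzb nonzero blocks) and values stores 9 doubles per block.
   nblockrows and nnzb are hypothetical names, not from the original code. */
AMGX_matrix_upload_all_global(A,
    nblockrows, nblockrows, nnzb,
    3, 3,                        /* block_dimx, block_dimy */
    row_ptrs, col_idxs, values,
    NULL,                        /* no separate diagonal data */
    nrings, nrings, NULL);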
@fxuecnu How would you solve this system on the CPU? You might want to try a GMRES-type solver, but multigrid may not be the best preconditioner in this case.
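As a rough illustration of that suggestion, the configuration could be switched to FGMRES with a simple one-level preconditioner using the same C API as the reproduction code. This is a sketch under the assumption that the solver names below (FGMRES, BLOCK_JACOBI) and the scoped option syntax are accepted by this AMGX build; check the configuration reference for the exact names:

/* Sketch (assumption): FGMRES with a one-level Jacobi preconditioner instead of AMG.
   The option string below is an assumption; consult the AMGX configuration
   reference for the exact parameter names in your build. */
AMGX_SAFE_CALL(AMGX_config_create(&cfg,
    "config_version=2, "
    "solver(main)=FGMRES, "
    "main:preconditioner(jac)=BLOCK_JACOBI, "
    "main:max_iters=100, "
    "main:gmres_n_restart=20, "
    "main:convergence=RELATIVE_INI_CORE, "
    "main:tolerance=1e-8, "
    "main:monitor_residual=1, "
    "main:print_solve_stats=1"));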
If a direct sparse solve works for you but you find it too slow, I can also suggest trying https://developer.nvidia.com/cudss