cg_example error

Open jlxy11 opened this issue 1 year ago • 6 comments

jlxy11 avatar Feb 25 '24 12:02 jlxy11

matrix name: parabolic_fem.mtx
num. rows: 525825
num. cols: 525825
nnz: 4200450
structure: symmetric

Matrix parsing...
Testing CG
cuSPARSE API failed at line 517 with error: zero pivot (9)

jlxy11 avatar Feb 25 '24 12:02 jlxy11

@jlxy11 It looks like this is not a universal problem, but depends on some combination of environmental factors. Please provide information about your environment. What version of the toolkit are you using, which compiler, compiler flags, what CUDA hardware, which OS, which drivers, etc.

essex-edwards avatar Feb 26 '24 20:02 essex-edwards

@essex-edwards Thank you for your reply. I built the program with VS2019 on Windows 11, using CUDA Toolkit 12.0. The hardware is an RTX 4080 laptop GPU. I made no changes other than linking the cusparse and cublas libraries.

jlxy11 avatar Feb 27 '24 13:02 jlxy11

In addition, I made a small change to the code: I moved the part shown in the picture outside the mtx_parsing function, because the original code reported an error there. (screenshot: 1709042556319)

jlxy11 avatar Feb 27 '24 14:02 jlxy11

@jlxy11 I have reproduced the error you are seeing. I get a zero pivot on line 532, not line 517, but it seems likely that we are encountering the same error. I don't have a fix or a workaround for you. Thank you for the bug report.

Assorted details below:

I'm using MSVC 2022 (Version 17.9.2), Windows 11, Toolkit version 12.3, and a laptop with an RTX 3500. This is a little different from your setup. It might explain why the reported line number is different.

I had to do the same change to IdxType and sort_by_row that you did. I also had to change the CMake:

 target_link_libraries(${ROUTINE}_example
-    PUBLIC cudart cusparse cublas
+    PUBLIC CUDA::cudart CUDA::cusparse CUDA::cublas
 )

and replace fseek with fgetc/ungetc:

@@ -96,6 +96,16 @@ typedef struct VecStruct {

 //==============================================================================

+int fpeek(FILE* stream)
+{
+    int c;
+    c = fgetc(stream);
+    ungetc(c, stream);
+    return c;
+}
 void mtx_header(const char* file_path,
                 int*        num_lines,
                 int*        num_rows,
@@ -123,14 +133,21 @@ void mtx_header(const char* file_path,
     }
     token = strtok(NULL, " \n"); // symmetric, unsymmetric
     *is_symmetric = (strcmp(token, "symmetric") == 0);
-    while (fgetc(file) == '%')
+    while (fpeek(file) == '%')
         fgets(buffer, 256, file); // skip % comments
-    fseek(file, -1, SEEK_CUR);
     fscanf(file, "%d %d %d", num_rows, num_cols, num_lines);
     *nnz = (*is_symmetric) ? *num_lines * 2 : *num_lines;
     fclose(file);
 }
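
Editorial sketch, not part of the sample or the commit: the peek-based header parsing from the diff above, written as a standalone fragment. One likely reason the original `fseek(file, -1, SEEK_CUR)` misbehaves on Windows is that the C standard only defines `fseek` on a text stream for an offset of zero or a value returned by `ftell` used with `SEEK_SET`, whereas `ungetc` is guaranteed to push back one character. The helper name `read_mtx_size` is hypothetical and does not appear in the sample.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the next character of the stream without consuming it.
   EOF is returned as-is and is never pushed back. */
static int fpeek(FILE* stream)
{
    assert(stream != NULL);
    int c = fgetc(stream);
    if (c != EOF)
        ungetc(c, stream);
    return c;
}

/* Hypothetical helper: skip every line that starts with '%' (the banner and
   the comments of a Matrix Market file), then read "rows cols entries".
   Returns 1 on success and 0 on failure. */
static int read_mtx_size(FILE* file, int* num_rows, int* num_cols,
                         int* num_lines)
{
    char buffer[256];
    while (fpeek(file) == '%') {
        /* Consume the whole line, even if it is longer than the buffer. */
        do {
            if (fgets(buffer, sizeof buffer, file) == NULL)
                return 0;
        } while (strchr(buffer, '\n') == NULL && !feof(file));
    }
    return fscanf(file, "%d %d %d", num_rows, num_cols, num_lines) == 3;
}
```

Because the loop only peeks, the first character of the size line is never consumed, so no seek is needed to restore it before the `fscanf` call.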

essex-edwards avatar Mar 01 '24 17:03 essex-edwards

Thank you very much for your reply. As you mentioned, I made two changes to this code. I commented out the line fseek(file, -1, SEEK_CUR); in the mtx_header function so that mtx_header runs correctly; this does not affect the mtx_parsing function. However, I don't understand why these two problems occur. I will try the approach you provided later. Thank you again for your patient answer! @essex-edwards

jlxy11 avatar Mar 02 '24 04:03 jlxy11

@jlxy11 We updated the cg_example in this commit https://github.com/NVIDIA/CUDALibrarySamples/commit/9a7897fb0c4f4a718178b310fa4f0034451e8a14 . The example should work now, without a zero pivot error.

essex-edwards avatar May 08 '24 17:05 essex-edwards