On AMD MI250X, the **slowdown** with scratch memory is between 7% and 25% (warp size 64?).
Then, errors during example builds with **VERBOSE=1 make**:
```
cd /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/build/examples && /ccc/products2/cmake-3.22.2/Rhel_8__x86_64/system/default/bin/cmake -E cmake_link_script CMakeFiles/amgx_mpi_capi_agg.dir/link.txt --verbose=1
/ccc/products/gcc-8.3.0/system/default/bin/gcc -DRAPIDJSON_DEFINED -DAMGX_WITH_MPI -O3 -DNDEBUG -L/ccc/products2/openmpi-4.1.4.6/Rhel_8__x86_64/gcc--8.3.0/default/lib -L/ccc/products2/hwloc-2.5.0/Rhel_8__x86_64/system/cuda-11.6/lib -L/ccc/products2/openmpi-4.1.4.6/Rhel_8__x86_64/gcc--8.3.0/default/lib -L/ccc/products2/hwloc-2.5.0/Rhel_8__x86_64/system/cuda-11.6/lib CMakeFiles/amgx_mpi_capi_agg.dir/amgx_mpi_capi_agg.c.o -o amgx_mpi_capi_agg /ccc/products/openmpi-4.1.4/gcc--8.3.0/default/lib/libmpi.so ../libamgxsh.so -lrt...
```
Thanks Matt.
Here are the logs of two runs, one (left) without MPI_Direct, the other (right) with MPI_Direct enabled:
```
Current_scope:parameter_name(new_scope) = parameter_value    Current_scope:parameter_name(new_scope) = parameter_value
                                                           > default:communicator = MPI_DIRECT
default:exception_handling = 1                               default:exception_handling...
```
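For reference, a minimal sketch of what differs between the two runs, assuming the config is built through the AmgX C API; the surrounding boilerplate is illustrative and error checking is omitted, only the `communicator` entry is the point:

```c
/* Hedged sketch: toggling AmgX's GPUDirect communication path.
 * Only the "communicator" value differs between the two logged runs. */
#include <mpi.h>
#include <amgx_c.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    AMGX_initialize();

    AMGX_config_handle cfg;
    /* Left run: default "communicator=MPI" (messages staged through host).
     * Right run: */
    AMGX_config_create(&cfg,
        "config_version=2, communicator=MPI_DIRECT, exception_handling=1");

    /* ... resources, matrix upload, solver setup and solve as usual ... */

    AMGX_config_destroy(cfg);
    AMGX_finalize();
    MPI_Finalize();
    return 0;
}
```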
We have been using AmgX (through AmgXWrapper) for 2 years now, but we are facing an annoying issue. Our code runs fine with the CG solver and an Aggregation or Classical AMG preconditioner,...
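For context, a setup along those lines can be expressed as an AmgX v2 config string; this is only a sketch of the shape of such a config, where the scope names (`pcg`, `prec`) and the numeric parameters are assumptions, not the actual file:

```c
#include <amgx_c.h>

/* Sketch of a CG solver + AMG-preconditioner config as described above.
 * Scope names and numeric parameters are illustrative assumptions. */
static AMGX_config_handle make_cg_amg_config(void)
{
    AMGX_config_handle cfg;
    AMGX_config_create(&cfg,
        "config_version=2,"
        " solver(pcg)=PCG,"
        " pcg:preconditioner(prec)=AMG,"
        " prec:algorithm=CLASSICAL,"   /* or AGGREGATION */
        " pcg:max_iters=100,"
        " pcg:tolerance=1e-8");
    return cfg;
}
```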
The issue seems to come from the cuSPARSE library, because with CUDA > 11.2 but using only the 11.2 version of libcusparse, it works. So putting libcusparse.so.11.3.1.68 (as libcusparse.so.11) alongside libamgxsh.so in the same...
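One quick way to verify which libcusparse the dynamic loader actually resolved (a small diagnostic sketch, not something posted in the thread; the build line and the version encoding follow the standard cuSPARSE conventions):

```c
/* Diagnostic sketch: print the cuSPARSE version actually loaded at run
 * time, to verify the libcusparse.so.11 swap took effect.
 * Build (CUDA_HOME is an assumption about your install):
 *   gcc check_cusparse.c -I$CUDA_HOME/include -L$CUDA_HOME/lib64 -lcusparse */
#include <stdio.h>
#include <cusparse.h>

int main(void)
{
    cusparseHandle_t handle;
    int version = 0;
    cusparseCreate(&handle);
    /* version is encoded as major*1000 + minor*100 + patch,
     * e.g. 11301 for 11.3.1 */
    cusparseGetVersion(handle, &version);
    printf("cuSPARSE version: %d.%d.%d\n",
           version / 1000, (version % 1000) / 100, version % 100);
    cusparseDestroy(handle);
    return 0;
}
```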
Thanks Matt for the reply; I'm ready to discuss privately and share my use case. In the meantime, I will try PMIS with Classical AMG. I forgot to say I...
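Switching the coarsening selector is a one-parameter change; a hedged fragment, assuming the classical-AMG preconditioner sits in a scope named `prec`:

```c
#include <amgx_c.h>

/* Sketch: try PMIS instead of HMIS for classical-AMG coarsening.
 * The scope name "prec" is an assumption about the config layout. */
static void use_pmis(AMGX_config_handle *cfg)
{
    AMGX_config_add_parameters(cfg, "prec:selector=PMIS"); /* was HMIS */
}
```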
> Did you happen to try PMIS instead of HMIS?

PMIS behaves the same as HMIS for this issue.
I mean that:
a) The issue (C-AMG with multi-GPU on CUDA > 11.2) happens in my code with every kind of matrix.
b) I can't reproduce the issue when providing with...
> What alarms me is that, if I understand correctly, changing the cusparse library changes the behaviour.

Yes, using the CUDA 11.2 cuSPARSE (with LD_LIBRARY_PATH) CHANGES the behaviour in my case. I...