Bug with aggressive_levels=1
Dear developers, I want to report a bug that I found on an A100 when aggressive_levels=1. Here is the output:
AMGX version 2.4.0
Built on Nov 22 2022, 14:07:12
Compiled with CUDA Runtime 11.5, using CUDA driver 11.6
The AMGX_initialize_plugins API call is deprecated and can be safely removed.
Cannot read file as JSON object, trying as AMGX config
Converting config string to current config version
Parsing configuration string: selector=AGGRESSIVE_PMIS ;
Thrust failure: trivial_device_copy D->H failed: cudaErrorIllegalAddress: an illegal memory access was encountered
File and line number are not available for this exception.
Caught amgx exception: Error, setup must be called before calling solve
at: /home/nvarini/arm-gnu-stack/AMGX/src/solvers/solver.cu:599
Stack trace:
I am using:
1) gcc/11.2.0-cuda
2) openmpi/4.1.2-cuda
3) cuda/11.5.1
4) hdf5/1.13.0-mpi
At the link https://drive.google.com/file/d/1U96S96UIXSA6dAfhuqLGVlGTM2zdH1_p/view?usp=share_link there is a minimal problem that reproduces the error. With aggressive_levels=0 the application runs correctly. This bug doesn't appear on V100.
Regards
Hi @nvarini ,
I don't have a Fortran environment. Can I ask you to provide the data in an AMGX-readable format? You can generate such a matrix file with this API: https://github.com/NVIDIA/AMGX/blob/main/include/amgx_c.h#L441-L445 after you have uploaded your data to the AMGX handles. If the right-hand side is an empty handle, it is assumed to be a vector of ones; if the initial solution is empty, it is assumed to be a vector of zeros.
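For reference, a minimal sketch of what such a dump could look like, assuming the linked declaration is `AMGX_write_system` and that the matrix and vector handles have already been created and uploaded. The helper name `dump_system_for_report` and the file name in the usage note are hypothetical, and the `__has_include` guard is only there so the snippet still compiles on a machine without the AMGX headers.

```c
#include <assert.h>
#include <stdio.h>

/* Pull in the AMGX C API only if the header is on the include path. */
#if defined(__has_include)
#  if __has_include(<amgx_c.h>)
#    include <amgx_c.h>
#    define HAVE_AMGX 1
#  endif
#endif
#ifndef HAVE_AMGX
#  define HAVE_AMGX 0
#endif

#if HAVE_AMGX
/* Hypothetical helper (not part of AMGX): writes the matrix, right-hand side,
 * and initial solution held in existing handles to a single file.
 * Returns 0 on success, -1 if AMGX reports an error. */
int dump_system_for_report(AMGX_matrix_handle A,
                           AMGX_vector_handle b,
                           AMGX_vector_handle x,
                           const char *path)
{
    assert(path != NULL);

    AMGX_RC rc = AMGX_write_system(A, b, x, path);
    if (rc != AMGX_RC_OK) {
        char msg[4096];
        AMGX_get_error_string(rc, msg, (int)sizeof msg);
        fprintf(stderr, "AMGX_write_system failed: %s\n", msg);
        return -1;
    }
    return 0;
}
#endif
```

In an application this would be called once after the upload calls and before setup, for example `dump_system_for_report(A, b, x, "system.mtx")`, and the resulting file attached to the issue.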
Hi @marsaev, here is the file https://drive.google.com/file/d/1yHkYFPUlKgf7ONnG6EBTRlf36BIn8ZfA/view?usp=share_link Let me know if you need anything else.
Hi @marsaev, do you have any update on this issue? Thanks!
@nvarini can we close this issue? It looks like from latest tests it is resolved.
In principle yes, although Leonardo is still not accessible and I'd like to verify there. Perhaps we can close it and reopen if the issue persists?
On Fri, 23 Jun 2023 at 11:26, Filippo Spiga < @.***> wrote:
@nvarini can we close this issue? It looks like from latest tests it is resolved.
We can wait to close, no prob. Keep us updated on progress.
Any luck @nvarini ?