AMGX Caught amgx exception: Cannot allocate pinned memory

I am using AMGX with PETSc (-pc_type amgx) to run multiphysics simulations, and I encounter the following error even after scaling the problem size down significantly:

Caught amgx exception: Cannot allocate pinned memory

I have attached the output and error log files for your reference. Thank you for any feedback you can provide.

outlog.docx errlog.docx

Environment information:

Linux Mint
CUDA runtime 12.3
OpenMPI 4
AMGX version 2.4
CUDA driver 12.4
NVIDIA V100

Jun 14 '24 17:06 AnjaliSandip

@AnjaliSandip The error indicates that the pinned memory pool cannot be allocated:

Caught amgx exception: Cannot allocate pinned memory
 at: /home/anjali.sandip/ISSM/ISSM/externalpackages/petsc/src/arch-linux-c-opt/externalpackages/git.amgx/src/global_thread_handle.cu:374

Its size is currently fixed at 100 MB (https://github.com/NVIDIA/AMGX/blob/v2.4.0/src/global_thread_handle.cu#L51) regardless of the input data, because this allocation happens during resource creation, when the problem size is not yet known.

Is your process allowed to allocate page-locked memory? For example, Docker containers need the corresponding ulimit flag: --ulimit memlock=-1.

Jun 15 '24 00:06 marsaev

Thank you for your response. I am using PETSc through its AMGX interface. PETSc has an option to set the minimum data size for which pinned memory is used for host (CPU) allocations:

#include "petscvec.h"
PetscErrorCode VecSetPinnedMemoryMin(Vec v, size_t mbytes);

Is this what you are referring to?


Jun 17 '24 00:06 AnjaliSandip

@AnjaliSandip Sorry for the delayed reply. I'm not familiar with PETSc internals, but unless the PETSc environment somehow hooks cudaMallocHost, its settings shouldn't affect AMGX, since AMGX calls the CUDA Runtime directly: https://github.com/NVIDIA/AMGX/blob/v2.4.0/src/global_thread_handle.cu#L378

You can try running a small program that allocates the same amount of pinned memory to check whether this is an environment issue, for example: https://godbolt.org/z/7ab86qc34

If there is no obvious or easy fix for the page-locked memory issue, I would suggest opening a ticket with PETSc (https://gitlab.com/petsc/petsc/-/issues), since they know more about the PETSc details that might matter here. You can link this issue for reference, and I can follow up if there are any further questions about AMGX.

Jul 02 '24 18:07 marsaev