libvma uses huge amount of memory (~4x8G) with max RLIMIT_NOFILE

Open • champtar opened this issue 1 year ago • 2 comments

Subject

libvma uses huge amount of memory (~4x8G) with max RLIMIT_NOFILE

Going from EL8 to EL9, the default max open files limit goes from 1048576 to 1073741816. This is true on any host using systemd 240+ if not overridden (https://github.com/systemd/systemd/commit/a8b627aaed409a15260c25988970c795bf963812 / https://access.redhat.com/solutions/1479623). It might not be true for a user session, but it is true for a container.

In libvma there is this code: https://github.com/Mellanox/libvma/blob/cef07e0d0fb6173ed9c3f6f91ed7b48245ca4e5a/src/vma/sock/fd_collection.cpp#L63-L89

Running under strace shows the limit being read, followed by four anonymous mappings of 8589934592 bytes (8 GiB) each:

prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=1073741816, rlim_max=1073741816}) = 0
mmap(NULL, 8589934592, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f130c000000
mmap(NULL, 8589934592, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f110c000000
mmap(NULL, 8589934592, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0f0c000000
mmap(NULL, 8589934592, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0d0c000000
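
The mapping sizes are consistent with one pointer-sized slot per possible file descriptor. A minimal sketch of that arithmetic follows, assuming 8-byte entries, 4 KiB pages, and four tables. All three numbers are inferred from the strace output above and not confirmed against the libvma source, and `table_bytes` is a hypothetical helper used only for illustration.

```python
PAGE_SIZE = 4096   # assumed page size on x86_64
ENTRY_SIZE = 8     # assumed: one 64-bit pointer per file descriptor
TABLES = 4         # assumed from the four mmap calls in the trace


def table_bytes(nofile, entry_size=ENTRY_SIZE, page=PAGE_SIZE):
    """Size of one per-fd table in bytes, rounded up to whole pages."""
    raw = nofile * entry_size
    return -(-raw // page) * page  # ceiling division, then back to bytes


el9 = table_bytes(1073741816)  # EL9 / systemd 240+ default limit
el8 = table_bytes(1048576)     # EL8 default limit

print(el9, TABLES * el9 // 2**30)  # bytes per table, total in GiB
print(el8, TABLES * el8 // 2**20)  # bytes per table, total in MiB
```

With the EL9 default, each table rounds up to 8589934592 bytes, matching the trace, for 32 GiB in total. With the EL8 default the same four tables would need about 32 MiB, which is why the problem only shows up after the limit change.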

Issue type

  • [X] Bug report

Configuration:

  • Product version: libvma-9.7.2-1.x86_64
  • OS: Alma 9

Actual behavior:

libvma allocates 32G of RAM for bookkeeping

Expected behavior:

Either:

  • warn when RLIMIT_NOFILE is too high
  • error out when RLIMIT_NOFILE is too high
  • set RLIMIT_NOFILE to a lower value if it's too high
  • rewrite the function to not preallocate the memory
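
As an illustration of the third option, here is a minimal sketch of capping the soft limit from the application side before libvma gets to read it. It is not libvma code: `capped_nofile` and `cap_nofile` are hypothetical helpers, and the cap of 1048576 is simply the EL8 default quoted above, chosen as an assumption. Only the soft limit is lowered, which an unprivileged process is always allowed to do, and the new limit is inherited by any child process started afterwards.

```python
import resource

DEFAULT_CAP = 1048576  # assumption: the EL8 default, used here as a sane ceiling


def capped_nofile(soft, hard, cap=DEFAULT_CAP):
    """Return the (soft, hard) pair to apply; only the soft limit is lowered."""
    if soft == resource.RLIM_INFINITY or soft > cap:
        return cap, hard
    return soft, hard


def cap_nofile(cap=DEFAULT_CAP):
    """Lower this process's soft RLIMIT_NOFILE to `cap` if it is higher."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_limits = capped_nofile(soft, hard, cap)
    if new_limits != (soft, hard):
        resource.setrlimit(resource.RLIMIT_NOFILE, new_limits)
    return new_limits
```

Calling `cap_nofile()` early in a launcher, before starting the program that loads libvma, would keep the value returned by `prlimit64` at or below the cap. The same effect can be had without code by lowering the limit in the shell or the service unit.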

Steps to reproduce:

ulimit -n 1073741816
# run libvma

champtar, Jan 05 '24 08:01

(nvidia support case number 00656662)

champtar, Jan 11 '24 12:01

@AlexanderGrissik please assist

igor-ivanov, Jan 15 '24 09:01