libvma

VMA defaults to huge pages on Redhat 7.1 which results in the failure of memory registration

Open clameter opened this issue 9 years ago • 2 comments

On RH7.1 VMA works with VMA_MEM_ALLOC_TYPE=1 when configured with a large number of hugepages. But when we do not set this value we get:

VMA WARNING: ib_ctx_collection89:mem_reg_on_all_devices() Failure in mem_reg: addr=0x2aaaab200000, length=372800063, mr_pos=0, mr_array[mr_pos]=0, dev=0x74a540, ibv_dev=mlx4_0
VMA WARNING: bpool[0x754840]:269:register_memory() Failed registering memory, This might happen due to low MTT entries. Please refer to README.txt for more info

settings:

[mlx4_core]$ cd parameters
[parameters]$ ls -l
total 0
-rw-r--r-- 1 root root 4096 Aug 6 13:00 block_loopback
-rw-r--r-- 1 root root 4096 Aug 6 13:00 debug_level
-r--r--r-- 1 root root 4096 Aug 6 13:00 enable_64b_cqe_eqe
-r--r--r-- 1 root root 4096 Aug 6 13:00 enable_qos
-rw-r--r-- 1 root root 4096 Aug 6 13:00 internal_err_reset
-r--r--r-- 1 root root 4096 Aug 6 13:00 log_mtts_per_seg
-r--r--r-- 1 root root 4096 Aug 6 13:00 log_num_mac
-r--r--r-- 1 root root 4096 Aug 6 10:38 log_num_mgm_entry_size
-r--r--r-- 1 root root 4096 Aug 6 13:00 log_num_vlan
-r--r--r-- 1 root root 4096 Aug 6 13:00 msi_x
-r--r--r-- 1 root root 4096 Aug 6 13:00 num_vfs
-r--r--r-- 1 root root 4096 Aug 6 13:00 port_type_array
-r--r--r-- 1 root root 4096 Aug 6 13:00 probe_vf
-r--r--r-- 1 root root 4096 Aug 6 13:00 use_prio
[parameters]$ cat *
1
0
Y
N
1
7
7
-1
0
1

,0,0

,N

clameter, Aug 06 '15 18:08

The short answer is that the driver's default MTT configuration is probably too small for the amount of memory (counted in 4 KB pages) that VMA requires across all processes loaded with VMA. With huge pages (2 MB), the driver needs roughly 1/500 as many MTT entries for the same amount of memory (2 MB / 4 KB = 512), so the default MTT table size is large enough.
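To put rough numbers on this, here is a minimal sketch that counts the pages needed to cover the `length=372800063` reported in the warning above, once with 4 KB pages and once with 2 MB huge pages. It assumes one MTT entry per registered page, which is a simplification of what the driver really does, and `pages_needed` is just an illustrative helper name, not part of VMA or the driver.

```shell
#!/bin/sh
# Illustrative helper (not part of VMA): number of pages of size
# $2 bytes needed to cover $1 bytes, rounded up.
# Assumption: one MTT entry per page, which is a simplification.
pages_needed() {
    bytes=$1
    page_size=$2
    echo $(( (bytes + page_size - 1) / page_size ))
}

len=372800063                  # length from the mem_reg warning

pages_needed "$len" 4096       # 4 KB pages      -> 91016
pages_needed "$len" 2097152    # 2 MB huge pages -> 178
```

Under that assumption the same buffer pool takes about 512 times fewer entries with huge pages, which is why the default table is normally sufficient in that mode.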

Possible Solutions:

  1. Use HugePages: Note that with 4K pages you should expect worse performance than when running with HugePages.
  2. Reduce the amount of memory VMA requires: Tune the following VMA parameters: VMA_TX_BUFS, VMA_RX_BUFS. Try reduced values that work well with your application.
  3. Check the MTT configuration for 4K page usage: Increase the maximum number of memory translation table segments per HCA. This is usually required when you need more than 64GB for VMA enabled processes. You can increase the maximum amount of available RDMA memory by increasing the value of log_mtts_per_seg.
     # echo "options mlx4_core log_num_mtt=24 log_mtts_per_seg=0" > /etc/modprobe.d/mofed.conf
     a. Reboot the server or restart openibd. To restart openibd:
        # sudo service openibd restart
     b. Verify the changes took effect:
        # cat /sys/module/mlx4_core/parameters/log_num_mtt
        # cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
  4. With MLNX_OFED we use contiguous pages, which allow 4k page allocations but with far fewer MTT entries, to preserve high NIC performance and low MTT usage (see IBV_EXP_ACCESS_ALLOCATE_MR). We'll check when we can push this code upstream as well.
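For solution 3, the commonly documented mlx4 relation is max registrable memory = 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE. The sketch below just evaluates that formula so you can sanity-check a candidate setting before restarting the driver. `max_reg_mem_bytes` is an illustrative helper name, and the 4096-byte page size is an assumption for a typical x86_64 host.

```shell
#!/bin/sh
# Illustrative helper (not part of VMA or the driver):
#   max registrable bytes = 2^log_num_mtt * 2^log_mtts_per_seg * page_size
# Arguments: log_num_mtt, log_mtts_per_seg, page size in bytes.
max_reg_mem_bytes() {
    log_num_mtt=$1
    log_mtts_per_seg=$2
    page_size=$3
    echo $(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size ))
}

# Values from the modprobe line above, assuming 4 KB pages:
max_reg_mem_bytes 24 0 4096    # 68719476736 bytes = 64 GiB
```

With log_num_mtt=24 and log_mtts_per_seg=0 this comes out to the 64GB figure mentioned in solution 3. Each increment of either parameter doubles the limit.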

rosenbaumalex, Aug 11 '15 06:08

We have the problem when hugepages are enabled. We do not have the problem with 4k pages.

So solution #1 does not work for us.

Is the # of buffers in page size units? That could have caused the issue.

clameter, Aug 11 '15 11:08