Liyue Zhang
@lifupan Thank you for your response. My objective is to utilize cgroupv2 to regulate the resource usage of subprocesses within my code, **inside Kata's containers**. Essentially, I intend to employ...
> Could you give more info about the result of "cannot find any cgroup v2 files"? What's the concrete info about it? >> Normally, on an OS that has...
@Apokleos Here are some recent updates: The `ctr` command from `containerd` doesn't mount the cgroup filesystem by default. So, running `mount -t cgroup2 none /sys/fs/cgroup/` allows me to view the...
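To confirm from inside the container whether a cgroup v2 filesystem is actually mounted, one option is to scan `/proc/mounts` for entries of type `cgroup2`. The following is only a minimal sketch: the helper names `cgroup2_mountpoints` and `read_mounts` are made up for illustration, and it assumes a Linux guest where `/proc` is available.

```python
from typing import List


def cgroup2_mountpoints(mounts_text: str) -> List[str]:
    """Return the mount points whose filesystem type is cgroup2.

    Each line of /proc/mounts has the fields:
    device, mount point, filesystem type, options, dump, pass.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "cgroup2":
            points.append(fields[1])
    return points


def read_mounts(path: str = "/proc/mounts") -> str:
    """Read the mount table; return an empty string if it is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    found = cgroup2_mountpoints(read_mounts())
    if found:
        print("cgroup2 mounted at:", ", ".join(found))
    else:
        print("no cgroup2 mount found; try: mount -t cgroup2 none /sys/fs/cgroup/")
```

An empty result before the manual `mount` and a `/sys/fs/cgroup` entry after it would match the behavior described above.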
I regret the oversight regarding the upstream update of GDRCopy. I will now specify the version of GDRCopy for clarity. For instance, when using Ubuntu 22.04 and CUDA 12.3 as...
> Hi, the instruction asks us to install packages in the docker without rebuilding modules. However, in my case, the host is a centos machine while the docker is a...
The error logs indicate an NVSHMEM-related issue during the initialization of QPs (Queue Pairs). It is recommended to contact NVIDIA technical support for assistance with RoCE devices, particularly...
Could you please confirm whether the Ada Lovelace architecture GPUs support GPU Direct RDMA (GDR) and InfiniBand GPU Direct Async (IBGDA)? If so, DeepEP should also be able to run on...
1. Based on your logs, it appears that the system is unable to retrieve information from other ranks during bootstrap. We recommend checking your network connectivity settings, including: - Proper...
@Baibaifan The message `neither nv_peer_mem nor nvidia_peermem detected` indicates that your system environment does not currently support GPU Direct RDMA. To resolve this, please try loading the GDR kernel module...
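Before loading anything, it can help to check whether either of the two kernel modules named in that message is already present. The sketch below parses `/proc/modules` for `nv_peer_mem` and `nvidia_peermem`. The function names are hypothetical, it assumes a Linux host, and it only reports whether a module is loaded, not whether GPU Direct RDMA works end to end.

```python
from typing import List

# The two module names taken from the log message above.
GDR_MODULES = ("nv_peer_mem", "nvidia_peermem")


def loaded_gdr_modules(modules_text: str) -> List[str]:
    """Return which of the GDR kernel modules appear in /proc/modules text.

    Each line of /proc/modules starts with the module name, followed by
    size, reference count, dependencies, state, and load address.
    """
    loaded = set()
    for line in modules_text.splitlines():
        fields = line.split()
        if fields:
            loaded.add(fields[0])
    return [name for name in GDR_MODULES if name in loaded]


def read_modules(path: str = "/proc/modules") -> str:
    """Read the loaded-module table; return an empty string if unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    found = loaded_gdr_modules(read_modules())
    if found:
        print("GDR module loaded:", ", ".join(found))
    else:
        print("neither nv_peer_mem nor nvidia_peermem is loaded")
```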