VM with GPU passthrough can't start while memory request < limit
What happened: A VM with GPU passthrough cannot start when its memory request is lower than its limit. The VirtualMachine definition looks like this:
```yaml
spec:
  runStrategy: RerunOnFailure
  template:
    metadata:
      labels:
        kubevirt.io/domain: centos-gpu
    spec:
      domain:
        devices:
          disks:
            - disk:
                bus: virtio
              name: containerdisk
          gpus:
            - deviceName: nvidia.com/TU102_GEFORCE_RTX_2080_TI
              name: gpu1
        machine:
          type: q35
        resources:
          limits:
            memory: 6Gi
          requests:
            memory: 4Gi
      volumes:
        - containerDisk:
            image: xxxx/centos7:v1
            imagePullPolicy: IfNotPresent
          name: containerdisk
```
The virt-launcher pod logs show:
```
compute {"component":"virt-launcher","level":"error","msg":"At least one cgroup controller is required: No such device or address","pos":"virCgroupDetectControllers:455","subcomponent":"libvirt","thread":"36","timestamp":"2022-09-14T03:29:32.672000Z"}
compute {"component":"virt-launcher","level":"error","msg":"Unable to read from monitor: Connection reset by peer","pos":"qemuMonitorIORead:494","subcomponent":"libvirt","thread":"216","timestamp":"2022-09-14T03:29:33.804000Z"}
compute {"component":"virt-launcher","level":"error","msg":"internal error: qemu unexpectedly closed the monitor: 2022-09-14T03:29:33.740944Z qemu-kvm: -device vfio-pci,host=0000:88:00.0,id=ua-gpu-gpu1,bus=pci.5,addr=0x0: VFIO_MAP_DMA failed: Cannot allocate memory","pos":"qemuProcessReportLogError:2046","subcomponent":"libvirt","thread":"216","timestamp":"2022-09-14T03:29:33.805000Z"}
compute parsing time "2022-09-14T03:29:33.773415Z qemu-kvm" as "2006-01-02 15:04:05.999-0700": cannot parse "T03:29:33.773415Z qemu-kvm" as " "
compute {"component":"virt-launcher","level":"info","msg":"Reaped pid 215 with status 256","pos":"virt-launcher.go:550","timestamp":"2022-09-14T03:29:33.821028Z"}
compute {"component":"virt-launcher","kind":"","level":"error","msg":"Failed to start VirtualMachineInstance with flags 0.","name":"centos-gpu","namespace":"default","pos":"manager.go:875","reason":"virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2022-09-14T03:29:33.740944Z qemu-kvm: -device vfio-pci,host=0000:88:00.0,id=ua-gpu-gpu1,bus=pci.5,addr=0x0: VFIO_MAP_DMA failed: Cannot allocate memory\n2022-09-14T03:29:33.773415Z qemu-kvm: -device vfio-pci,host=0000:88:00.0,id=ua-gpu-gpu1,bus=pci.5,addr=0x0: vfio 0000:88:00.0: failed to setup container for group 124: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x5642f2946b30, 0x100000000, 0x6e700000, 0x7f7d81800000) = -12 (Cannot allocate memory)')","timestamp":"2022-09-14T03:29:34.008228Z","uid":"10bbe548-8e28-466b-8f8d-46f99d6a4a65"}
compute {"component":"virt-launcher","kind":"","level":"error","msg":"Failed to sync vmi","name":"centos-gpu","namespace":"default","pos":"server.go:184","reason":"virError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2022-09-14T03:29:33.740944Z qemu-kvm: -device vfio-pci,host=0000:88:00.0,id=ua-gpu-gpu1,bus=pci.5,addr=0x0: VFIO_MAP_DMA failed: Cannot allocate memory\n2022-09-14T03:29:33.773415Z qemu-kvm: -device vfio-pci,host=0000:88:00.0,id=ua-gpu-gpu1,bus=pci.5,addr=0x0: vfio 0000:88:00.0: failed to setup container for group 124: memory listener initialization failed: Region pc.ram: vfio_dma_map(0x5642f2946b30, 0x100000000, 0x6e700000, 0x7f7d81800000) = -12 (Cannot allocate memory)')","timestamp":"2022-09-14T03:29:34.008342Z","uid":"10bbe548-8e28-466b-8f8d-46f99d6a4a65"}
```
The kernel log shows:

```
# dmesg
...
[415942.683580] vfio_pin_pages_remote: RLIMIT_MEMLOCK (5555355648) exceeded
```
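A back-of-the-envelope check is consistent with the memlock limit having been sized from the memory request rather than the limit. This is a sketch, not KubeVirt's exact overhead formula; the overhead value is inferred from the dmesg number above:

```shell
# VFIO must pin the whole guest RAM (sized by the memory *limit*), but the
# RLIMIT_MEMLOCK value in dmesg matches the memory *request* plus overhead.
GIB=$((1024 * 1024 * 1024))
REQUEST=$((4 * GIB))            # spec.resources.requests.memory
LIMIT=$((6 * GIB))              # spec.resources.limits.memory (guest RAM)
MEMLOCK=5555355648              # RLIMIT_MEMLOCK from the dmesg line above
OVERHEAD=$((MEMLOCK - REQUEST)) # ~1.17 GiB of inferred per-VM overhead
NEEDED=$((LIMIT + OVERHEAD))    # what qemu actually has to pin for VFIO
echo "memlock=$MEMLOCK needed=$NEEDED"
if [ "$NEEDED" -gt "$MEMLOCK" ]; then
  echo "pinning exceeds RLIMIT_MEMLOCK"
fi
```

With request == limit the two numbers coincide, which matches the observation below that such VMs start fine.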
What you expected to happen: A VM with GPU passthrough should run normally regardless of the memory request and limit, just like a VM without GPU passthrough.
How to reproduce it (as minimally and precisely as possible): apply the VM definition above.
Additional context: the VM with GPU passthrough runs normally when the memory request equals the limit.
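For reference, the known-working variant differs only in the resources stanza (a minimal fragment of the spec above):

```yaml
# Known-working variant: request == limit, so the memlock limit sized from
# the request also covers the full guest RAM that VFIO must pin.
resources:
  limits:
    memory: 6Gi
  requests:
    memory: 6Gi
```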
Environment:
- KubeVirt version (use `virtctl version`): v0.50.0
- Kubernetes version (use `kubectl version`): v1.23.4
- VM or VMI specifications: see above
- Cloud provider or hardware configuration: N/A
- OS (e.g. from /etc/os-release): Debian 11
- Kernel (e.g. `uname -a`): 5.10.0-16-amd64
- Install tools: N/A
- Others: N/A
@booxter Hello, can you look into this problem? I have noticed that KubeVirt adjusts RLIMIT_MEMLOCK for VMs with VFIO devices attached, but I don't know how to fix this.
@caohuilong The way I have been working around this on KubeVirt 0.54 is adding SYS_RESOURCE to the virt-launcher Pod's securityContext: https://man7.org/linux/man-pages/man7/capabilities.7.html#:~:text=on%20other%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20devices.-,CAP_SYS_RESOURCE,-*%20Use%20reserved%20space
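For context, the capability that workaround refers to looks like this in a pod spec. This is an illustrative fragment; the virt-launcher pod is generated by KubeVirt, so in practice the capability has to be injected rather than edited by hand:

```yaml
# Illustrative fragment: CAP_SYS_RESOURCE lets the container raise its own
# resource limits, so the request-sized RLIMIT_MEMLOCK no longer blocks VFIO.
securityContext:
  capabilities:
    add:
      - SYS_RESOURCE
```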
It looks like this is related to https://github.com/kubevirt/kubevirt/pull/8367, which was merged last month.
To clarify whether the limits are the root cause of this issue, can you please re-test after removing only the limit parameter from this config?
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale
I have the same problem with KubeVirt v0.55.0.
```
# dmesg
[ +0.001779] vfio_pin_pages_remote: RLIMIT_MEMLOCK (65536) exceeded
# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1542940
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1542940
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
```
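The 64 KiB limit above can be cross-checked via /proc, where it appears in bytes (65536, matching the dmesg line). Inside the virt-launcher container you would read the qemu-kvm process's limits file instead of `/proc/self/limits` (a sketch assuming a Linux shell):

```shell
# "Max locked memory" in /proc/<pid>/limits is the RLIMIT_MEMLOCK that
# vfio_pin_pages_remote enforces, shown in bytes rather than kbytes.
grep "Max locked memory" /proc/self/limits
```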
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
/close
@kubevirt-bot: Closing this issue.
In response to this:
> Rotten issues close after 30d of inactivity.
> Reopen the issue with /reopen.
> Mark the issue as fresh with /remove-lifecycle rotten.
> /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.