AKS
[BUG] AKS Confidential Computing: pod creation fails when using my own images
Describe the bug
I want to use AKS confidential computing for my tasks. When I create pods using my own images, pod creation fails. If I replace the image in the k8s YAML file with a different one, the pod launches. I collected the kata and containerd debug information here, which you can use to debug: https://github.com/blossomin/akslog
In these logs, the file name suffix indicates which image was used: myworker means my own worker image, and mcr-pytorch means "mcr.microsoft.com/azurelinux/base/pytorch:2.2.2-1-azl3.0.20240824-amd64".
Comparing the kata logs, I found one statement that exists only in worker_error_kata_myworker.log:
cloud-hypervisor: 11.942990s:
To Reproduce
Steps to reproduce the behavior: my image is currently internal, so the issue is hard to reproduce externally.
Expected behavior
The pod should launch without problems regardless of which image is used.
@agowdamsft would you be able to assist?
One follow-up: my worker image has 40 layers, while "mcr.microsoft.com/azurelinux/base/pytorch:2.2.2-1-azl3.0.20240824-amd64" has about 13 layers.
My guess at the cause:
- each layer is mapped to one virtio-pci device,
- there are only 31 PCI slots per (confidential) VM.
With 40 layers, the image needs more devices than there are slots, which causes the resource shortage.
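The arithmetic behind this guess can be sketched in a few lines of Python. This is only an illustration of the hypothesis, not a confirmed description of how Kata or Cloud Hypervisor allocates devices: the 31-slot figure is the number quoted above, the one-device-per-layer mapping is the assumption being tested, and the names `ASSUMED_PCI_SLOTS`, `count_layers`, `layers_fit`, and the `reserved` parameter are hypothetical. The only external detail relied on is that an image manifest lists its layers in a `layers` array.

```python
import json

# Slot count quoted in the hypothesis above; not verified against
# Kata or Cloud Hypervisor documentation.
ASSUMED_PCI_SLOTS = 31


def count_layers(manifest_json: str) -> int:
    """Return the number of entries in the `layers` array of an image manifest."""
    manifest = json.loads(manifest_json)
    return len(manifest.get("layers", []))


def layers_fit(layer_count: int, slots: int = ASSUMED_PCI_SLOTS, reserved: int = 0) -> bool:
    """Return True if every layer can get its own slot.

    `reserved` is a hypothetical knob for slots taken by other devices
    (network, root disk, ...); the real number is unknown here.
    """
    return layer_count <= slots - reserved


if __name__ == "__main__":
    # Layer counts reported in this issue.
    for name, layers in [("myworker", 40), ("mcr-pytorch", 13)]:
        verdict = "fits" if layers_fit(layers) else "exceeds the assumed slot budget"
        print(f"{name}: {layers} layers, {verdict}")
```

If the hypothesis holds, squashing or merging layers so that the image stays under the slot budget should let the pod start, which would be a cheap way to confirm or rule it out.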
This issue has been automatically marked as stale because it has not had any activity for 30 days. It will be closed if no further activity occurs within 7 days of this comment. @angarg05
This issue will now be closed because it hasn't had any activity for 7 days after stale. blossomin feel free to comment again on the next 7 days to reopen or open a new issue after that time if you still have a question/issue or suggestion.