rongfu.leng
We also have this requirement at present. I have an idea; could you help take a look, @cdesiniotis? Because we can currently handle vfio for one device, we can...
After discussing with @Iceber, we think the shim process exits too quickly, so the shim never receives the OOM event. My idea is that the current OOM watch `s.ep.Add`...
/assign
> > > Discuss with @Iceber and think that the shim process exits too quickly, causing the shim not to receive the OOM event. > > > > > >...
```go
type TaskOOM struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields

	ContainerID string `protobuf:"bytes,1,opt,name=container_id,json=containerId,proto3" json:"container_id,omitempty"`
}
```

I found that the `ContainerID` value is not the container ID; it is the `SandboxID`.

1. Failed test log: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-containerd-node-e2e-1-7/1697683704735862784/artifacts/tmp-node-e2e-b9a730eb-ubuntu-gke-2204-1-24-v20220623-system.log...
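The check described above can be sketched roughly as follows. This is only an illustration: `TaskOOM` here is a simplified local stand-in for the generated protobuf type, and `idKind`, the lookup sets, and the IDs `ctr-1` and `sb-1` are hypothetical names that do not come from containerd.

```go
package main

import "fmt"

// TaskOOM is a simplified local stand-in for the generated protobuf type;
// only the field relevant to the check is kept.
type TaskOOM struct {
	ContainerID string
}

// idKind reports whether the ID carried by an OOM event matches a known
// container ID, a known sandbox ID, or neither.
// Both lookup sets are hypothetical; a real check would read them from
// the runtime's own state.
func idKind(ev TaskOOM, containers, sandboxes map[string]bool) string {
	switch {
	case containers[ev.ContainerID]:
		return "container"
	case sandboxes[ev.ContainerID]:
		return "sandbox"
	default:
		return "unknown"
	}
}

func main() {
	containers := map[string]bool{"ctr-1": true} // hypothetical container ID
	sandboxes := map[string]bool{"sb-1": true}   // hypothetical sandbox ID

	// An event whose ContainerID field actually holds the sandbox ID.
	ev := TaskOOM{ContainerID: "sb-1"}
	fmt.Println(idKind(ev, containers, sandboxes))
}
```

A helper like this makes the mismatch visible in a test: an event that should name a container is classified as a sandbox instead.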
> Why does it only support V0? Can you update the PR description?

According to the error, it is because flashattention is not supported.
```
DEBUG 04-11 23:28:36 [__init__.py:28] No plugins for group vllm.platform_plugins found.
DEBUG 04-11 23:28:36 [__init__.py:34] Checking if TPU platform is available.
DEBUG 04-11 23:28:36 [__init__.py:44] TPU platform is not available...
```
I see your version is 0.8.5; I will test again with that version.
😠 I can still reproduce this problem on version 0.8.4, and I don't know the reason. Can you help verify it? @DarkLight1337

`python3 collect_env.py` result:

```
PyTorch version:...
```
Could each DRA driver implement a webhook that creates a ResourceClaimTemplate when a pod is created and modifies how the pod requests its resources?
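To make the question concrete, here is a minimal sketch of the pod-mutation half of such a webhook, written as a JSON Patch (RFC 6902) built with the standard library only. It rests on several assumptions: the pod API shape of recent Kubernetes releases (`spec.resourceClaims[].resourceClaimTemplateName` and `resources.claims[].name`), a pod with no existing claims, and a first container whose serialized `resources` object is present. `buildClaimPatch`, `gpu`, and `gpu-template` are hypothetical names. Creating the ResourceClaimTemplate itself is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// buildClaimPatch returns a patch that adds a pod-level resource claim backed
// by a ResourceClaimTemplate and references that claim from the first container.
// Assumption: field names follow recent Kubernetes releases; older releases
// nested the template name differently.
func buildClaimPatch(claimName, templateName string) []patchOp {
	return []patchOp{
		{
			Op:   "add",
			Path: "/spec/resourceClaims",
			Value: []map[string]string{
				{"name": claimName, "resourceClaimTemplateName": templateName},
			},
		},
		{
			Op:   "add",
			Path: "/spec/containers/0/resources/claims",
			Value: []map[string]string{
				{"name": claimName},
			},
		},
	}
}

func main() {
	// Hypothetical claim and template names.
	ops := buildClaimPatch("gpu", "gpu-template")
	out, err := json.MarshalIndent(ops, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A mutating admission webhook would return a patch like this in its admission response. Whether every driver should ship its own webhook is the open design question raised above.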