Wajih Yassine
To add to the strangeness, in some cases (not all), post-processing fails to detach a disk due to a missing parameter `deviceName`:

```
INFO:Detaching disk test-disk-20gb-79 from instance gke-turbinia-main-default-pool-38275869-mbm7...
```
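For reference, `instances.detachDisk` in the Compute Engine API takes the *device name* (the name the disk is exposed under in `/dev/disk/by-id`) as a required query parameter, which is what the failure above is complaining about. A minimal sketch of the call; the project/zone values are placeholders:

```python
from googleapiclient import discovery

service = discovery.build("compute", "v1")

# deviceName is a required query parameter of detachDisk; omitting it
# produces the missing-parameter failure above. Project and zone here
# are hypothetical placeholders.
response = service.instances().detachDisk(
    project="my-project",
    zone="us-central1-a",
    instance="gke-turbinia-main-default-pool-38275869-mbm7",
    deviceName="test-disk-20gb-79",  # assumes device name == disk name
).execute()
```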
Okay, so re-running another load test scaled up to 10 nodes and from 3 initial pods up to 500, I still see the resource-in-use issue a few minutes...
The source of the issue seems to be related to GCP `instances.attachDisk()` unreliably attaching disks to the VM. However, I don't see any error from libcloudforensics (not sure if...
So libcloudforensics seems to just call the API but does not return the response: https://github.com/google/cloud-forensics-utils/blob/a71b13c3a7108e4d37879007617203c5e34170ff/libcloudforensics/providers/gcp/internal/compute.py#L1423 I confirmed this by logging the return value of `instances.AttachDisk` in Turbinia; it evaluates to `None`....
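To illustrate the pattern (a sketch, not the verbatim libcloudforensics code): the wrapper fires the `attachDisk` request but drops the returned `Operation`, so a caller that logs the return value sees `None` and has nothing to poll for completion or errors.

```python
from googleapiclient import discovery

def attach_disk(project, zone, instance, disk_source):
    """Sketch of an attach wrapper that discards the API response."""
    service = discovery.build("compute", "v1")
    request = service.instances().attachDisk(
        project=project,
        zone=zone,
        instance=instance,
        body={
            "mode": "READ_ONLY",
            # Full disk URL, e.g. "projects/<p>/zones/<z>/disks/<name>".
            "source": disk_source,
        },
    )
    # The Operation returned by execute() is dropped and nothing is
    # returned, so the caller cannot tell whether the attach ever
    # reached status == "DONE" or failed server-side.
    request.execute()
```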
This issue is weird... I ran two load tests yesterday for 100 disks with 10 nodes/200 pods scaled beforehand, and neither errored out, so I was thinking the issue...
I reviewed the affected VMs and tried attaching a disk manually via `gcloud compute instances attach-disk`, but I do not see the device show up in `/dev/disk/by-id`....
Hmm, looking at the serial output from the GCP console of the affected VMs, I see this pattern:

```
[ 661.206645] INFO: task kworker/6:9:5654 blocked for more than 327 seconds.
[ 661.213722]...
```
Looking at the rest of the logs around the time of the kernel freeze, I see multiple `attachDisk` events for different disks being attached to the affected VM, all within the...
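Given that the freezes line up with several hot-attaches landing on the same VM at once, one mitigation is to serialize attach calls and space them out. A sketch only: the lock and the 30-second spacing are illustrative, not the exact change that was made.

```python
import threading
import time

# Hypothetical throttle around the attach call: serialize attaches and
# space them out so the guest kernel isn't handling several hot-attach
# events at once. The spacing value is illustrative.
_attach_lock = threading.Lock()
ATTACH_SPACING_SECONDS = 30

def throttled_attach(attach_fn, *args, **kwargs):
    """Run attach_fn under a global lock, then sleep before releasing."""
    with _attach_lock:
        result = attach_fn(*args, **kwargs)
        time.sleep(ATTACH_SPACING_SECONDS)
    return result
```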
Looks like the sleep helped, but I still saw the issue come up during a load-test run, although it seems a lot less frequent than before. Doing some more...
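Since a fixed sleep only reduces the frequency, a more robust approach would be to wait on the attach `Operation` itself and retry on failure. A sketch under assumptions: the function name and retry policy are illustrative, not the actual Turbinia/libcloudforensics change; it uses the public Compute API directly.

```python
import random
import time

from googleapiclient import discovery

def attach_and_wait(project, zone, instance, disk_source, retries=5):
    """Attach a disk and block until the zonal Operation reports DONE."""
    service = discovery.build("compute", "v1")
    for attempt in range(retries):
        op = service.instances().attachDisk(
            project=project, zone=zone, instance=instance,
            body={"mode": "READ_ONLY", "source": disk_source},
        ).execute()
        # zoneOperations().wait() blocks for up to ~2 minutes per call,
        # so poll until the operation actually reaches DONE.
        while op.get("status") != "DONE":
            op = service.zoneOperations().wait(
                project=project, zone=zone, operation=op["name"]).execute()
        if "error" not in op:
            return op
        # Exponential backoff with jitter before retrying the attach.
        time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"attachDisk failed after {retries} attempts")
```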
Per my chat, the way to detect the issue happening is to review the KCP logs, where you can see an error like this:

```
2023-01-04T00:05:40.017812Z [resource.labels.instanceId:] attacherDetacher.DetachVolume started for...
```
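If you have the KCP logs exported as text, a trivial filter can flag that signature; the pattern below is based only on the excerpt above and may need widening for other variants.

```python
import re
import sys

# Matches the attacherDetacher signature from the KCP log excerpt.
PATTERN = re.compile(r"attacherDetacher\.DetachVolume started for")

for line in sys.stdin:
    if PATTERN.search(line):
        print(line.rstrip())
```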