
vsock: Issues in sibling VM communication

Open techiepriyansh opened this issue 2 years ago • 1 comments

These issues were discovered while trying to test the current implementation of sibling VM communication in vhost-user-vsock. The testing was done with iperf-vsock and nc-vsock, both patched to set .svm_flags = VMADDR_FLAG_TO_HOST.

Issues

Deadlock

If you test sibling communication by running iperf-vsock or by transferring big files with nc-vsock, the vhost-user-vsock process hangs and becomes completely unresponsive. After a bit of debugging, I discovered that there is a deadlock.

The deadlock occurs when two sibling VMs simultaneously try to send each other packets. The VhostUserVsockThreads corresponding to both VMs hold their own locks while executing thread_backend.send_pkt, and each then tries to lock the other in order to access its counterpart's raw_pkts_queue. Each thread therefore waits for a lock that the other one holds, which results in a deadlock.

In particular, this line of code triggers the deadlock.

The deadlock can be resolved by separating the mutex over raw_pkts_queue from the mutex over VhostUserVsockThread.
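The proposed fix can be illustrated with a minimal sketch. This is not the vhost-user-vsock code: VmThread, RawPktsQueue and exchange are hypothetical, heavily simplified stand-ins, and the only point shown is that a sender holding its own thread lock takes just the sibling's queue lock, never the sibling's thread lock.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a raw packet destined for a sibling VM.
type RawPkt = Vec<u8>;

/// The queue has its own mutex, independent of the per-thread lock.
type RawPktsQueue = Arc<Mutex<VecDeque<RawPkt>>>;

/// Hypothetical, simplified per-VM state (not the real VhostUserVsockThread).
struct VmThread {
    /// State guarded by the per-thread lock: number of packets sent.
    sent: usize,
    /// Handle to the sibling's queue only, not to the sibling's thread state.
    sibling_queue: RawPktsQueue,
}

impl VmThread {
    /// Runs with the caller's own thread lock held. Only the sibling's queue
    /// lock is taken, and only for the duration of the push.
    fn send_pkt(&mut self, pkt: RawPkt) {
        self.sibling_queue.lock().unwrap().push_back(pkt);
        self.sent += 1;
    }
}

/// Two simulated VMs send packets to each other concurrently.
/// Returns (packets queued for A, packets queued for B, total packets sent).
fn exchange(pkts_per_vm: usize) -> (usize, usize, usize) {
    let queue_a: RawPktsQueue = Arc::new(Mutex::new(VecDeque::new()));
    let queue_b: RawPktsQueue = Arc::new(Mutex::new(VecDeque::new()));

    // VM A pushes into B's queue and vice versa.
    let vms = vec![
        Arc::new(Mutex::new(VmThread { sent: 0, sibling_queue: Arc::clone(&queue_b) })),
        Arc::new(Mutex::new(VmThread { sent: 0, sibling_queue: Arc::clone(&queue_a) })),
    ];

    let handles: Vec<_> = vms
        .iter()
        .map(|vm| {
            let vm = Arc::clone(vm);
            thread::spawn(move || {
                for i in 0..pkts_per_vm {
                    // Own thread lock is held while sending, as in the issue.
                    let mut guard = vm.lock().unwrap();
                    guard.send_pkt(vec![i as u8]);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total_sent: usize = vms.iter().map(|vm| vm.lock().unwrap().sent).sum();
    let queued_a = queue_a.lock().unwrap().len();
    let queued_b = queue_b.lock().unwrap().len();
    (queued_a, queued_b, total_sent)
}
```

Because each thread only ever acquires its own thread lock followed by a queue lock, and never another thread's lock, the circular wait described above cannot form in this sketch. Calling exchange(1000) from a main function completes with both queues filled.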

Raw packets queue not being processed completely

Even after resolving the deadlock, the vhost-user-vsock process still hangs during testing, though it is not completely unresponsive this time. It turns out that raw packets pending on the raw_pkts_queue are sometimes never processed, which results in the hang.

This happens because the raw_pkts_queue is currently processed only when a SIBLING_VM_EVENT is received. If the RX virtqueue does not have enough space at that moment, the raw_pkts_queue cannot be processed completely, and the remaining packets stay pending until another SIBLING_VM_EVENT arrives, which may never happen.

This can be resolved by trying to process raw packets on other events too, similar to what happens in the RX path for standard packets.
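The following sketch models the idea under simplifying assumptions. Event, Backend, rx_free_slots and retry_on_all_events are hypothetical names invented for illustration; the RX virtqueue is reduced to a counter of free slots, and none of this mirrors the actual vhost-user-vsock event loop.

```rust
use std::collections::VecDeque;

/// Hypothetical events, loosely modelled on the ones named in the issue.
enum Event {
    /// The guest made more RX buffers available.
    RxQueue { new_buffers: usize },
    /// A sibling VM queued raw packets for us (stands in for SIBLING_VM_EVENT).
    SiblingVm,
}

/// Hypothetical, simplified backend state.
struct Backend {
    raw_pkts_queue: VecDeque<Vec<u8>>,
    /// Stand-in for the free space in the RX virtqueue.
    rx_free_slots: usize,
    /// false: old behaviour (drain only on SiblingVm); true: proposed behaviour.
    retry_on_all_events: bool,
}

impl Backend {
    /// Deliver as many pending raw packets as the RX queue has room for.
    fn process_raw_pkts(&mut self) {
        while self.rx_free_slots > 0 {
            if self.raw_pkts_queue.pop_front().is_none() {
                break;
            }
            self.rx_free_slots -= 1;
        }
    }

    fn handle_event(&mut self, event: Event) {
        let is_sibling_event = match event {
            Event::RxQueue { new_buffers } => {
                self.rx_free_slots += new_buffers;
                false
            }
            Event::SiblingVm => true,
        };
        // Proposed change: also retry leftover raw packets on other events.
        if is_sibling_event || self.retry_on_all_events {
            self.process_raw_pkts();
        }
    }
}

/// Queue three packets with one free RX slot, deliver a sibling event,
/// then refill the RX queue. Returns how many raw packets are still pending.
fn pending_after_refill(retry_on_all_events: bool) -> usize {
    let mut backend = Backend {
        raw_pkts_queue: (0..3u8).map(|i| vec![i]).collect(),
        rx_free_slots: 1,
        retry_on_all_events,
    };
    backend.handle_event(Event::SiblingVm);
    backend.handle_event(Event::RxQueue { new_buffers: 4 });
    backend.raw_pkts_queue.len()
}
```

With retry_on_all_events set to false, two packets stay stuck after the refill because no further sibling event arrives; with it set to true, the refill event drains the queue.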

Current status

While fixing the above two issues seems to make nc-vsock run flawlessly, testing with iperf-vsock still results in the vhost-user-vsock process hanging. There might be a notification problem, which could be related to the EVENT_IDX feature.

techiepriyansh, Jul 05 '23 18:07

While #385 resolves the deadlock and the problem with the raw packets queue not being processed completely, iperf-vsock still doesn't work. The following could be the reasons for that:

  • There might be a notification problem inside the vhost-user-vsock application
  • It could be due to the way iperf works internally
  • It might have something to do with vsock credit updates

techiepriyansh, Jul 14 '23 08:07