
xlio: fix netlink recv hang in VM environment

Opened by susobhandey.


Description

In VM environments, netlink dump responses may arrive with empty data or containing only NLMSG_DONE. XLIO's netlink_socket_mgr::recv_info() did not check nlmsg_type for NLMSG_DONE and therefore blocked in recv() indefinitely. This patch updates recv_info() to iterate over the nlmsghdr chain using NLMSG_NEXT and exit gracefully on NLMSG_DONE or NLMSG_ERROR.

What

Subject: Netlink: recv_info hang in VM environment

Why?

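recv_info() should walk the entire chain of netlink message headers in each received buffer and stop when it finds a message of type NLMSG_DONE, rather than inspecting only the first header.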

How?

This patch updates recv_info() to iterate over the nlmsghdr chain using NLMSG_NEXT and exit gracefully on NLMSG_DONE or NLMSG_ERROR.
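The header walk described above can be sketched in C roughly as follows. This is a minimal illustration, not the code from this patch or from PR #278: nl_scan_buffer(), recv_dump() and the NL_SCAN_* codes are hypothetical names, and the sketch assumes an NLM_F_DUMP request (for example RTM_GETROUTE) has already been sent on a NETLINK_ROUTE socket and that the receive buffer is suitably aligned. Only the standard NLMSG_OK and NLMSG_NEXT macros from <linux/netlink.h> are relied on.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h> /* RTM_* route message types (e.g. RTM_NEWROUTE) */

/* Hypothetical result codes for this sketch; not part of XLIO. */
enum nl_scan_result { NL_SCAN_MORE, NL_SCAN_DONE, NL_SCAN_ERROR };

/* Walk every nlmsghdr in one recv() buffer instead of looking only at the
 * first header. Returns NL_SCAN_DONE on NLMSG_DONE, NL_SCAN_ERROR on
 * NLMSG_ERROR, and NL_SCAN_MORE if the chain ends without either, meaning
 * the multipart dump continues in the next datagram. *n_data is
 * incremented for each payload message (e.g. RTM_NEWROUTE). */
enum nl_scan_result nl_scan_buffer(void *buf, int len, int *n_data)
{
    struct nlmsghdr *nlh = (struct nlmsghdr *)buf;

    for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
        if (nlh->nlmsg_type == NLMSG_DONE)
            return NL_SCAN_DONE;   /* end of the dump: stop reading */
        if (nlh->nlmsg_type == NLMSG_ERROR)
            return NL_SCAN_ERROR;  /* error (or ACK): stop reading */
        if (nlh->nlmsg_type == NLMSG_NOOP)
            continue;              /* nothing to process */
        (*n_data)++;               /* a real payload message */
    }
    return NL_SCAN_MORE;
}

/* Hypothetical caller: assumes a dump request was already sent on fd.
 * Returns the number of payload messages, or -1 on error. */
int recv_dump(int fd, void *buf, size_t buf_size)
{
    int total = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, buf_size, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return total;          /* 0 bytes: treat as end, do not block again */
        switch (nl_scan_buffer(buf, (int)n, &total)) {
        case NL_SCAN_DONE:
            return total;
        case NL_SCAN_ERROR:
            return -1;
        case NL_SCAN_MORE:
            break;                 /* dump continues: recv() again */
        }
    }
}
```

Keeping the header walk separate from recv() makes it testable without a socket: a buffer holding only NLMSG_DONE, the VM case reported here, returns NL_SCAN_DONE at once, so the caller stops instead of blocking in another recv().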

Change type

What kind of change does this PR introduce?

  • [x] Bugfix

susobhandey, Oct 01 '25

Can one of the admins verify this patch?

svc-nbu-swx-media, Oct 01 '25

Without these changes, route information is not populated.

susobhandey, Oct 01 '25

@AlexanderGrissik, please review.

galnoam, Oct 05 '25

@tomerdbz, should your PR fix this issue as well?

galnoam, Oct 06 '25


@tomerdbz?

galnoam, Dec 09 '25

@tomerdbz?

galnoam, Dec 12 '25

@galnoam yup :)

tomerdbz, Dec 14 '25

@tomerdbz, which PR is yours? Can you link it here? I have seen the same issue in the latest XLIO. Is it merged?

susobhandey, Dec 14 '25

@susobhandey https://github.com/Mellanox/libxlio/pull/278 - not merged yet

tomerdbz, Dec 14 '25

Tomer's PR was merged and should fix the issue, thanks.

galnoam, Dec 23 '25