libvfio-user
Avoid bit fields in protocol message definitions
The vfio_user_header.flags field currently uses bit fields to specify various sub-fields: https://github.com/nutanix/libvfio-user/blob/149aa845b11bd13fca41dcf65b51283f83ac5520/include/vfio-user.h#L80-L93
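For reference, the definition looks roughly like this (paraphrased from the linked header, not necessarily verbatim):

```c
struct vfio_user_header {
    uint16_t msg_id;
    uint16_t cmd;
    uint32_t msg_size;
    struct {
        uint32_t type     : 4;   /* command vs. reply */
        uint32_t no_reply : 1;
        uint32_t error    : 1;
        uint32_t resvd    : 26;
    } flags;
    uint32_t error_no;
} __attribute__((packed));
```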
Bit field layout is up to the compiler, and thus may differ depending on machine byte order, optimization settings, etc.
I observed this being a problem when testing on a big-endian host. All VFIO-user protocol fields are then in big-endian byte order (as expected per #615, IIUC), but the flags sub-fields were not laid out in reverse order, so they ended up in the wrong bit positions.
It's host-endian, not big-endian: that is, we don't do, or attempt to do, any handling of different endianness at all. We certainly wouldn't be against supporting this, but it's not something we're planning to work on ourselves.
To clarify my environment: I ran a qemu vfio-user client against a qemu vfio-user server on a big-endian host. So, both client and server are in big-endian byte order and hence expected to interoperate.
However, I found that server and client disagree on the bit field layout of the header flags field. The qemu client treats flags as a single uint32_t value in host byte order, so the flags field of a reply message is 0x00 0x00 0x00 0x01 in memory (which I consider correct per the spec, given that the flags field is supposed to be in host byte order, which is big-endian here). libvfio-user's bit field, however, places the type bits at the other end of the uint32_t, i.e. the flags field of a reply message is encoded as 0x10 0x00 0x00 0x00. Unsurprisingly, I didn't even get past version negotiation ;-)
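To illustrate, here is a minimal standalone sketch (not code from either project; the field layout mirrors the header linked above):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same bit field layout as the flags sub-struct in vfio-user.h. */
struct bitfield_flags {
    uint32_t type     : 4;
    uint32_t no_reply : 1;
    uint32_t error    : 1;
    uint32_t resvd    : 26;
};

int main(void)
{
    struct bitfield_flags bf = { .type = 1 };  /* reply */
    uint32_t plain = 1;                        /* reply, plain host-order value */
    unsigned char a[4], b[4];

    memcpy(a, &bf, sizeof(a));
    memcpy(b, &plain, sizeof(b));

    /* On a big-endian host with GCC, 'type' lands in the most significant
     * bits of the first byte (10 00 00 00), while the plain uint32_t puts
     * the value in the last byte (00 00 00 01). On little-endian hosts the
     * two layouts happen to agree, which is why this goes unnoticed there. */
    printf("bit field: %02x %02x %02x %02x\n", a[0], a[1], a[2], a[3]);
    printf("plain:     %02x %02x %02x %02x\n", b[0], b[1], b[2], b[3]);
    return 0;
}
```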
Then I researched what the net has to say about bit fields and byte order, and found that the standards leave this implementation-defined. In practice, compilers are forced to implement whatever widespread conventions exist (e.g. Linux does https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/ip.h?h=03702d4d29be4e2510ec80b248dbbde4e57030d9#n88). At the end of the day, it's IMO prudent to accept that bit field encoding is a minefield and instead rely on shifting and masking to pack/unpack the flags field.
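A rough sketch of what the shift/mask approach could look like; the macro names and exact form are illustrative, not a concrete patch proposal:

```c
#include <stdint.h>

/* Illustrative only: flags become a plain uint32_t, with the sub-fields
 * defined as masks/values per the vfio-user spec (bits 0-3 type, bit 4
 * no_reply, bit 5 error). Names are hypothetical. */
#define VFIO_USER_F_TYPE_MASK       0xfU
#define VFIO_USER_F_TYPE_COMMAND    0x0U
#define VFIO_USER_F_TYPE_REPLY      0x1U
#define VFIO_USER_F_NO_REPLY        (0x1U << 4)
#define VFIO_USER_F_ERROR           (0x1U << 5)

/* Packing and unpacking is then explicit and independent of the
 * compiler's bit field layout rules: */
static inline int is_reply(uint32_t flags)
{
    return (flags & VFIO_USER_F_TYPE_MASK) == VFIO_USER_F_TYPE_REPLY;
}
```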
I see, because the client doesn't use bit fields, we get bitten by the difference on big-endian systems. Makes sense.
@mnissler-rivos are you planning to submit a patch?
It's not a priority for me given that none of my stuff runs on big-endian host systems (I only found this while checking qemu's behavior on big-endian in the context of a code review there). If you're all OK with replacing the bit field with #defines, I can put together a quick MR and test it later this week.
I'm not sure there's any way around this other than using #defines, so I'd take such a PR. @jlevon?
yup for sure thanks