Virtio-Balloon stats crash with new fields
Description
The virtio balloon device offers the stats feature which the guest drvier will report various memory stats about the guest. These are sent with a tag and value, the spec of which is defined in the virtio spec.
It was discovered in https://github.com/firecracker-microvm/firecracker/pull/5516 that if the guest driver reports new fields this would cause firecrackers balloon stats to crash. Instead of crashing on unexpected fields we should log a warning and resume reporting the stats. We use this method here to decode the tags, and by default we return an Error.
We should also have an integration test so we can detect if the fields change.
My early temporary fix is just replace return Err(BalloonError::MalformedPayload) to warn! while firecracker does not know the new tag. So I can find this warn in firecracker.log and firecracker stat works well.
Also we can check the value of #define VIRTIO_BALLOON_S_NR 16 in latest include/uapi/linux/virtio_balloon.h to known if the fields change.
Using the warn! and checking this in the log as part of an integration test was exactly what I was thinking as well.
Hi @JackThomson2 and @laushunyu , I'm currently a student at UT Austin taking a Virtualization class. Is this issue still open and requiring a fix or is someone already working on it?
If it is open, would it be possible for me to take this issue to work on? Thank you!
Hi @22aronl . We are certainly happy for you to take it in case @laushunyu isn't going to work on it. Thanks!
@JackThomson2 I currently use warn! and added checks in test_stats and test_stats_update of the test_balloon.py tests in integration tests to check if that log has gone off. Is there other changes you would prefer for this issue? Thanks!
Hey @22aronl thanks for taking a look. That sounds good and that should be enough, when you're ready feel free to open a PR.
Hey @JackThomson2 , I have opened a PR, not sure if everything is set up properly though. Please let me know! Thanks!
Resolved by PR: https://github.com/firecracker-microvm/firecracker/pull/5572