firecracker icon indicating copy to clipboard operation
firecracker copied to clipboard

Virtio-Balloon stats crash with new fields

Open JackThomson2 opened this issue 1 month ago • 7 comments

Description

The virtio balloon device offers the stats feature which the guest drvier will report various memory stats about the guest. These are sent with a tag and value, the spec of which is defined in the virtio spec.

It was discovered in https://github.com/firecracker-microvm/firecracker/pull/5516 that if the guest driver reports new fields this would cause firecrackers balloon stats to crash. Instead of crashing on unexpected fields we should log a warning and resume reporting the stats. We use this method here to decode the tags, and by default we return an Error.

We should also have an integration test so we can detect if the fields change.

JackThomson2 avatar Nov 27 '25 16:11 JackThomson2

My early temporary fix is just replace return Err(BalloonError::MalformedPayload) to warn! while firecracker does not know the new tag. So I can find this warn in firecracker.log and firecracker stat works well.

Also we can check the value of #define VIRTIO_BALLOON_S_NR 16 in latest include/uapi/linux/virtio_balloon.h to known if the fields change.

laushunyu avatar Nov 28 '25 08:11 laushunyu

Using the warn! and checking this in the log as part of an integration test was exactly what I was thinking as well.

JackThomson2 avatar Nov 28 '25 15:11 JackThomson2

Hi @JackThomson2 and @laushunyu , I'm currently a student at UT Austin taking a Virtualization class. Is this issue still open and requiring a fix or is someone already working on it?

If it is open, would it be possible for me to take this issue to work on? Thank you!

22aronl avatar Dec 02 '25 23:12 22aronl

Hi @22aronl . We are certainly happy for you to take it in case @laushunyu isn't going to work on it. Thanks!

kalyazin avatar Dec 03 '25 15:12 kalyazin

@JackThomson2 I currently use warn! and added checks in test_stats and test_stats_update of the test_balloon.py tests in integration tests to check if that log has gone off. Is there other changes you would prefer for this issue? Thanks!

22aronl avatar Dec 07 '25 03:12 22aronl

Hey @22aronl thanks for taking a look. That sounds good and that should be enough, when you're ready feel free to open a PR.

JackThomson2 avatar Dec 08 '25 09:12 JackThomson2

Hey @JackThomson2 , I have opened a PR, not sure if everything is set up properly though. Please let me know! Thanks!

22aronl avatar Dec 09 '25 01:12 22aronl

Resolved by PR: https://github.com/firecracker-microvm/firecracker/pull/5572

JackThomson2 avatar Dec 15 '25 12:12 JackThomson2