Incorrect API result "virtual_disk_count" for some VMs
NetBox Edition
NetBox Community
NetBox Version
v4.4.6
Python Version
3.11
Steps to Reproduce
Encountered on v4.3.4. Still reproducible after upgrading to v4.4.6. Not all (new) VMs are affected, for some reason.
- Create VM
- Create virtual disk for VM
Expected Behavior
- VM shows the number of disks on the "Virtual Disks" tab
- VM returns the correct "virtual_disk_count" number from an API request
Observed Behavior
Both the UI and the API counters show no disks.
Thanks for the report, @stavr666. I'm not able to reproduce your STR exactly, but I do see something very similar.
In the representation of my new VM from the API (from the /api/virtualization/virtual-machines/ list endpoint and from the /api/virtualization/virtual-machines/541/ detail endpoint), the nested role object shows zeroes for device_count and virtualmachine_count (which is inaccurate on its face, since it's nested in a VM with that role!).
However, when I view the detail of that device role (/api/dcim/device-roles/7/) the counts match what is displayed in the web UI.
Yes, it also seems strange to me that the UI and the API show the same wrong results. I have encountered this rare UI bug before (never reported, since it was not critical). But now (for several days, before I tried to fix it by upgrading to v4.4.6) it causes our automation to break in the "compare by disk count" pipeline. We can rewrite our scripts, but they will become slow again.
Can I collect any technical details that would help diagnose the source of the error? SQL queries or something?
Python stack traces (in the case of unhandled exceptions), SQL queries, versions, and things along those lines are the most useful for really isolating where the problem is originating from.
Although, in this case, I suspect that it has something to do with our CounterCacheField and how it's being handled by API serializers.
Is there any way to trigger its refresh manually?
I don't actually believe it's an issue with the actual count being wrong, so much as it is an issue with the serializer not reading the value correctly and defaulting to zero. But, that's just speculation. I have not had any time to dig in to this one at all.
Seems like this and #19976 are likely related.
It's not that critical for now, so I'll keep track of the mentioned issue.
I might be wrong here, but to me this looks like two slightly different problems.
The API nested object counts (e.g. the device_count on role) seem to be coming from queryset annotations. Those I can reproduce pretty easily, including on the public demo.
By contrast, the virtual_disk_count field is a cached integer field (a CounterCacheField) on the model. While working on #19523 I ran into similar problems with cached counters and opened #20697 to track a CounterCacheField double‑counting bug. In that investigation I managed to push some counters into negative values (for example -2 devices) when the initial value was 0 and a related Device was deleted. With the CounterCacheField mechanism in place, every creation bumps the counter by +1 and every deletion by -1, so if the counter ever gets out of sync, it can drift into odd values.
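To illustrate that kind of drift, here is a toy model of a delta-based cached counter. It is only a sketch and not NetBox's actual implementation: the `Parent` class and its method names are hypothetical, and the "missed increment" is just one assumed way a counter could get out of sync.

```python
from dataclasses import dataclass, field


@dataclass
class Parent:
    """Hypothetical stand-in for a model with a cached counter (e.g. a VM and its disks)."""
    children: set = field(default_factory=set)
    cached_count: int = 0  # plays the role of the cached counter column

    def on_child_created(self, child_id):
        # Normal path: every creation bumps the counter by +1.
        self.children.add(child_id)
        self.cached_count += 1

    def on_child_deleted(self, child_id):
        # Normal path: every deletion bumps the counter by -1,
        # regardless of whether the cached value was accurate.
        self.children.discard(child_id)
        self.cached_count -= 1

    def recalculate(self):
        # Rebuild the cached value from the real related objects.
        self.cached_count = len(self.children)


def demo():
    vm = Parent()
    # Assumed failure mode: a child appears without the +1 being applied.
    vm.children.add("disk-1")
    stale = vm.cached_count          # cached value lags behind the real count
    vm.on_child_deleted("disk-1")
    drifted = vm.cached_count        # the -1 is applied to an already-stale value
    vm.recalculate()
    return stale, drifted, vm.cached_count
```

Because the counter only ever applies deltas, a single missed or duplicated update is never corrected on its own; only a full recount from the related objects brings it back in line.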
There is a management command that recalculates all CounterCacheField values:
python3 netbox/manage.py calculate_cached_counts
@stavr666 Could you try running this on your instance and then repeat the steps you used to trigger the issue? It would be helpful to know whether the problem persists after the counters have been rebuilt, or if it only affected stale values from before.
If I’m misunderstanding the root cause here, please feel free to correct me. Just sharing what I’ve seen while working with the counter fields recently. 🙌
It helped. Both the UI and the API show correct values now.
Thanks for confirming, @stavr666 ! Glad to hear the values look correct now! 🙌
If you have a moment, could you try to repeat the steps that originally triggered the mismatch (both with existing objects and with newly created ones) and see whether you can still reproduce the issue?
That would help a lot to confirm whether this was just a case of stale cached counters or if there’s still an underlying bug we should keep digging into. No pressure if you don’t have time right away, of course 🙂
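If it's easier to check many VMs at once, something like the following sketch could compare the cached `virtual_disk_count` on each VM with the number of virtual disks the API actually lists for it. The helper names (`make_fetcher`, `find_mismatches`) are made up for this example, and it assumes token authentication via the `Authorization: Token ...` header and the `virtual_machine_id` filter on the virtual-disks endpoint; adjust as needed for your instance.

```python
import json
import urllib.request


def make_fetcher(base_url, token):
    """Return a function that GETs a NetBox API path and parses the JSON body."""
    def fetch(path):
        req = urllib.request.Request(
            base_url.rstrip("/") + path,
            headers={
                "Authorization": f"Token {token}",  # assumed token auth scheme
                "Accept": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch


def find_mismatches(vm_ids, fetch):
    """Return (vm_id, cached_count, actual_count) for every VM where they differ."""
    mismatches = []
    for vm_id in vm_ids:
        vm = fetch(f"/api/virtualization/virtual-machines/{vm_id}/")
        # limit=1 keeps the response small; only the total "count" is needed.
        disks = fetch(
            f"/api/virtualization/virtual-disks/?virtual_machine_id={vm_id}&limit=1"
        )
        cached = vm.get("virtual_disk_count")
        actual = disks["count"]
        if cached != actual:
            mismatches.append((vm_id, cached, actual))
    return mismatches
```

Usage would be along the lines of `find_mismatches([541], make_fetcher("https://netbox.example.com", "<token>"))`, where the URL, token, and VM ID are placeholders. Running it once before and once after `calculate_cached_counts` should show whether new mismatches keep appearing.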
@pheus New VMs have the same caching issue.
@stavr666 I'm not able to reproduce this on NetBox v4.4.8. If you're still encountering this issue after upgrading, could you please share updated reproduction steps?