lxd
'/1.0/instances?recursion=2' Endpoint has missing information.
Required information
- Distribution:
- Distribution version:
- The output of "snap list --all lxd core20 core22 core24 snapd":
- The output of "lxc info" or if that fails:
- Kernel version: 6.8.0-45-generic
- LXC version: 5.21.2 LTS
- LXD version: 5.21.2 LTS
- Storage backend in use:
Issue description
This issue documents the findings of an investigation into the limitations and issues with using certain API endpoints for fetching instance data, specifically disk and memory information. The investigation focused on two main endpoints: the /1.0/metrics endpoint and the /1.0/instances?recursion=2 endpoint. Both endpoints exhibit shortcomings in their ability to provide the required information, especially when instances are in a "stopped" state.
Findings
1. Issues with the /1.0/metrics endpoint
The metrics endpoint is currently used on the "Detail Instances" page to calculate various instance-related metrics.
- Problem 1: It does not provide data when an instance is stopped. Metrics such as "lxd_memory_MemFree_bytes" and "lxd_memory_MemTotal_bytes" are absent from the API response when the instance is not running, so values such as disk and memory totals are unavailable.
- Problem 2: The metrics endpoint returns a large amount of data, much of which is filtered out after retrieval. This can impose a significant load on larger systems, making it a suboptimal choice for regular use. LXD-UI uses lazy loading to mitigate this, but that may not be a sustainable solution.
Given these limitations, it is not feasible to rely on the metrics endpoint for obtaining instance data, especially when aiming for a lightweight solution that works regardless of instance status.
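To illustrate Problem 1, here is a minimal Python sketch that scans a metrics response and reports which instances lack the two memory metrics named above. It assumes each sample line has the shape `metric_name{name="...",...} value` with the instance in a `name` label. The sample text, instance names and values are hypothetical, not captured output.

```python
import re

# Metric names taken from the issue text above.
REQUIRED_METRICS = ("lxd_memory_MemFree_bytes", "lxd_memory_MemTotal_bytes")

# Assumed sample line shape: metric_name{label="value",...} number
SAMPLE_LINE = re.compile(r'^(?P<metric>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def metrics_by_instance(text):
    """Map each instance name to the set of metric names reported for it."""
    seen = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/metadata lines
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed shape
        labels = dict(LABEL.findall(match.group("labels")))
        name = labels.get("name")  # assumption: instance name is in "name"
        if name is not None:
            seen.setdefault(name, set()).add(match.group("metric"))
    return seen


def missing_memory_metrics(text, instance_names):
    """For each instance, list the required metrics absent from the response."""
    seen = metrics_by_instance(text)
    return {
        name: [m for m in REQUIRED_METRICS if m not in seen.get(name, set())]
        for name in instance_names
    }


# Hypothetical response: only the running instance reports memory metrics.
SAMPLE = """
# TYPE lxd_memory_MemTotal_bytes gauge
lxd_memory_MemFree_bytes{name="running-one",project="default",type="container"} 6.5e+09
lxd_memory_MemTotal_bytes{name="running-one",project="default",type="container"} 7.8e+09
"""

report = missing_memory_metrics(SAMPLE, ["running-one", "stopped-one"])
```

With this sample, `report` has an empty list for `running-one` and both metric names for `stopped-one`, which is the gap described in Problem 1.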
2. Issues with the /1.0/instances?recursion=2 endpoint
The /1.0/instances?recursion=2 endpoint is designed to fetch comprehensive data on all instances. It should ideally return all necessary details, including disk and memory metrics, irrespective of the instance state.
- Problem 1: When an instance is stopped, the total field for disk and memory metrics that is returned from the API is set to 0, meaning the data is not accurately reported. Please see the responses below for context.
- Problem 2: When the instance is running, the disk total is still reported as 0, which makes this endpoint unreliable for fetching disk usage metrics.
Note: when this endpoint is used to provide memory usage totals, it is understandable that a stopped instance returns no data (as memory would not be in use).
/1.0/instances?recursion=2 on a running instance
{
  "status": "Running",
  "status_code": 103,
  "disk": {
    "root": {
      "usage": 1183744,
      "total": 0
    }
  },
  "memory": {
    "usage": 1310720,
    "usage_peak": 0,
    "total": 7823340000,
    "swap_usage": 331776,
    "swap_usage_peak": 0
  },
  ...
}
(Note that although the instance is running, the disk 'total' returned is 0.)
/1.0/instances?recursion=2 on a stopped instance
{
  "status": "Stopped",
  "status_code": 102,
  "disk": {
    "root": {
      "usage": 1182720,
      "total": 0
    }
  },
  "memory": {
    "usage": 0,
    "usage_peak": 0,
    "total": 0,
    "swap_usage": 0,
    "swap_usage_peak": 0
  },
  ...
}
Note: disk data should still be available here, and perhaps the memory total as well.
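A small Python sketch of how a client could flag the zero totals seen in the two responses above. The helper name `zero_total_fields` is hypothetical, and the dictionaries reproduce only the fields from the responses that the check needs.

```python
def zero_total_fields(state):
    """Return dotted paths of 'total' fields that are 0 (or missing) in a state object."""
    problems = []
    # Every disk device is checked; the responses above only show "root".
    for device, usage in (state.get("disk") or {}).items():
        if usage.get("total", 0) == 0:
            problems.append(f"disk.{device}.total")
    # A missing memory section is treated the same as a zero total.
    if (state.get("memory") or {}).get("total", 0) == 0:
        problems.append("memory.total")
    return problems


# Fields copied from the running-instance response above.
running_state = {
    "status": "Running",
    "disk": {"root": {"usage": 1183744, "total": 0}},
    "memory": {"usage": 1310720, "total": 7823340000},
}

# Fields copied from the stopped-instance response above.
stopped_state = {
    "status": "Stopped",
    "disk": {"root": {"usage": 1182720, "total": 0}},
    "memory": {"usage": 0, "total": 0},
}

running_problems = zero_total_fields(running_state)
stopped_problems = zero_total_fields(stopped_state)
```

For the running instance only `disk.root.total` is flagged; for the stopped instance both `disk.root.total` and `memory.total` are flagged, matching Problems 1 and 2.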
Steps to reproduce
- Create an instance in LXD-UI
- Attempt to view its disk/memory usage when the instance is running vs when it is stopped.
- Review the API responses from the /1.0/instances?recursion=2 endpoint.
Or
- Call the API on a running or stopped instance.
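The direct API call can be sketched in Python using only the standard library. This is a rough sketch under several assumptions: LXD is installed as a snap and listens on the Unix socket at /var/snap/lxd/common/lxd/unix.socket, the response wraps the instance list in a `metadata` field, and each instance carries its state under a `state` key. The helpers `UnixHTTPConnection`, `fetch_instances` and `summarize` are illustrative names, not part of LXD.

```python
import http.client
import json
import socket

# Assumption: snap install; other installs may use a different socket path.
SNAP_SOCKET = "/var/snap/lxd/common/lxd/unix.socket"


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a local Unix socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def fetch_instances(socket_path=SNAP_SOCKET):
    """GET /1.0/instances?recursion=2 and return the instance list."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/1.0/instances?recursion=2")
        body = conn.getresponse().read()
    finally:
        conn.close()
    # Assumption: the instance list sits under "metadata".
    return json.loads(body).get("metadata") or []


def summarize(instances):
    """Return (name, status, root disk total, memory total) per instance."""
    rows = []
    for inst in instances:
        state = inst.get("state") or {}  # assumption: state is nested here
        root = (state.get("disk") or {}).get("root") or {}
        memory = state.get("memory") or {}
        rows.append(
            (inst.get("name"), state.get("status"), root.get("total"), memory.get("total"))
        )
    return rows
```

On a host with LXD running, `summarize(fetch_instances())` would give one row per instance, making it easy to compare the totals for a running and a stopped instance side by side.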
Information to attach
- [ ] Any relevant kernel output (dmesg)
- [ ] Container log (lxc info NAME --show-log)
- [ ] Container configuration (lxc config show NAME --expanded)
- [ ] Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
- [ ] Output of the client with --debug
- [ ] Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)