cloudstack
New API "checkVolume" to check and repair any leaks or issues reported by qemu-img check
Description
This PR introduces a new API, "checkVolume", that lets users or admins check a volume and repair any leaks found. Currently this is supported only for KVM.
Doc PR link : https://github.com/apache/cloudstack-documentation/pull/380
There are a few cases where VMs that shut down uncleanly, particularly those using qcow2, can leak clusters. This can lead to volumes taking up much more space than they are supposed to. When we use the qcow2 format for thin provisioning and the volume size is close to the actual formatted size, leaked clusters can run us out of space, so we need a way to check and repair them.
To address this, we have introduced a new "checkVolume" API, which takes the parameters volume id and repair (possible values are leaks/all).
API name: checkVolume
Parameters:
- id : volume ID
- repair : repair mode for the volume; the possible values are leaks and all
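As a rough illustration of calling the API with these two parameters, the sketch below builds a signed request URL using the standard CloudStack API-key signing scheme (sorted parameters, lower-cased string, HMAC-SHA1, base64). The endpoint, API key, and secret key are placeholders, and the helper name `build_check_volume_url` is hypothetical; it is not part of this PR.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def build_check_volume_url(endpoint, api_key, secret_key, volume_id, repair=None):
    """Build a signed CloudStack API URL for checkVolume (illustrative helper)."""
    # The PR description lists "leaks" and "all" as the only repair values.
    if repair is not None and repair not in ("leaks", "all"):
        raise ValueError("repair must be 'leaks' or 'all'")

    params = {
        "command": "checkVolume",
        "id": volume_id,
        "response": "json",
        "apikey": api_key,
    }
    if repair is not None:
        params["repair"] = repair

    # Sort parameters by name and URL-encode each value.
    query = "&".join(
        f"{key}={quote(str(value), safe='')}"
        for key, value in sorted(params.items(), key=lambda kv: kv[0].lower())
    )

    # Sign the lower-cased query string with HMAC-SHA1, then base64-encode it.
    digest = hmac.new(
        secret_key.encode(), query.lower().encode(), hashlib.sha1
    ).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{endpoint}?{query}&signature={signature}"
```

Calling the helper with `repair="leaks"` yields a URL containing `command=checkVolume`, the volume `id`, and `repair=leaks`; omitting `repair` requests a check only.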
There is also an option to repair the volume during VM start or while attaching the volume to a VM. This is controlled by a new boolean global setting, volume.check.and.repair.leaks.before.use, which defaults to false.
STEPS TO REPRODUCE:
- Create a VM on local storage or NFS storage.
- Attach a data disk.
- Run a write benchmark on the data disk in the guest, e.g.:
fio --filename=/dev/vdb --direct=1 --rw=randwrite --bsrange=512-4k --ioengine=libaio --iodepth=32 --runtime=120 --numjobs=8 --time_based --group_reporting --name=iops-test-job --norandommap
- Immediately kill the VM (from the host, try virsh shutdown or "kill -9" of the qemu process).
- Run a check on the underlying qcow2 file and observe the "leaks" count:
# qemu-img check /var/lib/libvirt/images/26be20c7-b9d0-43f6-a76e-16c70737a0e0 --output=json 2>/dev/null
{
"image-end-offset": 6442582016,
"total-clusters": 163840,
"check-errors": 0,
"leaks": 124,
"allocated-clusters": 98154,
"filename": "/var/lib/libvirt/images/26be20c7-b9d0-43f6-a76e-16c70737a0e0",
"format": "qcow2",
"fragmented-clusters": 96135
}
- Repair the leaks:
# qemu-img check /var/lib/libvirt/images/26be20c7-b9d0-43f6-a76e-16c70737a0e0 --output=json -r leaks 2>/dev/null
{
"image-end-offset": 6442582016,
"total-clusters": 163840,
"check-errors": 0,
"leaks-fixed": 124,
"allocated-clusters": 98154,
"filename": "/var/lib/libvirt/images/26be20c7-b9d0-43f6-a76e-16c70737a0e0",
"format": "qcow2",
"fragmented-clusters": 96135
}
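The JSON reports above lend themselves to a small automated decision about whether a repair is worth requesting. The sketch below parses a `qemu-img check --output=json` report and suggests a repair mode. The policy (prefer "leaks" when only leaks are present, "all" when corruptions are reported) and the function name `suggest_repair_mode` are assumptions made for illustration, not the logic implemented in this PR. The sample report is the one shown in the reproduction steps.

```python
import json

# Report copied from the reproduction steps above.
SAMPLE_REPORT = """
{
  "image-end-offset": 6442582016,
  "total-clusters": 163840,
  "check-errors": 0,
  "leaks": 124,
  "allocated-clusters": 98154,
  "filename": "/var/lib/libvirt/images/26be20c7-b9d0-43f6-a76e-16c70737a0e0",
  "format": "qcow2",
  "fragmented-clusters": 96135
}
"""


def suggest_repair_mode(report_json):
    """Return "all", "leaks", or None for a qemu-img check JSON report."""
    report = json.loads(report_json)
    # Keys can be absent (the repaired report above has no "leaks" key),
    # so a missing counter is treated as zero.
    leaks = int(report.get("leaks", 0))
    corruptions = int(report.get("corruptions", 0))
    if corruptions > 0:
        return "all"    # assumed policy: corruptions need the broader repair
    if leaks > 0:
        return "leaks"  # assumed policy: leaks alone need only the leak repair
    return None         # nothing to repair


print(suggest_repair_mode(SAMPLE_REPORT))  # prints: leaks
```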
Types of changes
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
- [ ] build/CI
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
- [x] Major
- [ ] Minor
Screenshots (if appropriate):
How Has This Been Tested?
(localcloud) 🐱 > check volume id=55937826-2f08-414a-9eef-4c6b7d6fd3b1
{
  .
  .
  "volumecheckresult": {
    "allocated-clusters": "110",
    "check-errors": "0",
    "leaks": 73,
    "filename": "/mnt/e72364b6-eab0-369f-af0b-2ec8bed9d8ac/55937826-2f08-414a-9eef-4c6b7d6fd3b1",
    "format": "qcow2",
    "fragmented-clusters": "32",
    "image-end-offset": "7995392",
    "total-clusters": "131072"
  },
(localcloud) 🐱 > check volume id=55937826-2f08-414a-9eef-4c6b7d6fd3b1 repair=leaks
{
  "volumecheckresult": {
    "allocated-clusters": "110",
    "check-errors": "0",
    "leaks": 73,
    "filename": "/mnt/e72364b6-eab0-369f-af0b-2ec8bed9d8ac/55937826-2f08-414a-9eef-4c6b7d6fd3b1",
    "format": "qcow2",
    "fragmented-clusters": "32",
    "image-end-offset": "7995392",
    "total-clusters": "131072"
  },
  "volumerepairresult": {
    "allocated-clusters": "110",
    "check-errors": "0",
    "leaks-fixed": 73,
    "filename": "/mnt/e72364b6-eab0-369f-af0b-2ec8bed9d8ac/55937826-2f08-414a-9eef-4c6b7d6fd3b1",
    "format": "qcow2",
    "fragmented-clusters": "32",
    "image-end-offset": "7995392",
    "total-clusters": "131072"
  },
}
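A test like the one above can be checked programmatically by comparing the leak count in "volumecheckresult" with the "leaks-fixed" count in "volumerepairresult". The following is a minimal sketch under the assumption that the response has already been parsed into a dictionary; the helper name `leaks_fully_repaired` is hypothetical, and the sample holds only the relevant fields from the output above. Note that the response mixes string and integer values, so counters are converted with `int()`.

```python
# Relevant fields from the repair=leaks response shown above.
SAMPLE_RESPONSE = {
    "volumecheckresult": {"check-errors": "0", "leaks": 73},
    "volumerepairresult": {"check-errors": "0", "leaks-fixed": 73},
}


def leaks_fully_repaired(response):
    """Return True if every leak found by the check was reported as fixed."""
    check = response["volumecheckresult"]
    repair = response.get("volumerepairresult")
    if repair is None:
        # No repair was requested, so nothing can have been fixed.
        return False
    found = int(check.get("leaks", 0))
    fixed = int(repair.get("leaks-fixed", 0))
    return found == fixed


print(leaks_fully_repaired(SAMPLE_RESPONSE))  # prints: True
```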
How did you try to break this feature and the system with this change?
@blueorangutan package
Codecov Report
Attention: 298 lines in your changes are missing coverage. Please review.
Comparison is base (1a11311) 30.90% compared to head (30fa612) 30.91%.
Additional details and impacted files
@@ Coverage Diff @@
## 4.19 #8577 +/- ##
==========================================
Coverage 30.90% 30.91%
- Complexity 34202 34252 +50
==========================================
Files 5347 5353 +6
Lines 375621 376032 +411
Branches 54627 54684 +57
==========================================
+ Hits 116093 116249 +156
- Misses 244240 244490 +250
- Partials 15288 15293 +5
Flag | Coverage Δ | |
---|---|---|
simulator-marvin-tests | 24.73% <4.07%> (-0.02%) | :arrow_down: |
uitests | 4.39% <ø> (ø) | |
unit-tests | 16.58% <24.94%> (+0.01%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
@blueorangutan package
@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8476
@blueorangutan package
@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8478
@blueorangutan package
@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8487
@harikrishna-patnala I might be missing something, but how will this new API be handled when called in a Xen or VMware environment? You state "Currently this is supported only for KVM", and I am sure you implemented this somewhere, but all I can find is an implicit error and no graceful message.
See https://github.com/apache/cloudstack/pull/8577/files#diff-63d1a7ba0fc6bbd393feffaaf961d192431e8d343280b7cdc8d3817c2d6b7f1cR2801
Kept a check in the validate method @DaanHoogland https://github.com/shapeblue/cloudstack/blob/353b2f4f0d938f9fab91c42358859a430dcc1c14/server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java#L1909-L1911
Is this what you are referring to?
Yes, thanks @harikrishna-patnala, totally read over that.
@blueorangutan package
@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8517
@blueorangutan package
@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
@blueorangutan package
@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8586
@blueorangutan package
@rohityadavcloud a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 8719
@blueorangutan package
This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.
@blueorangutan package
@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8729