New API "checkVolume" to check and repair any leaks or issues reported by qemu-img check

Open harikrishna-patnala opened this issue 1 year ago • 41 comments

Description

This PR introduces a new API, "checkVolume", that lets users or admins check a volume for leaked clusters and repair them. Currently this is supported only on KVM.

Doc PR link : https://github.com/apache/cloudstack-documentation/pull/380

In some cases, when VMs shut down uncleanly, volumes can leak clusters, particularly those using qcow2. This can lead to volumes taking up much more space than they are supposed to. When qcow2 is used for thin provisioning and the volume size is close to the actual formatted size, leaked clusters can run us out of space, so we need a way to check and repair volumes.
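As a rough illustration of how much space leaks can hold, the sketch below multiplies a leak count by the cluster size. It assumes the qcow2 default cluster size of 64 KiB; an image created with a different cluster_size needs that value instead. The function names are hypothetical and not part of this PR.

```python
# Assumption: 64 KiB is the default qcow2 cluster size. Images created with a
# custom cluster_size option need that value passed in explicitly.
DEFAULT_QCOW2_CLUSTER_SIZE = 64 * 1024


def leaked_bytes(leaked_clusters: int,
                 cluster_size: int = DEFAULT_QCOW2_CLUSTER_SIZE) -> int:
    """Return the space held by leaked clusters, in bytes."""
    if leaked_clusters < 0:
        raise ValueError("leaked_clusters must not be negative")
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    return leaked_clusters * cluster_size


def to_mib(size_in_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return size_in_bytes / (1024 * 1024)


if __name__ == "__main__":
    # 124 is the "leaks" count reported in the reproduction steps of this PR.
    print(to_mib(leaked_bytes(124)))
```

Under that assumption, the 124 leaked clusters reported in the reproduction steps correspond to 7.75 MiB. Longer-running workloads or repeated unclean shutdowns could accumulate considerably more.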

To address this, we have introduced a new "checkVolume" API, which takes a volume ID and an optional repair parameter (possible values are leaks and all).

API name: checkVolume

Parameters:

  • id : volume ID
  • repair : parameter to repair the volume, leaks or all are the possible values
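A minimal sketch of how a client might assemble the request parameters for this API, assuming the usual CloudStack convention of passing the API name in a "command" parameter. The helper name is hypothetical, request signing and transport are left out, and the client-side validation of repair only mirrors the two values listed above; the server remains the authority.

```python
from typing import Dict, Optional

# The two repair modes named in this PR description.
ALLOWED_REPAIR_VALUES = ("leaks", "all")


def build_check_volume_params(volume_id: str,
                              repair: Optional[str] = None) -> Dict[str, str]:
    """Build the query parameters for a checkVolume call (hypothetical helper).

    Omitting `repair` asks for a check only, as in the first test call
    shown in this PR.
    """
    if not volume_id:
        raise ValueError("volume_id is required")
    params = {"command": "checkVolume", "id": volume_id}
    if repair is not None:
        if repair not in ALLOWED_REPAIR_VALUES:
            raise ValueError(
                "repair must be one of: " + ", ".join(ALLOWED_REPAIR_VALUES))
        params["repair"] = repair
    return params


if __name__ == "__main__":
    print(build_check_volume_params(
        "55937826-2f08-414a-9eef-4c6b7d6fd3b1", repair="leaks"))
```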

There is also an option to repair the volume during VM start or while attaching the volume to a VM. This is controlled by a new boolean global setting, volume.check.and.repair.leaks.before.use, which defaults to false.

STEPS TO REPRODUCE:

  1. Create a VM on local storage or NFS storage.

  2. Attach a data disk.

  3. Run a write benchmark on the data disk in the guest, e.g.: fio --filename=/dev/vdb --direct=1 --rw=randwrite --bsrange=512-4k --ioengine=libaio --iodepth=32 --runtime=120 --numjobs=8 --time_based --group_reporting --name=iops-test-job --norandommap

  4. Immediately kill the VM (from the host, try virsh shutdown or "kill -9" on the qemu process).

  5. Run a check on the underlying qcow2 file and observe the "leaks" count.

# qemu-img check /var/lib/libvirt/images/26be20c7-b9d0-43f6-a76e-16c70737a0e0 --output=json 2>/dev/null
{
    "image-end-offset": 6442582016,
    "total-clusters": 163840,
    "check-errors": 0,
    "leaks": 124,
    "allocated-clusters": 98154,
    "filename": "/var/lib/libvirt/images/26be20c7-b9d0-43f6-a76e-16c70737a0e0",
    "format": "qcow2",
    "fragmented-clusters": 96135
}

  6. Repair leaks

# qemu-img check /var/lib/libvirt/images/26be20c7-b9d0-43f6-a76e-16c70737a0e0 --output=json -r leaks 2>/dev/null
{
    "image-end-offset": 6442582016,
    "total-clusters": 163840,
    "check-errors": 0,
    "leaks-fixed": 124,
    "allocated-clusters": 98154,
    "filename": "/var/lib/libvirt/images/26be20c7-b9d0-43f6-a76e-16c70737a0e0",
    "format": "qcow2",
    "fragmented-clusters": 96135
}
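
The manual steps above can be sketched in code. This is an illustration only, not the agent-side implementation in this PR: the helper names are hypothetical, the command builder uses only the qemu-img options shown above (--output=json and -r leaks|all), and the parser reads only keys that appear in the sample output.

```python
import json
from typing import Dict, List, Optional


def build_qemu_img_check_command(image_path: str,
                                 repair: Optional[str] = None) -> List[str]:
    """Return the argv for `qemu-img check`, optionally with a repair mode."""
    cmd = ["qemu-img", "check", image_path, "--output=json"]
    if repair is not None:
        if repair not in ("leaks", "all"):
            raise ValueError("repair must be 'leaks' or 'all'")
        cmd += ["-r", repair]
    return cmd


def summarize_check(report_json: str) -> Dict[str, int]:
    """Extract leak and error counts from the JSON report.

    Keys that are absent (for example "leaks" after a successful repair)
    are treated as 0.
    """
    report = json.loads(report_json)
    return {
        "leaks": int(report.get("leaks", 0)),
        "leaks_fixed": int(report.get("leaks-fixed", 0)),
        "check_errors": int(report.get("check-errors", 0)),
    }


if __name__ == "__main__":
    sample = '{"check-errors": 0, "leaks": 124, "format": "qcow2"}'
    print(build_qemu_img_check_command("/path/to/volume", repair="leaks"))
    print(summarize_check(sample))
```

A caller that actually runs the command (for example via subprocess) should inspect the exit status rather than treat any non-zero value as a failure, since the qemu-img man page documents a dedicated non-zero status for an image that has leaked clusters but is not corrupted.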

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] Enhancement (improves an existing feature and functionality)
  • [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
  • [ ] build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • [x] Major
  • [ ] Minor

Screenshots (if appropriate):

How Has This Been Tested?

(localcloud) 🐱 > check volume id=55937826-2f08-414a-9eef-4c6b7d6fd3b1
{
  . .
  "volumecheckresult": {
    "allocated-clusters": "110",
    "check-errors": "0",
    "leaks": 73,
    "filename": "/mnt/e72364b6-eab0-369f-af0b-2ec8bed9d8ac/55937826-2f08-414a-9eef-4c6b7d6fd3b1",
    "format": "qcow2",
    "fragmented-clusters": "32",
    "image-end-offset": "7995392",
    "total-clusters": "131072"
  },

(localcloud) 🐱 > check volume id=55937826-2f08-414a-9eef-4c6b7d6fd3b1 repair=leaks
{
  "volumecheckresult": {
    "allocated-clusters": "110",
    "check-errors": "0",
    "leaks": 73,
    "filename": "/mnt/e72364b6-eab0-369f-af0b-2ec8bed9d8ac/55937826-2f08-414a-9eef-4c6b7d6fd3b1",
    "format": "qcow2",
    "fragmented-clusters": "32",
    "image-end-offset": "7995392",
    "total-clusters": "131072"
  },
  "volumerepairresult": {
    "allocated-clusters": "110",
    "check-errors": "0",
    "leaks-fixed": 73,
    "filename": "/mnt/e72364b6-eab0-369f-af0b-2ec8bed9d8ac/55937826-2f08-414a-9eef-4c6b7d6fd3b1",
    "format": "qcow2",
    "fragmented-clusters": "32",
    "image-end-offset": "7995392",
    "total-clusters": "131072"
  },
}
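
A test script could assert on a response like the one above along these lines. This is a hedged sketch: the helper name is hypothetical, the response is assumed to be already parsed into a dict, and values are coerced with int() because the output above mixes quoted and unquoted numbers.

```python
from typing import Any, Dict


def leaks_fully_repaired(response: Dict[str, Any]) -> bool:
    """Return True when every leak found by the check was fixed by the repair.

    Expects the "volumecheckresult" and "volumerepairresult" sections shown
    in the test output above. A response without a repair section (a
    check-only call) is reported as not repaired.
    """
    check = response.get("volumecheckresult", {})
    repair = response.get("volumerepairresult")
    if repair is None:
        return False
    found = int(check.get("leaks", 0))
    fixed = int(repair.get("leaks-fixed", 0))
    errors = int(repair.get("check-errors", 0))
    return found == fixed and errors == 0


if __name__ == "__main__":
    response = {
        "volumecheckresult": {"check-errors": "0", "leaks": 73},
        "volumerepairresult": {"check-errors": "0", "leaks-fixed": 73},
    }
    print(leaks_fully_repaired(response))
```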

How did you try to break this feature and the system with this change?

harikrishna-patnala avatar Jan 30 '24 13:01 harikrishna-patnala

@blueorangutan package

harikrishna-patnala avatar Jan 30 '24 13:01 harikrishna-patnala

Codecov Report

Attention: 298 lines in your changes are missing coverage. Please review.

Comparison is base (1a11311) 30.90% compared to head (30fa612) 30.91%.

Files                                                  Patch %  Lines
...n/java/com/cloud/storage/VolumeApiServiceImpl.java  33.96%   69 Missing and 1 partial :warning:
...s/src/main/java/com/cloud/utils/script/Script.java  0.00%    69 Missing :warning:
...per/LibvirtCheckAndRepairVolumeCommandWrapper.java  36.78%   48 Missing and 7 partials :warning:
...i/command/user/volume/CheckAndRepairVolumeCmd.java  3.03%    32 Missing :warning:
...java/org/apache/cloudstack/utils/qemu/QemuImg.java  0.00%    18 Missing :warning:
...e/cloudstack/storage/volume/VolumeServiceImpl.java  57.57%   10 Missing and 4 partials :warning:
...ils/src/main/java/com/cloud/utils/StringUtils.java  0.00%    11 Missing :warning:
.../agent/api/storage/CheckAndRepairVolumeAnswer.java  42.85%   8 Missing :warning:
...agent/api/storage/CheckAndRepairVolumeCommand.java  52.94%   7 Missing and 1 partial :warning:
.../java/com/cloud/vm/VmWorkCheckAndRepairVolume.java  0.00%    6 Missing :warning:
... and 2 more
Additional details and impacted files
@@            Coverage Diff             @@
##               4.19    #8577    +/-   ##
==========================================
  Coverage     30.90%   30.91%            
- Complexity    34202    34252    +50     
==========================================
  Files          5347     5353     +6     
  Lines        375621   376032   +411     
  Branches      54627    54684    +57     
==========================================
+ Hits         116093   116249   +156     
- Misses       244240   244490   +250     
- Partials      15288    15293     +5     
Flag                    Coverage Δ
simulator-marvin-tests  24.73% <4.07%> (-0.02%) :arrow_down:
uitests                 4.39% <ø> (ø)
unit-tests              16.58% <24.94%> (+0.01%) :arrow_up:

codecov[bot] avatar Jan 30 '24 13:01 codecov[bot]

@blueorangutan package

harikrishna-patnala avatar Jan 31 '24 04:01 harikrishna-patnala

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Jan 31 '24 04:01 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8476

blueorangutan avatar Jan 31 '24 05:01 blueorangutan

@blueorangutan package

harikrishna-patnala avatar Jan 31 '24 07:01 harikrishna-patnala

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Jan 31 '24 07:01 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8478

blueorangutan avatar Jan 31 '24 08:01 blueorangutan

@blueorangutan package

harikrishna-patnala avatar Feb 01 '24 06:02 harikrishna-patnala

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Feb 01 '24 07:02 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8487

blueorangutan avatar Feb 01 '24 08:02 blueorangutan

@harikrishna-patnala I might be missing something, but how will this new API be handled when called in a Xen or VMware environment? As you state, "Currently this is supported only for KVM". I am sure you implemented this somewhere, but all I can find is an implicit error and no graceful message.

see https://github.com/apache/cloudstack/pull/8577/files#diff-63d1a7ba0fc6bbd393feffaaf961d192431e8d343280b7cdc8d3817c2d6b7f1cR2801

DaanHoogland avatar Feb 01 '24 12:02 DaanHoogland

Kept a check in the validate method @DaanHoogland https://github.com/shapeblue/cloudstack/blob/353b2f4f0d938f9fab91c42358859a430dcc1c14/server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java#L1909-L1911

Is this what you are referring to?

harikrishna-patnala avatar Feb 01 '24 12:02 harikrishna-patnala

Yes, thanks @harikrishna-patnala, I totally read over that.

DaanHoogland avatar Feb 02 '24 09:02 DaanHoogland

@blueorangutan package

harikrishna-patnala avatar Feb 05 '24 08:02 harikrishna-patnala

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Feb 05 '24 08:02 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8517

blueorangutan avatar Feb 05 '24 10:02 blueorangutan

@blueorangutan package

harikrishna-patnala avatar Feb 07 '24 09:02 harikrishna-patnala

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Feb 07 '24 09:02 blueorangutan

@blueorangutan package

harikrishna-patnala avatar Feb 08 '24 11:02 harikrishna-patnala

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Feb 08 '24 11:02 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8586

blueorangutan avatar Feb 08 '24 12:02 blueorangutan

@blueorangutan package

rohityadavcloud avatar Feb 20 '24 06:02 rohityadavcloud

@rohityadavcloud a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Feb 20 '24 06:02 blueorangutan

Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 8719

blueorangutan avatar Feb 20 '24 06:02 blueorangutan

@blueorangutan package

harikrishna-patnala avatar Feb 20 '24 07:02 harikrishna-patnala

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

github-actions[bot] avatar Feb 21 '24 04:02 github-actions[bot]

@blueorangutan package

harikrishna-patnala avatar Feb 21 '24 04:02 harikrishna-patnala

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Feb 21 '24 04:02 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8729

blueorangutan avatar Feb 21 '24 06:02 blueorangutan