[WIP] Remove allocated snapshots / vm snapshots on start
Description
Some of the latest active snapshots / VM snapshots are stuck in the Allocated state when the MS is stopped; these are also listed / shown in the UI, where they cannot be deleted. Remove them on MS start.
This PR removes Allocated snapshots / VM snapshots on MS start.
Fixes #8424
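The intended behavior can be sketched as follows. This is a minimal, self-contained model of the cleanup, not the actual CloudStack code: `Snapshot`, the `State` enum, and `cleanupAllocatedOnStart` are illustrative stand-ins for the real VO/DAO classes, and "removing" is modeled as setting the `removed` timestamp, matching the DB record shown in the test section below.

```java
import java.util.Date;
import java.util.List;

// Minimal model of the cleanup this PR performs at MS start:
// snapshots left in the Allocated state are flagged as removed so the
// UI no longer lists them. All types here are illustrative stand-ins,
// not the real CloudStack classes.
class StartupSnapshotCleanup {
    enum State { Allocated, BackedUp, Error, Destroying }

    static class Snapshot {
        State state;
        Date removed;          // null while the snapshot is still visible
        Snapshot(State s) { this.state = s; }
    }

    // Mark every Allocated snapshot as removed; return how many were cleaned.
    static int cleanupAllocatedOnStart(List<Snapshot> all) {
        int cleaned = 0;
        for (Snapshot s : all) {
            if (s.state == State.Allocated && s.removed == null) {
                s.removed = new Date();   // equivalent of setting the `removed` column
                cleaned++;
            }
        }
        return cleaned;
    }
}
```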
Types of changes
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
- [ ] build/CI
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
- [ ] Major
- [x] Minor
Bug Severity
- [ ] BLOCKER
- [ ] Critical
- [ ] Major
- [ ] Minor
- [ ] Trivial
Screenshots (if appropriate):
How Has This Been Tested?
Manually tested: Take VM Snapshot -> Stop MS -> Start MS
VM Snapshot record after MS stopped =>
id: 1
uuid: 07542708-27b7-47c4-959a-f7f92828f43a
name: i-2-3-VM_VS_20240108075812
display_name: testvm01-snap
description: NULL
vm_id: 3
account_id: 2
domain_id: 1
service_offering_id: 1
vm_snapshot_type: DiskAndMemory
state: Allocated
parent: NULL
current: NULL
update_count: 0
updated: NULL
created: 2024-01-08 07:58:12
removed: NULL
VM Snapshot record after MS started (Not listed / shown in the UI) =>
id: 1
uuid: 07542708-27b7-47c4-959a-f7f92828f43a
name: i-2-3-VM_VS_20240108075812
display_name: testvm01-snap
description: NULL
vm_id: 3
account_id: 2
domain_id: 1
service_offering_id: 1
vm_snapshot_type: DiskAndMemory
state: Allocated
parent: NULL
current: NULL
update_count: 0
updated: NULL
created: 2024-01-08 07:58:12
removed: 2024-01-08 07:59:26
How did you try to break this feature and the system with this change?
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Codecov Report
:x: Patch coverage is 1.44928% with 68 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 16.17%. Comparing base (253ac03) to head (56f5195).
:warning: Report is 13 commits behind head on 4.20.
Additional details and impacted files
@@ Coverage Diff @@
## 4.20 #8452 +/- ##
============================================
- Coverage 16.17% 16.17% -0.01%
- Complexity 13297 13298 +1
============================================
Files 5656 5656
Lines 498136 498218 +82
Branches 60432 60451 +19
============================================
+ Hits 80584 80585 +1
- Misses 408584 408659 +75
- Partials 8968 8974 +6
| Flag | Coverage Δ | |
|---|---|---|
| uitests | 4.00% <ø> (+<0.01%) | :arrow_up: |
| unittests | 17.02% <1.44%> (-0.01%) | :arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
@sureshanaparti would it be better to clean the snapshots by storage garbage collector which run periodically ?
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8207
@weizhouapache the issue is that some active snapshots / VM snapshots are left in the Allocated state when the MS is stopped. There is no point in keeping them and waiting for the storage garbage collector to clean them up (what if storage cleanup is disabled?), as they are also listed / shown in the UI, where they cannot be deleted. So, I think it's better to remove them on start itself. Any other thoughts / suggestions?
@sureshanaparti the storage garbage collector cleans up the primary and secondary storage, including the snapshot/volumes/templates
// remove snapshots in Error state
List<SnapshotVO> snapshots = _snapshotDao.listAllByStatus(Snapshot.State.Error);
We could add the cleanup of Allocated snapshots and Allocated/Error VM snapshots in that process.
IMHO, it is not the right place to clean up during mgmt server start, as the mgmt server might be running for several days.
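The suggestion above could look roughly like this: extend the set of states the periodic garbage collector reaps, so that Allocated snapshots are picked up alongside the Error-state ones it already handles. This is a sketch under that assumption; the filtering below is a stand-in for the real `_snapshotDao.listAllByStatus(...)` query and the enum is simplified.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch of widening the storage garbage collector's reap
// set from {Error} to {Error, Allocated}, as suggested in the comment
// above. Not the real CloudStack DAO API; a stand-in for
// _snapshotDao.listAllByStatus(...).
class GcReapStates {
    enum State { Allocated, Error, BackedUp, Destroying }

    // Before this change the GC only reaped Error; now Error and Allocated.
    static final Set<State> REAPABLE = EnumSet.of(State.Error, State.Allocated);

    static List<State> selectForCleanup(List<State> snapshotStates) {
        return snapshotStates.stream()
                .filter(REAPABLE::contains)
                .collect(Collectors.toList());
    }
}
```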
@weizhouapache Agreed for resources in the Error state (as there might be things to reset / clean up) while the MS is running. Snapshots / VM snapshots in Allocated are mostly the very recent active ones from just before the MS was stopped, so no cleanup is needed; it's better to set them as removed (any event through the state transition would still keep them in Allocated). Otherwise, we would at least have to stop listing them in the UI, or allow explicit deletion from the UI.
Snapshots in the Destroying state are deleted here, so I thought it's better to remove them there on MS start. https://github.com/apache/cloudstack/blob/3f9dd4dc07ff40a4e0216014715e4fe37fc46d28/server/src/main/java/com/cloud/storage/snapshot/SnapshotManagerImpl.java#L1519-L1523
@sureshanaparti the mgmt service is a key service for some users, especially in production; it can run for several days without stopping. IMHO it is not good to clean up resources during start ...
(OK with cleaning up resources in both the start process and the garbage collector)
ok, will check & update. thanks @weizhouapache
This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.
@sureshanaparti , does this need rebasing?
on another note, would it maybe be safer to first mark it as State == Error and then delete? (I think this is the same as @weizhouapache suggests)
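The "mark as Error first, then delete" idea above can be sketched as a two-step transition: if the process is interrupted between the steps, the snapshot is left in a state the garbage collector already knows how to reap. Again, the types below are illustrative stand-ins, not the actual CloudStack state machine.

```java
// Sketch of a two-phase cleanup: Allocated -> Error -> Removed. A crash
// between the steps leaves the snapshot in Error, which the existing
// garbage collector already reaps. Types are illustrative stand-ins.
class TwoPhaseCleanup {
    enum State { Allocated, Error, Removed }

    static class Snapshot { State state = State.Allocated; }

    static void cleanup(Snapshot s) {
        if (s.state == State.Allocated) {
            s.state = State.Error;   // step 1: now eligible for GC reaping
        }
        if (s.state == State.Error) {
            s.state = State.Removed; // step 2: the actual delete / mark removed
        }
    }
}
```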
@sureshanaparti , any progress on this?
@blueorangutan package
@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 12329
@sureshanaparti are you working on this?
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✖️ debian ✔️ suse15. SL-JID 14733
@blueorangutan package
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 14904
is it ready for review/testing @sureshanaparti ?
yes @weizhouapache
This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 14985