backend.ai
Advanced vfolder deletion features (aka trash bin)
Main idea
We can now determine the status of vfolders after #713.
Let's extend this feature with advanced deletion statuses such as `DELETE-PENDING`, `DELETE-ONGOING`, or `DELETE-ERROR`.
- We can schedule when a vfolder is deleted by providing an option with a default delay, such as 1 day or 1 week. This job could be done by a new event and its event handlers.
- We can implement a "trash bin", where the owner of a vfolder or any admin can restore `DELETE-PENDING` vfolders.
It is recommended that the deletion job change the status of the vfolder and save the result of the job in the audit log rather than purge the row in the DB.
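The idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `VFolderRow`, `delete_after`, `audit_log`, `request_delete`, `restore`, and the one-day default are hypothetical names and values, not the actual Backend.AI model or API.

```python
from __future__ import annotations

import enum
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical default; the issue only suggests "1 day or 1 week".
DEFAULT_DELETE_DELAY = timedelta(days=1)


class Status(enum.Enum):
    READY = "ready"
    DELETE_PENDING = "delete-pending"


@dataclass
class VFolderRow:
    """Illustrative stand-in for a vfolder DB row (not the real model)."""

    name: str
    status: Status = Status.READY
    delete_after: datetime | None = None
    audit_log: list[str] = field(default_factory=list)


def request_delete(
    row: VFolderRow,
    now: datetime,
    delay: timedelta = DEFAULT_DELETE_DELAY,
) -> None:
    """Soft-delete: change the status and record the job instead of purging the row."""
    row.status = Status.DELETE_PENDING
    row.delete_after = now + delay
    row.audit_log.append(f"{now.isoformat()} delete requested")


def restore(row: VFolderRow, now: datetime) -> None:
    """Bring a pending vfolder back from the trash bin."""
    if row.status is not Status.DELETE_PENDING:
        raise ValueError(f"cannot restore a vfolder in state {row.status.name}")
    row.status = Status.READY
    row.delete_after = None
    row.audit_log.append(f"{now.isoformat()} restored")
```

Permission checks (owner or admin) are left out of the sketch; they would sit in front of `restore()`.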
Updated (2023-10-04)
In 24.03, the vfolder trash bin feature is implemented. Now the vfolder delete API just updates the status of the vfolder to `DELETE_COMPLETE`.
We need to implement a trash bin timer which periodically purges long-lasting vfolders in the trash bin, as @kyujin-cho mentioned.
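A minimal sketch of such a timer, assuming an asyncio-based service. `TrashedVFolder`, `trashed_at`, `select_expired`, `trash_bin_timer`, and the `list_trashed`/`purge` callbacks are hypothetical names used for illustration; the retention period is likewise an assumption.

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Awaitable, Callable, Iterable, List


@dataclass(frozen=True)
class TrashedVFolder:
    """Hypothetical view of a vfolder sitting in the trash bin."""

    vfolder_id: str
    trashed_at: datetime


def select_expired(
    items: Iterable[TrashedVFolder],
    now: datetime,
    retention: timedelta,
) -> List[str]:
    """Return IDs of vfolders that stayed in the trash bin for at least `retention`."""
    return [it.vfolder_id for it in items if now - it.trashed_at >= retention]


async def trash_bin_timer(
    list_trashed: Callable[[], Awaitable[Iterable[TrashedVFolder]]],
    purge: Callable[[str], Awaitable[None]],
    *,
    interval: float,
    retention: timedelta,
    stop: asyncio.Event,
) -> None:
    """Purge expired vfolders every `interval` seconds until `stop` is set."""
    while not stop.is_set():
        now = datetime.now(timezone.utc)
        for vfolder_id in select_expired(await list_trashed(), now, retention):
            await purge(vfolder_id)
        try:
            # Sleep for one interval, but wake up early on shutdown.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
```

Keeping the selection logic in a pure function makes the expiry rule testable without running the loop.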
Updated (2024-02-06)
We will refine the state transitions as follows, with better naming:
flowchart TB
A(READY) -->|delete from vfolder list| B(DELETE_PENDING)
B -->|restore from trash bin| A
B -->|delete from trash bin, triggering actual filesystem removal| C(DELETE_ONGOING)
C -->|removal in filesystem succeeds| D(DELETE_COMPLETE)
C -->|removal in filesystem fails| E(DELETE_ERROR)
D -->|"automatic <code>clean-history</code> or manual cleanup"| X(end of lifecycle)
E -->|manual cleanup| X
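The transitions in the flowchart above could be enforced with a small lookup table. This is a sketch only: `VFolderStatus`, `ALLOWED`, and `transit` are hypothetical names, and the "end of lifecycle" step is modeled as having no further status because it removes the row rather than changing its state.

```python
import enum


class VFolderStatus(enum.Enum):
    READY = "ready"
    DELETE_PENDING = "delete-pending"
    DELETE_ONGOING = "delete-ongoing"
    DELETE_COMPLETE = "delete-complete"
    DELETE_ERROR = "delete-error"


S = VFolderStatus

# Each key maps to the set of statuses it may move to, mirroring the flowchart edges.
ALLOWED = {
    S.READY: {S.DELETE_PENDING},                            # delete from vfolder list
    S.DELETE_PENDING: {S.READY, S.DELETE_ONGOING},          # restore / delete from trash bin
    S.DELETE_ONGOING: {S.DELETE_COMPLETE, S.DELETE_ERROR},  # filesystem removal result
    S.DELETE_COMPLETE: set(),  # only cleanup (row removal) follows
    S.DELETE_ERROR: set(),     # only manual cleanup follows
}


def transit(current: VFolderStatus, target: VFolderStatus) -> VFolderStatus:
    """Return `target` if the flowchart allows the move, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"invalid transition: {current.name} -> {target.name}")
    return target
```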
- Remove the `PURGE_ONGOING` state and perform the actual filesystem-level removal in `DELETE_ONGOING`.
- The normal vfolder list should display `READY` vfolders only.
- The trash bin should display `DELETE_PENDING` and `DELETE_ONGOING` vfolders only.
  - `DELETE_PENDING`: Users can no longer mount this vfolder when creating new sessions.
- `DELETE_COMPLETE` vfolders are not shown in the WebUI but only in the CLI and the control panel.
  - Currently the enum value is "deleted-complete". The typo should be fixed with an explicit migration.
- Let's introduce the `status_history` column in the `vfolders` table, just like #1662.
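The display rules and the `status_history` idea above might look like this in code. It is a sketch under assumptions: vfolders are plain dicts here, `filter_by_view` and `set_status` are hypothetical helpers, and the history format (a mapping from status value to ISO timestamp) is a guess rather than the format used in #1662.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional

# Which statuses each view shows, following the list above.
VIEWS = {
    "list": frozenset({"ready"}),
    "trash-bin": frozenset({"delete-pending", "delete-ongoing"}),
}


def filter_by_view(vfolders: List[Dict], view: str) -> List[Dict]:
    """Keep only the vfolders that the given view should display."""
    wanted = VIEWS[view]  # raises KeyError for an unknown view name
    return [vf for vf in vfolders if vf["status"] in wanted]


def set_status(vf: Dict, new_status: str, now: Optional[datetime] = None) -> None:
    """Update the status and remember when it was entered (assumed history format)."""
    now = now or datetime.now(timezone.utc)
    vf["status"] = new_status
    vf.setdefault("status_history", {})[new_status] = now.isoformat()
```

Note that `delete-complete` appears in neither view, which matches the rule that such vfolders are hidden from the WebUI.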
Before:
class VFolderOperationStatus(str, enum.Enum):
    """
    Introduce virtual folder current status for storage-proxy operations.
    """

    READY = "ready"
    PERFORMING = "performing"
    CLONING = "cloning"
    MOUNTED = "mounted"
    ERROR = "error"
    DELETE_ONGOING = "delete-ongoing"  # vfolder is being deleted
    DELETE_COMPLETE = "deleted-complete"  # vfolder is deleted
    PURGE_ONGOING = "purge-ongoing"  # vfolder is being removed permanently
After:
class VFolderOperationStatus(enum.StrEnum):  # use Python 3.11's StrEnum
    """
    Introduce virtual folder current status for storage-proxy operations.
    """

    READY = "ready"
    PERFORMING = "performing"
    CLONING = "cloning"
    # MOUNTED: Tracking of the mount status should be done in a separate table.
    # ERROR: It should be defined separately for each operation, e.g., DELETE_ERROR, CLONING_ERROR, etc.
    DELETE_PENDING = "delete-pending"  # new state for trash bin
    DELETE_ONGOING = "delete-ongoing"  # vfolder is being deleted in the storage proxy
    DELETE_COMPLETE = "delete-complete"  # fix the typo
    DELETE_ERROR = "delete-error"  # explicit failure state
Additional technical considerations
- storage-proxy
  - Limit the number of concurrent vfolder removal operations in the filesystem.
  - When removal is requested two or more times for the same vfolder, continue the first operation without spawning multiple removal jobs.
  - Report the completion of vfolder removal to the manager in an asynchronous way. (via the event bus?)
  - When cancelled due to shutdown of the service in the middle of operations, gracefully cease the operation. This cancellation shouldn't be treated as `DELETE_ERROR`; the state must be kept as `DELETE_ONGOING` for continuation after restart.
    - In the future, we could introduce the reconciliation loop design here.
- manager
  - When the manager is restarted, restart the delete operations against `DELETE_ONGOING` vfolders. It should be safe to make multiple duplicate removal requests to the same vfolder.
  - Extend the `clean-history` implementation to "purge" the vfolders in the `DELETE_COMPLETE` state.
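The storage-proxy considerations above could be combined along these lines. This is a rough asyncio sketch, not the actual storage-proxy code: `RemovalScheduler`, the `remove` and `report` callbacks, and the default concurrency limit are all hypothetical, and `report` merely stands in for whatever asynchronous channel (such as the event bus) ends up being used.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class RemovalScheduler:
    """Hypothetical helper: bounded, de-duplicated vfolder removal jobs."""

    def __init__(
        self,
        remove: Callable[[str], Awaitable[None]],
        report: Callable[[str, str], Awaitable[None]],
        max_concurrency: int = 2,
    ) -> None:
        self._remove = remove
        self._report = report
        self._sema = asyncio.Semaphore(max_concurrency)
        self._jobs: Dict[str, asyncio.Task] = {}

    def request(self, vfolder_id: str) -> asyncio.Task:
        """Start a removal job, or return the one already running for this vfolder."""
        task = self._jobs.get(vfolder_id)
        if task is None or task.done():
            task = asyncio.create_task(self._run(vfolder_id))
            self._jobs[vfolder_id] = task
        return task

    async def _run(self, vfolder_id: str) -> None:
        try:
            async with self._sema:  # cap concurrent filesystem removals
                await self._remove(vfolder_id)
        except asyncio.CancelledError:
            # Service shutdown: report nothing, so the manager keeps DELETE_ONGOING
            # and can re-request the removal after restart.
            raise
        except Exception:
            await self._report(vfolder_id, "delete-error")
        else:
            await self._report(vfolder_id, "delete-complete")
        finally:
            self._jobs.pop(vfolder_id, None)
```

Because duplicate requests return the in-flight task, the manager can safely re-issue removal requests for every `DELETE_ONGOING` vfolder after a restart.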
### Tasks
- [ ] https://github.com/lablup/backend.ai/pull/835
- [ ] https://github.com/lablup/backend.ai/pull/1892
- [ ] https://github.com/lablup/backend.ai/pull/1936
- [ ] https://github.com/lablup/backend.ai/issues/1905
- [ ] https://github.com/lablup/backend.ai/pull/1884
- [ ] https://github.com/lablup/backend.ai/issues/1797
Let's implement this first.
We can implement "Trash bin", where we can restore DELETE-PENDING vfolders by the owner of the vfolder or any admin. The path would also be changed when receiving a `DELETE` request, e.g. `<vfhost>/12/23/456...` -> `<vfhost>/.trash/12/23/456...`
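The path change described here could be computed as below. This is only a sketch: `to_trash_path`, `from_trash_path`, and `TRASH_DIRNAME` are hypothetical names, and the sketch computes paths without touching the filesystem.

```python
from pathlib import PurePosixPath

TRASH_DIRNAME = ".trash"


def to_trash_path(vfhost_root: PurePosixPath, vfolder_path: PurePosixPath) -> PurePosixPath:
    """Map <vfhost>/<rest> to <vfhost>/.trash/<rest>."""
    # relative_to() raises ValueError if the vfolder is not under the vfhost root.
    rel = vfolder_path.relative_to(vfhost_root)
    return vfhost_root / TRASH_DIRNAME / rel


def from_trash_path(vfhost_root: PurePosixPath, trashed_path: PurePosixPath) -> PurePosixPath:
    """Inverse mapping, used when restoring from the trash bin."""
    rel = trashed_path.relative_to(vfhost_root / TRASH_DIRNAME)
    return vfhost_root / rel
```

Keeping the relative layout intact under `.trash` means a restore is just the inverse mapping.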
Most of the essential features will be covered by #835, but we still need to implement a timer which periodically purges long-lasting vfolders in the trash bin.
@agatha197 @lizable @yomybaby We need vfolder purge support in the WebUI, because the delete API has become a soft delete now that the trash bin is implemented as of version 24.03.