
Advanced vfolder deletion features (aka trash bin)

Open • fregataa opened this issue 2 years ago • 3 comments

Main idea

Since #713, we can track the status of vfolders. Let's extend this feature to support advanced deletion states such as DELETE-PENDING, DELETE-ONGOING, and DELETE-ERROR.

  • We can schedule when a vfolder is deleted via an option with a default value, such as 1 day or 1 week. This could be implemented with a new event and its event handlers.
  • We can implement a "trash bin" from which the owner of a vfolder or any admin can restore DELETE-PENDING vfolders.

It is recommended that the deletion job change the status of the vfolder and record the job result in the audit log rather than purge the row from the DB.
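A minimal sketch of this idea, assuming an in-memory row and audit log: deleting only changes the status and records when the removal is due, and the deletion job logs its outcome instead of dropping the row. The names `VFolderRow`, `AuditLog`, `schedule_delete`, `run_deletion_job`, and the one-day default are hypothetical and only illustrate the flow.

```python
import enum
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Optional


class VFolderStatus(enum.Enum):
    # Hypothetical subset of the states discussed in this issue.
    READY = "ready"
    DELETE_PENDING = "delete-pending"
    DELETE_COMPLETE = "delete-complete"
    DELETE_ERROR = "delete-error"


# Assumed default delay; the issue suggests values such as 1 day or 1 week.
DEFAULT_DELETE_DELAY = timedelta(days=1)


@dataclass
class VFolderRow:
    id: str
    status: VFolderStatus = VFolderStatus.READY
    delete_after: Optional[datetime] = None


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, **entry) -> None:
        self.entries.append(entry)


def schedule_delete(row: VFolderRow, now: datetime, delay: timedelta = DEFAULT_DELETE_DELAY) -> None:
    """Mark the vfolder for deletion instead of removing it right away."""
    row.status = VFolderStatus.DELETE_PENDING
    row.delete_after = now + delay


def run_deletion_job(
    row: VFolderRow,
    audit: AuditLog,
    now: datetime,
    remove: Callable[[str], object],
) -> None:
    """Remove a due vfolder, then keep the row and log the outcome instead of purging it."""
    if row.status is not VFolderStatus.DELETE_PENDING or row.delete_after is None or now < row.delete_after:
        return  # not scheduled, or not due yet
    try:
        remove(row.id)
    except OSError as exc:
        row.status = VFolderStatus.DELETE_ERROR
        audit.record(vfolder_id=row.id, result="error", detail=str(exc))
    else:
        row.status = VFolderStatus.DELETE_COMPLETE
        audit.record(vfolder_id=row.id, result="complete")
```

The `remove` callable is injected so the filesystem work can be swapped for an event handler, as the first bullet suggests.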


Updated (2023-10-04)

In 24.03, the vfolder trash bin feature is implemented. The vfolder delete API now only updates the status of the vfolder to DELETE_COMPLETE. As @kyujin-cho mentioned, we still need to implement a trash bin timer that periodically purges vfolders that have remained in the trash bin for a long time.
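The trash bin timer could look roughly like the sketch below. It assumes each trashed vfolder carries the time it entered the trash bin and that listing and purging are injected as coroutines; `TrashedVFolder`, `select_expired`, `trash_bin_timer`, and the retention/interval parameters are hypothetical names, not part of the codebase.

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Awaitable, Callable, Iterable


@dataclass(frozen=True)
class TrashedVFolder:
    id: str
    trashed_at: datetime  # hypothetical: when the vfolder entered the trash bin


def select_expired(items: Iterable[TrashedVFolder], now: datetime, retention: timedelta) -> list:
    """Return the IDs of trashed vfolders kept at least as long as the retention period."""
    return [item.id for item in items if now - item.trashed_at >= retention]


async def trash_bin_timer(
    list_trashed: Callable[[], Awaitable[list]],
    purge: Callable[[str], Awaitable[None]],
    retention: timedelta,
    interval: float,
    stop: asyncio.Event,
) -> None:
    """Periodically purge vfolders that have stayed in the trash bin too long."""
    while not stop.is_set():
        now = datetime.now(timezone.utc)
        for vfolder_id in select_expired(await list_trashed(), now, retention):
            await purge(vfolder_id)
        try:
            # Sleep for `interval` seconds, waking up early if `stop` is set.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
```

Keeping the selection logic in a pure function makes the retention rule easy to test separately from the scheduling loop.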


Updated (2024-02-06)

We will refine the state transitions as follows, with better naming:

```mermaid
flowchart TB
    A(READY) -->|delete from vfolder list| B(DELETE_PENDING)
    B -->|restore from trash bin| A
    B -->|delete from trash bin, triggering actual filesystem removal| C(DELETE_ONGOING)
    C -->|removal in filesystem succeeds| D(DELETE_COMPLETE)
    C -->|removal in filesystem fails| E(DELETE_ERROR)
    D -->|"automatic <code>clean-history</code> or manual cleanup"| X(end of lifecycle)
    E -->|manual cleanup| X
```
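One way to enforce this flowchart in code is a transition table. The sketch below covers only the deletion-related states; `ALLOWED_TRANSITIONS`, `InvalidTransition`, and `transition` are illustrative names, and the end of lifecycle is modeled as removing the row rather than as a state.

```python
import enum


class VFolderOperationStatus(str, enum.Enum):
    # Only the states that appear in the flowchart.
    READY = "ready"
    DELETE_PENDING = "delete-pending"
    DELETE_ONGOING = "delete-ongoing"
    DELETE_COMPLETE = "delete-complete"
    DELETE_ERROR = "delete-error"


S = VFolderOperationStatus

# Each key maps to the states reachable by one edge of the flowchart.
ALLOWED_TRANSITIONS: dict = {
    S.READY: frozenset({S.DELETE_PENDING}),  # delete from vfolder list
    S.DELETE_PENDING: frozenset({S.READY, S.DELETE_ONGOING}),  # restore / delete from trash bin
    S.DELETE_ONGOING: frozenset({S.DELETE_COMPLETE, S.DELETE_ERROR}),  # filesystem result
    S.DELETE_COMPLETE: frozenset(),  # only cleanup (row removal) follows
    S.DELETE_ERROR: frozenset(),  # only manual cleanup follows
}


class InvalidTransition(Exception):
    pass


def transition(current: VFolderOperationStatus, target: VFolderOperationStatus) -> VFolderOperationStatus:
    """Return `target` if the flowchart allows moving there from `current`."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target
```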
  • Remove the PURGE_ONGOING state and perform the actual filesystem-level removal in DELETE_ONGOING.
  • The normal vfolder list should display READY vfolders only.
  • The trash bin should display DELETE_PENDING and DELETE_ONGOING vfolders only.
    • DELETE_PENDING: Users can no longer mount this vfolder when creating new sessions.
  • DELETE_COMPLETE vfolders are shown only in the CLI and the control panel, not in the WebUI.
    • Currently the enum value is "deleted-complete". The typo should be fixed with an explicit migration.
  • Let's introduce the status_history column in the vfolders table, just like #1662.
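A rough sketch of how the list views and a `status_history` column could fit together. The JSON shape of `status_history` (status value mapped to an ISO 8601 timestamp) and the helpers `VFolder`, `set_status`, and `visible_in` are assumptions for illustration only.

```python
import enum
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterable


class VFolderOperationStatus(str, enum.Enum):
    READY = "ready"
    DELETE_PENDING = "delete-pending"
    DELETE_ONGOING = "delete-ongoing"
    DELETE_COMPLETE = "delete-complete"
    DELETE_ERROR = "delete-error"


S = VFolderOperationStatus

# Statuses each view displays, following the bullet list above.
VFOLDER_LIST_STATUSES = frozenset({S.READY})
TRASH_BIN_STATUSES = frozenset({S.DELETE_PENDING, S.DELETE_ONGOING})


@dataclass
class VFolder:
    name: str
    status: VFolderOperationStatus = VFolderOperationStatus.READY
    # Hypothetical column shape: status value -> ISO 8601 timestamp of entering it.
    status_history: dict = field(default_factory=dict)

    def set_status(self, status: VFolderOperationStatus, now: datetime) -> None:
        self.status = status
        self.status_history[status.value] = now.isoformat()


def visible_in(vfolders: Iterable[VFolder], statuses: frozenset) -> list:
    """Names of the vfolders a view with the given status filter would show."""
    return [vf.name for vf in vfolders if vf.status in statuses]
```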

Before:

```python
import enum


class VFolderOperationStatus(str, enum.Enum):
    """
    Introduce virtual folder current status for storage-proxy operations.
    """

    READY = "ready"
    PERFORMING = "performing"
    CLONING = "cloning"
    MOUNTED = "mounted"
    ERROR = "error"
    DELETE_ONGOING = "delete-ongoing"  # vfolder is being deleted
    DELETE_COMPLETE = "deleted-complete"  # vfolder is deleted
    PURGE_ONGOING = "purge-ongoing"  # vfolder is being removed permanently
```

After:

```python
import enum


class VFolderOperationStatus(enum.StrEnum):  # use Python 3.11's StrEnum
    """
    Introduce virtual folder current status for storage-proxy operations.
    """

    READY = "ready"
    PERFORMING = "performing"
    CLONING = "cloning"
    # MOUNTED: Tracking of the mount status should be done in a separate table.
    # ERROR: Should be defined per operation, e.g., DELETE_ERROR, CLONING_ERROR, etc.
    DELETE_PENDING = "delete-pending"  # new state for trash bin
    DELETE_ONGOING = "delete-ongoing"  # vfolder is being deleted in the storage proxy
    DELETE_COMPLETE = "delete-complete"  # fix the typo
    DELETE_ERROR = "delete-error"  # explicit failure state
```
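Fixing the `"deleted-complete"` typo requires a data migration that rewrites existing rows. The sketch below uses `sqlite3` as a stand-in for the real database and migration tooling (the `upgrade`/`downgrade` names merely mirror the usual migration convention), and it assumes the status is stored as a plain string in a `vfolders.status` column; if the column is backed by a database-level enum type, that type would need to be updated as well.

```python
import sqlite3

OLD_VALUE = "deleted-complete"  # misspelled value currently stored
NEW_VALUE = "delete-complete"


def upgrade(conn: sqlite3.Connection) -> int:
    """Rewrite the misspelled status value; return the number of updated rows."""
    cur = conn.execute("UPDATE vfolders SET status = ? WHERE status = ?", (NEW_VALUE, OLD_VALUE))
    conn.commit()
    return cur.rowcount


def downgrade(conn: sqlite3.Connection) -> int:
    """Restore the old value so the migration stays reversible."""
    cur = conn.execute("UPDATE vfolders SET status = ? WHERE status = ?", (OLD_VALUE, NEW_VALUE))
    conn.commit()
    return cur.rowcount
```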

Additional technical considerations

  • storage-proxy
    • Limit the number of concurrent vfolder removal operations in the filesystem.
    • When removal of the same vfolder is requested two or more times, continue the first operation instead of spawning multiple removal jobs.
    • Report the completion of vfolder removal to the manager asynchronously (via the event bus?).
    • When an operation is cancelled midway because the service is shutting down, stop it gracefully. This cancellation must not be treated as DELETE_ERROR; the state must remain DELETE_ONGOING so the operation can continue after restart.
      • In the future, we could introduce a reconciliation loop design here.
  • manager
    • When the manager restarts, resume the delete operations for DELETE_ONGOING vfolders. It should be safe to send duplicate removal requests for the same vfolder.
    • Extend the clean-history implementation to "purge" vfolders in the DELETE_COMPLETE state.
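The storage-proxy requirements above can be sketched with `asyncio`: a semaphore bounds concurrent removals, a per-vfolder task map deduplicates repeated requests, and cancellation on shutdown propagates without reporting a failure, so the vfolder stays in DELETE_ONGOING. `DeletionExecutor` and its `remove`/`report` callbacks are hypothetical; `report` stands in for whatever event-bus message would notify the manager.

```python
import asyncio
from typing import Awaitable, Callable


class DeletionExecutor:
    """Hypothetical storage-proxy helper: bounded, deduplicated vfolder removal."""

    def __init__(
        self,
        remove: Callable[[str], Awaitable[None]],
        report: Callable[[str, bool], Awaitable[None]],
        max_concurrency: int = 4,
    ) -> None:
        self._remove = remove  # filesystem-level removal
        self._report = report  # e.g., publish a completion event for the manager
        self._sema = asyncio.Semaphore(max_concurrency)
        self._tasks: dict = {}

    def request(self, vfolder_id: str) -> asyncio.Task:
        # A duplicate request joins the in-flight job instead of spawning a new one.
        task = self._tasks.get(vfolder_id)
        if task is None:
            task = asyncio.create_task(self._run(vfolder_id))
            self._tasks[vfolder_id] = task
        return task

    async def _run(self, vfolder_id: str) -> None:
        try:
            async with self._sema:  # limit concurrent removals
                try:
                    await self._remove(vfolder_id)
                except asyncio.CancelledError:
                    # Shutdown: send no failure report, so the vfolder stays DELETE_ONGOING.
                    raise
                except Exception:
                    await self._report(vfolder_id, False)  # manager marks DELETE_ERROR
                else:
                    await self._report(vfolder_id, True)  # manager marks DELETE_COMPLETE
        finally:
            self._tasks.pop(vfolder_id, None)

    async def shutdown(self) -> None:
        tasks = list(self._tasks.values())
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        self._tasks.clear()


async def demo_dedup():
    removed, reports = [], []

    async def remove(vfolder_id):
        removed.append(vfolder_id)
        await asyncio.sleep(0.01)  # stand-in for a slow filesystem removal

    async def report(vfolder_id, ok):
        reports.append((vfolder_id, ok))

    executor = DeletionExecutor(remove, report, max_concurrency=2)
    first = executor.request("vf-1")
    second = executor.request("vf-1")  # duplicate request reuses the same task
    assert first is second
    await first
    return removed, reports


async def demo_shutdown():
    reports = []

    async def remove(vfolder_id):
        await asyncio.sleep(10)  # long removal interrupted by shutdown

    async def report(vfolder_id, ok):
        reports.append((vfolder_id, ok))

    executor = DeletionExecutor(remove, report)
    executor.request("vf-2")
    await asyncio.sleep(0)  # let the removal start
    await executor.shutdown()
    return reports  # nothing reported, so no DELETE_ERROR
```

Because a repeated `request()` returns the in-flight task, the manager can re-send removal requests for DELETE_ONGOING vfolders without spawning duplicate jobs.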
### Tasks
- [ ] https://github.com/lablup/backend.ai/pull/835
- [ ] https://github.com/lablup/backend.ai/pull/1892
- [ ] https://github.com/lablup/backend.ai/pull/1936
- [ ] https://github.com/lablup/backend.ai/issues/1905
- [ ] https://github.com/lablup/backend.ai/pull/1884
- [ ] https://github.com/lablup/backend.ai/issues/1797

fregataa · Oct 03 '22 18:10

Let's implement this first.

> We can implement "Trash bin", where we can restore DELETE-PENDING vfolders by the owner of the vfolder or any admin.

Also, the path would change when a DELETE request is received, e.g., `<vfhost>/12/23/456...` -> `<vfhost>/.trash/12/23/456...`.
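The path change described above can be sketched as a pure mapping between a vfolder path and its trash location. `to_trash_path`, `from_trash_path`, and the `TRASH_DIR_NAME` constant are illustrative assumptions based on the example path, not the actual implementation.

```python
from pathlib import PurePosixPath

TRASH_DIR_NAME = ".trash"  # taken from the example path above


def to_trash_path(vfhost_root: PurePosixPath, vfolder_path: PurePosixPath) -> PurePosixPath:
    """Map <vfhost>/<rel> to <vfhost>/.trash/<rel>."""
    rel = vfolder_path.relative_to(vfhost_root)  # raises ValueError if outside the host
    if rel.parts and rel.parts[0] == TRASH_DIR_NAME:
        raise ValueError(f"{vfolder_path} is already in the trash bin")
    return vfhost_root / TRASH_DIR_NAME / rel


def from_trash_path(vfhost_root: PurePosixPath, trashed_path: PurePosixPath) -> PurePosixPath:
    """Inverse mapping, used when restoring a DELETE_PENDING vfolder."""
    rel = trashed_path.relative_to(vfhost_root / TRASH_DIR_NAME)
    return vfhost_root / rel
```

The actual move would still need to create the parent directories under `.trash` and rename the directory; within a single filesystem, `os.rename` does this as a rename rather than a copy.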

lizable · Oct 24 '22 02:10

Most of the essential features will be covered by #835, but we still need to implement a timer that periodically purges vfolders that have remained in the trash bin for a long time.

kyujin-cho · Oct 04 '23 05:10

@agatha197 @lizable @yomybaby We need vfolder purge support in the WebUI. Since the trash bin is implemented as of 24.03, the delete API now performs a soft delete.

fregataa · Oct 04 '23 07:10