Custom S3 Buckets for Orgs

Open ldko opened this issue 2 years ago • 2 comments

I have been trying out local deployments of Browsertrix Cloud with microk8s and would find it helpful if I could configure a local storage path to where WACZ/WARCs are written for crawls, so I can put them in a location dedicated for storage rather than having it all go to the same place as where the microk8s data is generally being stored.

ldko avatar Feb 07 '23 18:02 ldko

This is planned :) We'll use this ticket to track the ability to configure the storage location on the back end on a per-org basis with an enforceable default set by the server admin.

Requirements

  • Main storage path and backup storage path can be changed to other S3 compatible locations in the org settings
    • The backup path must be set to something. If nothing is set it will use the Webrecorder default.
    • Users can set any number of backup paths
    • When a user changes their main storage path or backup storage path, their data will move to the new bucket.
  • Org Quotas superadmin panel can enable or disable the ability to set custom bucket locations
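As a rough illustration of the fallback rule in the requirements above, here is a minimal Python sketch. All names (`OrgStorageSettings`, `effective_replicas`, `DEFAULT_REPLICA`, `custom_storage_allowed`) are hypothetical and not taken from the Browsertrix codebase; it only shows one way "use the default if nothing is set" and the superadmin toggle could interact.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder for the server-admin default replica location.
DEFAULT_REPLICA = "default-replica"


@dataclass
class OrgStorageSettings:
    """Hypothetical per-org storage settings (not the real data model)."""
    primary: str = "default"
    replicas: list[str] = field(default_factory=list)
    # Assumed to mirror the superadmin toggle for custom bucket locations.
    custom_storage_allowed: bool = False


def effective_replicas(settings: OrgStorageSettings) -> list[str]:
    """Return the org's custom replicas, or the default if none apply."""
    if settings.custom_storage_allowed and settings.replicas:
        return list(settings.replicas)
    # Nothing set (or custom storage disabled): fall back to the default.
    return [DEFAULT_REPLICA]
```

With this shape, an org that has never configured a backup path still always resolves to exactly one replica location.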

Shrinks99 avatar Feb 07 '23 19:02 Shrinks99

Tasks:

  • [x] Ensure code to add and remove custom storages works as expected
  • [x] Add ability to set custom storage as primary and/or replica storage locations (ensure there's always one replica location set if any are configured, use default if org custom replica storage isn't set)
  • [x] Ensure downloads and uploads with custom storage work as expected
  • [ ] Add tests
  • [ ] Add documentation
  • [x] Add background job to move files from existing S3 bucket to new S3 bucket and update database accordingly (by modifying prefix)
    • start job when primary storage is changed, setting org to read-only and waiting for crawls to complete before kicking this off
    • when replica location is added, don't set to read-only but instead start background jobs to replicate all files to new replica location
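To make the two background-job flows above concrete, here is a small Python sketch that simulates buckets with in-memory dicts. It is not the actual implementation: `Org`, `change_primary_storage`, and `add_replica` are hypothetical names, and lifting the read-only flag once the move finishes is an assumption that the task list does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Org:
    """Hypothetical org record; file keys are stored relative to the prefix."""
    storage_prefix: str
    read_only: bool = False
    running_crawls: int = 0
    files: list[str] = field(default_factory=list)


def change_primary_storage(org: Org, buckets: dict, new_prefix: str) -> bool:
    """Move all files to the new primary bucket; False if crawls still run.

    `buckets` maps a prefix to a dict of {key: bytes}, standing in for S3.
    """
    org.read_only = True  # block new writes as soon as the change is requested
    if org.running_crawls > 0:
        return False  # caller retries once running crawls have completed
    old = buckets[org.storage_prefix]
    new = buckets.setdefault(new_prefix, {})
    for key in org.files:
        new[key] = old.pop(key)  # copy to the new bucket, remove from the old
    # Keys are relative, so the database update is just the prefix change.
    org.storage_prefix = new_prefix
    org.read_only = False  # assumption: read-only is lifted after the move
    return True


def add_replica(org: Org, buckets: dict, replica_prefix: str) -> None:
    """Replicate all files to a new replica without making the org read-only."""
    src = buckets[org.storage_prefix]
    dst = buckets.setdefault(replica_prefix, {})
    for key in org.files:
        dst[key] = src[key]
```

Keeping file keys relative to the bucket prefix is what lets the move finish with a single prefix update rather than rewriting every file record.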

In the initial pass, adding/removing custom storages and changing primary or replica storage on an org will be done through superadmin-only API endpoints. In a future iteration, we could add an admin UI for this, similar to the org quota and proxy settings.

tw4l avatar Sep 19 '24 15:09 tw4l