browsertrix
Custom S3 Buckets for Orgs
I have been trying out local deployments of Browsertrix Cloud with microk8s. It would be helpful to be able to configure the local storage path where WACZ/WARC files are written for crawls, so that I can put them in a location dedicated to storage rather than alongside the rest of the microk8s data.
This is planned :) We'll use this ticket to track the ability to configure the storage location on the back end on a per-org basis with an enforceable default set by the server admin.
Requirements
- Main storage path and backup storage path can be changed to other S3-compatible locations in the org settings
- The backup path must be set to something. If nothing is set, it will use the Webrecorder default.
- Users can set any number of backup paths
- When a user changes their main storage path or backup storage path, their data will move to the new bucket.
- Org Quotas superadmin panel can enable or disable the ability to set custom bucket locations
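The backup fallback rule above could be sketched roughly as follows. This is only an illustration, not the actual Browsertrix code: `StorageRef`, `DEFAULT_REPLICA`, and `effective_replicas` are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageRef:
    """Hypothetical reference to a storage location (not a real Browsertrix model)."""
    name: str
    custom: bool = False


# Hypothetical server-wide default backup location set by the server admin.
DEFAULT_REPLICA = StorageRef(name="default-replica")


def effective_replicas(org_replicas: list[StorageRef]) -> list[StorageRef]:
    """Return the org's backup locations, falling back to the default if none are set."""
    return list(org_replicas) if org_replicas else [DEFAULT_REPLICA]
```

With this shape, an org that has not configured any custom backup location still resolves to exactly one backup location, while an org with several custom locations keeps all of them.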
Tasks:
- [x] Ensure code to add and remove custom storages works as expected
- [x] Add ability to set custom storage as primary and/or replica storage locations (ensure there's always one replica location set if any are configured, use default if org custom replica storage isn't set)
- [x] Ensure downloads and uploads with custom storage work as expected
- [ ] Add tests
- [ ] Add documentation
- [x] Add background job to move files from existing S3 bucket to new S3 bucket and update database accordingly (by modifying prefix)
  - start the job when primary storage is changed, setting the org to read-only and waiting for crawls to complete before kicking this off
  - when a replica location is added, don't set the org to read-only but instead start background jobs to replicate all files to the new replica location
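A rough sketch of the order of operations for the primary storage move, assuming file locations are stored as full paths that begin with the storage prefix. Everything here is hypothetical and in-memory: `Org`, `rewrite_prefix`, `migrate_primary_storage`, and the `copy_object` / `wait_for_crawls` callables stand in for the real models, S3 client, and crawl tracking. Deleting the originals after the copy is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Org:
    """Hypothetical in-memory stand-in for an org record."""
    primary_prefix: str
    read_only: bool = False
    files: list[str] = field(default_factory=list)  # full object paths


def rewrite_prefix(path: str, old_prefix: str, new_prefix: str) -> str:
    """Swap the storage prefix on one file path, leaving the rest intact."""
    if not path.startswith(old_prefix):
        raise ValueError(f"{path!r} does not start with {old_prefix!r}")
    return new_prefix + path[len(old_prefix):]


def migrate_primary_storage(org: Org, new_prefix: str, copy_object, wait_for_crawls) -> None:
    """Copy an org's files to a new primary prefix, then update its records.

    copy_object(src, dst) and wait_for_crawls(org) are caller-supplied
    callables (assumptions for this sketch, not real Browsertrix functions).
    """
    org.read_only = True  # block new writes while files are moving
    try:
        wait_for_crawls(org)  # let running crawls finish before copying
        old_prefix = org.primary_prefix
        new_paths = []
        for path in org.files:
            dst = rewrite_prefix(path, old_prefix, new_prefix)
            copy_object(path, dst)
            new_paths.append(dst)
        # Update the stored paths only after every copy has succeeded.
        org.files = new_paths
        org.primary_prefix = new_prefix
    finally:
        org.read_only = False
```

Updating the records only after all copies succeed means a failed copy leaves the org pointing at the old bucket, which still holds every file.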
In the initial pass, adding/removing custom storages and changing primary or replica storage on an org will be done through superadmin-only API endpoints. In a future iteration, we could add an admin UI for this, similar to the org quota and proxy settings.