Metacat backend storage API
An alternative to the current REST API for ingesting/exporting files to/from Metacat is needed, especially for large files. This storage API:
- would decouple sending a file from sending its sysmeta, so that the file could be 'registered' after being sent
- should handle authentication
- should have a pluggable architecture so that different storage systems could send/receive data, such as Amazon S3, Globus, Dropbox, Google Drive, Gluster, etc.
This backend storage model could be expanded so that, instead of just handling ingest and export of data into/out of Metacat, it handles all file storage, allowing the data storage to be remote from the Metacat instance.
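The pluggable-architecture idea above could be sketched as a small backend interface that each storage system (S3, Globus, etc.) implements. This is a minimal illustrative sketch, not the actual Metacat API; all class and method names here are hypothetical:

```python
from abc import ABC, abstractmethod
from typing import BinaryIO
import hashlib
import io


class StorageBackend(ABC):
    """Hypothetical contract a storage plugin (S3, Globus, ...) would fulfill."""

    @abstractmethod
    def put(self, pid: str, stream: BinaryIO) -> str:
        """Store the bytes for `pid`; return a checksum usable for later sysmeta registration."""

    @abstractmethod
    def get(self, pid: str) -> bytes:
        """Retrieve the stored bytes for `pid`."""


class InMemoryBackend(StorageBackend):
    """Toy backend used only to demonstrate the interface."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, pid: str, stream: BinaryIO) -> str:
        data = stream.read()
        self._objects[pid] = data
        # Checksum computed at ingest time can seed the sysmeta later.
        return hashlib.sha256(data).hexdigest()

    def get(self, pid: str) -> bytes:
        return self._objects[pid]


backend: StorageBackend = InMemoryBackend()
checksum = backend.put("urn:uuid:example-1", io.BytesIO(b"some data"))
```

Metacat itself would then talk only to the `StorageBackend` abstraction, so a Ceph- or S3-backed implementation could be swapped in without touching the ingest/export logic.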
Some points to consider in designing this API:
- what are the required capabilities?
  - ingest of files directly into Metacat storage (not using the HTTP REST API)
- would authorization be session based, like DataONE sessions?
  - could Ceph authentication be helpful?
    - https://docs.ceph.com/docs/cuttlefish/rados/operations/auth-intro/
- how would system metadata be generated?
- file download/access
  - export of files directly from Metacat storage (not using the HTTP REST API)
- possible connection mechanisms
  - Ceph FUSE: https://docs.ceph.com/docs/master/man/8/ceph-fuse/
  - Ceph Object Gateway: https://docs.ceph.com/docs/master/radosgw/ (REST based)
Note: the ESS-DIVE team has this API on their 'priority list', which @vchendrix indicates is currently undergoing review.
It would be helpful to get input from them regarding their requirements and use cases for this API.
This relies on hashstore being added first; moving to 3.1.0.