
Metacat backend storage API

Open gothub opened this issue 5 years ago • 2 comments

An alternative to the current REST API for ingesting/exporting files to/from Metacat is needed, especially for large files. This storage API:

  • should decouple sending a file from sending its system metadata (sysmeta), so that the file could be 'registered' after being sent
  • should handle authentication
  • should have a pluggable architecture so that different storage systems, such as Amazon S3, Globus, Dropbox, Google Drive, Gluster, etc., could send/receive data
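A minimal sketch of the pluggable, decoupled design described above, assuming a Python-style plugin interface. The names `StorageBackend`, `LocalBackend`, `Registry`, `put`, `get`, `exists`, and `register` are hypothetical and not part of Metacat. The file is stored first and returns a content-derived handle; its system metadata is attached in a separate, later call.

```python
import hashlib
import os
import tempfile
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical plugin interface; S3, Globus, etc. would each implement it."""

    @abstractmethod
    def put(self, data: bytes) -> str:
        """Store raw bytes and return an opaque handle for later registration."""

    @abstractmethod
    def get(self, handle: str) -> bytes:
        """Return the bytes previously stored under `handle`."""

    @abstractmethod
    def exists(self, handle: str) -> bool:
        """Report whether `handle` refers to a stored object."""


class LocalBackend(StorageBackend):
    """Illustrative backend that writes objects into a local directory."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, handle: str) -> str:
        return os.path.join(self.root, handle)

    def put(self, data: bytes) -> str:
        # The SHA-256 hex digest doubles as the handle, so identical content maps to one file.
        handle = hashlib.sha256(data).hexdigest()
        with open(self._path(handle), "wb") as f:
            f.write(data)
        return handle

    def get(self, handle: str) -> bytes:
        with open(self._path(handle), "rb") as f:
            return f.read()

    def exists(self, handle: str) -> bool:
        return os.path.isfile(self._path(handle))


class Registry:
    """Hypothetical catalog that registers already-stored objects with their sysmeta."""

    def __init__(self, backend: StorageBackend) -> None:
        self.backend = backend
        self.sysmeta: dict[str, dict] = {}

    def register(self, handle: str, sysmeta: dict) -> None:
        # Registration is a separate, later step; reject handles that were never stored.
        if not self.backend.exists(handle):
            raise KeyError(f"no stored object for handle {handle!r}")
        self.sysmeta[handle] = sysmeta


# Usage: send the file first, register its system metadata afterwards.
registry = Registry(LocalBackend(tempfile.mkdtemp()))
handle = registry.backend.put(b"example data")
registry.register(handle, {"identifier": "urn:uuid:example"})
```

For large files, a real backend would stream data rather than accept a single `bytes` value; `bytes` is used here only to keep the sketch short.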

This backend storage model could be extended so that, rather than only handling ingest and export of data into and out of Metacat, it handles all file storage, allowing the data storage to be remote from the Metacat instance.
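If all file storage moved out of Metacat, the instance would need some way to decide which backend holds a given object. One possible approach, sketched here as an assumption rather than a proposed design, is to key storage plugins by URI scheme (`file://`, `s3://`, ...). The `BACKENDS` registry, the `backend` decorator, and `resolve` are hypothetical; only a `file` reader is implemented, and POSIX-style paths are assumed.

```python
import tempfile
from pathlib import Path
from typing import Callable
from urllib.parse import unquote, urlparse

# Hypothetical plugin registry: URI scheme -> function that reads the object at that URI.
Reader = Callable[[str], bytes]
BACKENDS: dict[str, Reader] = {}


def backend(scheme: str) -> Callable[[Reader], Reader]:
    """Decorator that registers a reader function for one URI scheme."""
    def wrap(fn: Reader) -> Reader:
        BACKENDS[scheme] = fn
        return fn
    return wrap


@backend("file")
def read_file(uri: str) -> bytes:
    # file:///path/to/object -> /path/to/object (percent-escapes decoded)
    return Path(unquote(urlparse(uri).path)).read_bytes()


def resolve(uri: str) -> bytes:
    """Dispatch a read to whichever plugin handles the URI's scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in BACKENDS:
        raise ValueError(f"no storage backend registered for scheme {scheme!r}")
    return BACKENDS[scheme](uri)


# Usage: write a temporary file, then read it back through the file:// plugin.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"remote bytes")
print(resolve(Path(tmp.name).as_uri()))  # b'remote bytes' (on POSIX systems)
Path(tmp.name).unlink()
```

Adding S3 or Globus support would then mean registering another reader under its scheme without changing `resolve`.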

gothub • Apr 16 '20 23:04

Some points to consider in designing this API:

  • what are the required capabilities?
    • ingest of files directly into Metacat storage (not using the HTTP REST API)
      • would authorization be session based, like DataONE sessions?
      • could Ceph authentication be helpful?
        • https://docs.ceph.com/docs/cuttlefish/rados/operations/auth-intro/
      • how would system metadata be generated?
    • file download/access
  • possible connection mechanisms
    • Ceph FUSE: https://docs.ceph.com/docs/master/man/8/ceph-fuse/
    • Ceph Object Gateway: https://docs.ceph.com/docs/master/radosgw/ (REST based)
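One open question above is how system metadata would be generated for files ingested outside the REST API. A minimal sketch, assuming the storage layer can read the file once it lands: the keys mirror a few DataONE system metadata fields (`identifier`, `formatId`, `size`, `checksum` with its algorithm, `rightsHolder`), but the dictionary layout and the `build_sysmeta` function are hypothetical, and a real record would need more fields plus XML serialization.

```python
import hashlib
import os
import tempfile


def build_sysmeta(path: str, identifier: str, format_id: str,
                  rights_holder: str, algorithm: str = "SHA-256") -> dict:
    """Compute a partial, DataONE-style system metadata record for a stored file."""
    # Map names like "SHA-256" or "MD5" to hashlib names ("sha256", "md5").
    hasher = hashlib.new(algorithm.replace("-", "").lower())
    size = 0
    # Stream in 1 MiB chunks so large files are never loaded into memory at once.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
            size += len(chunk)
    return {
        "identifier": identifier,
        "formatId": format_id,
        "size": size,
        "checksum": {"algorithm": algorithm, "value": hasher.hexdigest()},
        "rightsHolder": rights_holder,
    }


# Usage with placeholder identifiers; a real deployment would mint or receive these.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example")
record = build_sysmeta(tmp.name, "urn:uuid:example", "text/plain", "CN=example-user")
print(record["size"])  # 7
os.remove(tmp.name)
```

Computing size and checksum while streaming keeps the approach usable for the large files that motivate this API.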

Note: the ESS-DIVE team has this API on their 'priority list', which @vchendrix indicates is currently undergoing review.

It would be helpful to get input from them regarding their requirements and use cases for this API.

gothub • Aug 25 '20 16:08

This relies on hashstore being added; moving to 3.1.0.

artntek • Feb 08 '24 16:02