metadata

Open cshamis opened this issue 5 years ago • 1 comment

Open for discussion: We may have a problem with how we're handling file metadata in SCALE/SEED. The input_file_manifest.json includes all the file metadata Scale knows about the input files, but in the case of a mounted S3 workspace it cannot be written, which causes an error. Moreover, ALL input file metadata records are stored in that ONE manifest file. Pushing parse results back into the Scale file records requires writing a metadata.json in the output directory using a different JSON format.

What we really want:

  • We need a way for a job to READ input file metadata.
  • We need a way for a job to UPDATE input file metadata.
  • We need a way for a job to CREATE output file metadata when it creates a new output file.
  • It would be desirable for this mechanism to be the SAME for all three.

The proposal:

  • All metadata is provided via file objects and follows the {filename}.metadata.json naming convention.
  • The JSON format will include everything in the scale_file record, but only certain fields can be modified. Certain Scale internals will not be changeable, e.g. filesize, filename, creation time, transfer time, permalink, strike_id, workspace_id.
  • Input files go into the top-level container directory /scale/input as they do now.
  • Input metadata files will go into a newly created top-level directory in the container called "/scale/metadata" (this solves the problem of unreliable write access to the workspace).
  • Output files go into the top-level container directory /scale/output as they do now.
  • Output metadata files will go into the /scale/output directory as required by the SEED spec.
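
To make the proposal concrete, here is a rough sketch of how a job might use the {filename}.metadata.json convention for reading and writing. Nothing here exists in Scale today: the helper names, the JSON key names in READ_ONLY_FIELDS, and the idea that updated input metadata is also written under /scale/output are all assumptions for illustration.

```python
import json
from pathlib import Path

# Hypothetical key names for the Scale-internal fields that the proposal
# says a job must not change; the real scale_file keys may differ.
READ_ONLY_FIELDS = frozenset({
    "file_name", "file_size", "created", "last_transferred",
    "permalink", "strike_id", "workspace_id",
})


def metadata_path(data_file, directory):
    """Return <directory>/<filename>.metadata.json for a data file."""
    return Path(directory) / f"{Path(data_file).name}.metadata.json"


def read_metadata(data_file, metadata_dir="/scale/metadata"):
    """READ: load the metadata record that accompanies an input file."""
    with metadata_path(data_file, metadata_dir).open() as fh:
        return json.load(fh)


def write_metadata(data_file, metadata, original=None, output_dir="/scale/output"):
    """UPDATE or CREATE: write a metadata record next to the job's outputs.

    If `original` is given (the record read for an input file), any change
    to a read-only field is rejected before anything is written.
    """
    for key in READ_ONLY_FIELDS:
        if original and key in original and metadata.get(key) != original[key]:
            raise ValueError(f"field {key!r} is read-only")
    target = metadata_path(data_file, output_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w") as fh:
        json.dump(metadata, fh, indent=2)
    return target
```

A parse job would call read_metadata("/scale/input/scene.tif"), change only the modifiable fields, and pass the result plus the original record to write_metadata. A brand new output file would call write_metadata with no original. All three cases share one file format, which is the point of the proposal.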

Right now, I don't think we have a way for a parse job to update the scale file record at all.

I'm leaning towards this being necessary for 7.0.x; god help me.

cshamis, Aug 22 '19 20:08

This is related to #1289, but is a superset of that issue. This is something we should keep as context for future designs, but is out of scope for the immediate need. #1289 will address the immediate need.

gisjedi, Oct 16 '19 15:10