Record file structure/details in project metadata

Open tompollard opened this issue 5 months ago • 0 comments

Currently, as far as I'm aware, we don't formally document the structure/details of files in the metadata of published projects.

For example, we have no structured record of details such as:

  • Folder structure
  • Lists of files
  • File types
  • File sizes
  • File contents (e.g. what columns does a CSV file contain)
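
As a rough illustration, the details listed above could be collected by walking a project directory. This is a minimal sketch rather than anything that exists in physionet-build: the function name `build_manifest` and the record fields (`path`, `type`, `size_bytes`, `columns`) are hypothetical, and it assumes that CSV files have a header row.

```python
import csv
import mimetypes
import os


def build_manifest(root):
    """Walk `root` and return one record per file (hypothetical schema)."""
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a deterministic order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            # Relative, forward-slash paths also capture the folder structure.
            rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
            # File type guessed from the extension only; None if unknown.
            mime_type, _ = mimetypes.guess_type(name)
            record = {
                "path": rel_path,
                "type": mime_type,
                "size_bytes": os.path.getsize(full_path),
            }
            if name.lower().endswith(".csv"):
                # Assumption: the first row of a CSV file is its header.
                with open(full_path, newline="", encoding="utf-8-sig") as f:
                    record["columns"] = next(csv.reader(f), [])
            records.append(record)
    return records
```

The resulting list could be serialised with `json.dumps` and stored alongside the other project metadata.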

There may be value in documenting these kinds of details. For example, we could refer to the metadata to:

  • Support file-level data discovery.
  • Assist with loading data into appropriate cloud tools (e.g. relational databases).
  • Offer data summaries in the project description.

This issue relates to https://github.com/MIT-LCP/physionet-build/issues/2184, which highlights a format for documenting this kind of metadata.

Presumably we would want to generate the metadata around the time of publication. The metadata would also need to be easy to regenerate in the rare cases where files are modified post-publication.
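
One possible way to decide when regeneration is needed is to store a fingerprint of the file tree at publication and compare it later. The sketch below is only an assumption about how that might look; `tree_fingerprint` is a hypothetical helper, not an existing physionet-build function.

```python
import hashlib
import os


def tree_fingerprint(root):
    """Return a SHA-256 hex digest covering every file path and its contents."""
    tree_digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
            # Hash the file contents in chunks to keep memory use bounded.
            file_digest = hashlib.sha256()
            with open(full_path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    file_digest.update(chunk)
            # Mix the relative path and content hash into the tree digest.
            entry = f"{rel_path}\0{file_digest.hexdigest()}\n"
            tree_digest.update(entry.encode("utf-8"))
    return tree_digest.hexdigest()
```

If the fingerprint computed for a published project no longer matches the stored value, the file-level metadata is stale and should be regenerated.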

tompollard, Jan 26 '24 21:01