nf-prov

Review BCO missing fields

Open bentsherman opened this issue 9 months ago • 0 comments

Summary of the BCO fields that are missing or incomplete:

  • provenance_domain
    • review
    • derived_from
    • obsolete_after
    • embargo
    • contributors (affiliation, email, orcid)
    • license
  • usability_domain
  • description_domain
    • keywords
    • xref
    • pipeline_steps (version, prerequisite)
  • execution_domain
    • external_data_endpoints
    • environment_variables
  • error_domain
    • empirical_error
    • algorithmic_error

See the BCO User Guide for descriptions of these fields.
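
As a rough way to reproduce this review against a generated manifest, the sketch below reports which of the fields listed above are absent or empty in a BCO loaded as a Python dict. The dotted paths come from the list; the traversal rules (fanning out over lists such as contributors and pipeline_steps, and treating empty strings, lists, and objects as blank) are assumptions made for illustration, not something nf-prov provides.

```python
# Field paths taken from the list in this issue. Treating list-valued
# parents (contributors, pipeline_steps) as "check every element" is an
# assumption of this sketch.
FIELDS = [
    "provenance_domain.review",
    "provenance_domain.derived_from",
    "provenance_domain.obsolete_after",
    "provenance_domain.embargo",
    "provenance_domain.contributors.affiliation",
    "provenance_domain.contributors.email",
    "provenance_domain.contributors.orcid",
    "provenance_domain.license",
    "usability_domain",
    "description_domain.keywords",
    "description_domain.xref",
    "description_domain.pipeline_steps.version",
    "description_domain.pipeline_steps.prerequisite",
    "execution_domain.external_data_endpoints",
    "execution_domain.environment_variables",
    "error_domain.empirical_error",
    "error_domain.algorithmic_error",
]


def is_blank(value):
    """True for None and for empty strings, lists, and dicts."""
    return value is None or value == "" or value == [] or value == {}


def lookup(node, keys):
    """Yield every value found at `keys`, fanning out over lists."""
    if not keys:
        yield node
    elif isinstance(node, list):
        for item in node:
            yield from lookup(item, keys)
    elif isinstance(node, dict) and keys[0] in node:
        yield from lookup(node[keys[0]], keys[1:])
    else:
        yield None  # key is absent at this level


def find_blank_fields(bco, fields=FIELDS):
    """Return the dotted paths that are absent or empty in `bco`."""
    blank = []
    for path in fields:
        values = list(lookup(bco, path.split(".")))
        if not values or any(is_blank(v) for v in values):
            blank.append(path)
    return blank
```

Calling `find_blank_fields(json.load(open(path)))` on a manifest file would give a per-run version of the summary above; `path` here is whatever location the manifest was written to.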

  • Some fields like usability_domain are free text and seemingly meant to be completed manually.

  • Some fields like review and obsolete_after might be automated by a larger system that can launch pipelines and has the requisite knowledge. nf-prov could act as a pass-through by accepting these fields as config settings.

  • Some fields like license and keywords could probably just be added to the Nextflow manifest config scope.

  • Some fields like version and prerequisite could probably be added but might not be worth the effort. For example, these fields for tool metadata are implicitly described by the pipeline repo + commit hash, so they aren't really needed as long as the git hash is provided.

The BCO manifest produced by nf-prov should always be "valid" against the JSON schema even if it isn't complete. Some of the fields listed above are present in the output but empty. At the end of the day, the user can add any missing details by hand, but it might be better to provide some pass-through config settings so that those manual edits are tracked, e.g., in the run history of their workflow platform.
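
To make the pass-through idea concrete, here is a minimal sketch of how user-supplied values might be merged into a generated manifest. The function name `apply_passthrough`, the shape of `overrides`, and the sample values are all hypothetical; this is not how nf-prov is implemented, and nf-prov does not currently expose such settings.

```python
import copy


def apply_passthrough(manifest, overrides):
    """Return a copy of `manifest` with user-supplied values merged in.

    Nested dicts are merged key by key; any other value in `overrides`
    replaces the generated one. The input manifest is left untouched.
    """
    merged = copy.deepcopy(manifest)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_passthrough(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative values only: a generated manifest with an empty license,
# and overrides that a launching system might pass through config.
generated = {"provenance_domain": {"name": "my-pipeline", "license": ""}}
overrides = {"provenance_domain": {"license": "MIT", "embargo": {}}}
completed = apply_passthrough(generated, overrides)
```

Because the overrides would live in config rather than in a hand-edited file, they would show up wherever the run's config is recorded.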

Anyway, just wanted to put this analysis here as a reference for anyone who wants to improve the BCO manifest.

bentsherman, Sep 28 '23 04:09