Add EDAM ontologies for input and output channels in `meta.yml`

Open ewels opened this issue 1 year ago • 1 comments

The EDAM ontology (https://edamontology.org) is a well-established set of ontology keywords for use in bioinformatics.

It would be great to have EDAM identifiers associated with channel inputs and outputs, to give a rich identification of the type of data that they contain. This is more expressive than the filename alone: for example, there are multiple ontology terms for different flavours of .fasta files.

It should be possible to associate multiple EDAM terms with each channel. We should also future-proof the syntax so that ontologies other than EDAM can be added later. Example syntax:

tools:
  - mytool:
    - ontologies:
      - edam: http://edamontology.org/format_1929
      - edam: …
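
A minimal sketch of how entries in this shape might be read and checked, assuming the `ontologies` list has already been loaded from `meta.yml` into a list of single-key mappings. The function name `collect_ontologies` and the URI check are illustrative assumptions, not existing nf-core/tools code:

```python
import re
from collections import defaultdict

# EDAM term URIs name one of the four EDAM sub-ontologies plus a numeric ID.
EDAM_URI = re.compile(
    r"^https?://edamontology\.org/(data|format|operation|topic)_\d+$"
)


def collect_ontologies(entries):
    """Group a list of single-key mappings by ontology name.

    `entries` is what a YAML loader would return for the `ontologies` list,
    e.g. [{"edam": "http://edamontology.org/format_1929"}].
    (Hypothetical helper, for illustration only.)
    """
    grouped = defaultdict(list)
    for entry in entries:
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(f"expected one 'ontology: URI' pair, got {entry!r}")
        ((name, uri),) = entry.items()
        # Only EDAM URIs are validated; other ontology names pass through,
        # which keeps the structure open to additional ontologies later.
        if name == "edam" and not EDAM_URI.match(str(uri)):
            raise ValueError(f"not a valid EDAM term URI: {uri!r}")
        grouped[name].append(uri)
    return dict(grouped)
```

Keying each entry by ontology name means several `edam` terms can sit side by side, and another ontology could be added later without changing the structure.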

This addition should also come with CLI helper functionality to make it easy for developers to search for terms and select them via nf-core/tools.
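
As a rough illustration of the search step, here is a sketch that matches a keyword against term labels. The small `EDAM_TERMS` table is an illustrative subset, and `search_terms` is a hypothetical name; a real helper would presumably load the full ontology and live in nf-core/tools:

```python
# Illustrative subset of EDAM format terms (URI -> label).
# A real helper would load the complete ontology instead.
EDAM_TERMS = {
    "http://edamontology.org/format_1929": "FASTA",
    "http://edamontology.org/format_1930": "FASTQ",
    "http://edamontology.org/format_2572": "BAM",
}


def search_terms(query, terms=EDAM_TERMS):
    """Return (uri, label) pairs whose label contains `query`.

    Matching is case-insensitive; results are sorted by URI so the
    shortlist shown to the developer is stable between runs.
    (Hypothetical helper, for illustration only.)
    """
    needle = query.strip().lower()
    return sorted(
        (uri, label) for uri, label in terms.items() if needle in label.lower()
    )
```

For example, `search_terms("fast")` would shortlist the FASTA and FASTQ terms from the table above, and the developer could then pick one to write into `meta.yml`.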

Once #5830 is complete, we should be able to fetch tool EDAM ontologies to generate a shortlist to pick from.

ewels, Jun 19 '24 06:06

I’m not sure this really makes sense for individual modules: they can often be used in many contexts, so the amount of metadata you can specify upfront is probably not going to be very helpful. Take samtools, for example. The input is a BAM, but you can’t say any more than that upfront, because anything could be in that BAM. I definitely see the utility for pipelines, though.

CharlotteAnne, Jun 19 '24 22:06