miniwdl
Secondary / accessory files as implicit inputs
Hello miniwdl team,
I was wondering if there have been discussions about inferring secondary/accessory files from file inputs, similar to CWL's secondaryFiles option. For example, BAM and VCF files are usually expected to come with an associated index file. In the past, I've just explicitly defined these secondary files as inputs in my WDL documents so they are co-localized with the main input file.
I've found the ongoing discussion regarding the possible spec here and it seems like the discussion is heading toward using structs as file bundles. What are the team's thoughts on the preferred implementation?
Hi @kkchau, we haven't gotten out in front of the OpenWDL discussion on this. My opinion: for tasks and workflows that take only one such file, it's common to pass the index file as an explicit additional input. WDL specifies that all input files are localized/mounted in the same directory except in case of name collisions.
If a workflow is taking an array of BAMs (or whatever) to scatter over, then defining a struct to pair each one with its index is advantageous, so that the operator doesn't need to fuss with building a second array of index files in exactly the right order.
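As a rough sketch of the struct approach, assuming WDL 1.0: the names IndexedBam, index_stats, stats_all and the docker input below are made up for illustration, and the image passed in is assumed to provide samtools. The task links the two files side by side itself, so it doesn't depend on where the runner happened to localize them.

```wdl
version 1.0

# Hypothetical struct pairing each BAM with its index
struct IndexedBam {
    File bam
    File bai
}

task index_stats {
    input {
        IndexedBam sample
        String docker  # assumption: any image that provides samtools
    }
    command <<<
        set -euo pipefail
        # Place the BAM and its index next to each other under the
        # names samtools expects (input.bam and input.bam.bai).
        ln -s '~{sample.bam}' input.bam
        ln -s '~{sample.bai}' input.bam.bai
        # idxstats reads per-contig counts from the index
        samtools idxstats input.bam > idxstats.txt
    >>>
    output {
        File stats = "idxstats.txt"
    }
    runtime {
        docker: docker
    }
}

workflow stats_all {
    input {
        Array[IndexedBam] samples
        String docker
    }
    # Each scatter element carries its own index, so there is no
    # second array to keep in the same order.
    scatter (sample in samples) {
        call index_stats { input: sample = sample, docker = docker }
    }
    output {
        Array[File] all_stats = index_stats.stats
    }
}
```

In the inputs JSON, each element of stats_all.samples would then be an object with bam and bai keys, so the pairing is explicit for the operator.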
If there are a larger number of related files, like BWA indexes, then it's common to tar them up and pass around the tar file. WDL version development has a Directory type as a new, potentially more-efficient option.
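A minimal sketch of the tar approach, again assuming WDL 1.0: the task name bwa_mem_from_tar and the inputs index_tar, index_prefix and docker are hypothetical, the tarball is assumed to contain the BWA index files at its top level, and the image is assumed to provide bwa and GNU tar.

```wdl
version 1.0

task bwa_mem_from_tar {
    input {
        File index_tar       # assumption: tarball of the BWA index files
        String index_prefix  # assumption: index basename inside the tarball, e.g. the FASTA file name
        File reads_fastq
        String docker        # assumption: any image that provides bwa and tar
    }
    command <<<
        set -euo pipefail
        # Unpack all index files into one directory so bwa finds them
        # together under a single prefix.
        mkdir bwa_index
        tar -xf '~{index_tar}' -C bwa_index
        bwa mem 'bwa_index/~{index_prefix}' '~{reads_fastq}' > aligned.sam
    >>>
    output {
        File sam = "aligned.sam"
    }
    runtime {
        docker: docker
    }
}
```

The trade-off is that every call pays for unpacking the tarball, which is part of why a Directory input may turn out to be the more efficient option.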
BTW, the OpenWDL Slack (including the #miniwdl channel) is a good forum for chatting about things like this.