workflows
Bioinformatics workflows developed for and used on the St. Jude Cloud project.
It's relatively common for the bioinformatics tools we wrap to have no special handling for empty inputs or outputs (including headered files without content/alignments). These "empty" files...
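As a rough illustration of what "headered but empty" means for SAM-formatted text, the sketch below checks whether any alignment record follows the header. The function name `has_alignments` is hypothetical and not part of this repository; it relies only on the SAM convention that header lines begin with `@`.

```python
from typing import Iterable


def has_alignments(sam_lines: Iterable[str]) -> bool:
    """Return True if any non-header, non-blank line is present.

    Hypothetical helper for illustration: in SAM text, header lines
    start with '@', so anything else is treated as an alignment record.
    """
    for line in sam_lines:
        if line.strip() and not line.startswith("@"):
            return True
    return False


# A file with only header lines counts as "empty" in the sense above.
header_only = ["@HD\tVN:1.6\tSO:coordinate\n", "@SQ\tSN:chr1\tLN:1000\n"]
with_record = header_only + ["read1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tFFFF\n"]
```

A check along these lines could let a task short-circuit or emit a well-formed empty output before calling a tool that does not handle such inputs, though where that check belongs is a design decision for the repository.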
Add trailing commas. Add blank lines between elements.
Some of our tasks allocate a large, static amount of RAM, which over-allocates for many inputs. One example: https://github.com/stjudecloud/workflows/blob/main/tools/ngsderive.wdl#L366
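One alternative to a static allocation is to scale the request with the input size. The arithmetic can be sketched as follows; the function name, multiplier, overhead, and cap are all assumed values for illustration, not figures taken from this repository or measured for any tool.

```python
import math

GIB = 1024 ** 3


def estimate_memory_gib(
    input_bytes: int,
    multiplier: float = 1.0,  # assumed: RAM needed per GiB of input
    overhead_gib: int = 4,    # assumed: fixed baseline for the tool itself
    cap_gib: int = 64,        # assumed: upper bound on any single request
) -> int:
    """Scale a memory request with input size instead of hard-coding it."""
    input_gib = input_bytes / GIB
    requested = math.ceil(input_gib * multiplier) + overhead_gib
    return min(cap_gib, requested)


# A 10 GiB input requests 14 GiB under these assumed defaults,
# while a very large input is clamped to the cap.
small = estimate_memory_gib(10 * GIB)
huge = estimate_memory_gib(1000 * GIB)
```

The right multiplier and overhead would differ per tool and would need to be determined empirically; the cap guards against unreasonable requests on very large inputs.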
The major culprit here is HTSeq: https://github.com/stjudecloud/workflows/blob/main/tools/htseq.wdl It has a pretty terrible sort algorithm that eats up resources when the input is position-sorted. We've exposed the name-sort option but...
See here: https://github.com/stjudecloud/workflows/blob/main/tools/fastqc.wdl#L59 The linked code may not work if `prefix` is modified.
In the early days of this repo we tended to only expose parameters we use. We've since gotten much better at exposing parameters as we add tools. But there are...
I think it's appropriate to default to gzipped inputs and that should be the standard we support. I don't see a need to go out of our way to support...
All of our workflows and (almost) all of our tasks assume that data is paired-end. Single-end (SE) support would make our workflows and tools more accessible.