
Best practice for cross-workflow specification of files

Open SilasK opened this issue 2 years ago • 3 comments

I would like to discuss the best way to specify files so that they can be used across workflows.

Take the example of two workflows, e.g.:

Workflow 1: reads --> assembly

Workflow 2: assembly + reads --> assembly statistics ...

What is the best way to specify the reads and the assembly so that they can be used by different workflows? Two requirements to take into account:

Requirement A: The reads might be used at multiple places in Workflow 2.

Requirement B: The reads will probably be used to infer the total number of samples in the target rule.

With sub-workflows, it would be possible to refer to a file as otherworkflow(file).

But I think the recommended way now is to use modules and to import the rules of Workflow 1 and 2 into a new workflow. But then I would need to know which rules to modify to adapt the file specification. This would have to be documented in the README of each workflow.

I don't see how this can be done without massively modifying many rules of an imported workflow.

Any thoughts?

SilasK avatar May 30 '23 06:05 SilasK

Here's a first attempt:

Workflow 1 determines its input reads from a YAML configuration file. The final assembly file is tagged, either in its contents (e.g. header lines) or in its filename, with a hash representing the input reads used to generate it, e.g. a hash of the read hashes.

Workflow 2 also takes its input reads and input assembly from a YAML configuration file. It checks, either on each run or through a dummy output, that the input-read information recorded with the assembly matches the set of input reads it was given.
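The "hash of read hashes" check could look roughly like the sketch below. It is only an illustration under assumptions: the function names are hypothetical, SHA-256 is an arbitrary choice, and how the hash is stored with the assembly (header line or filename) is left open.

```python
import hashlib


def file_sha256(path):
    """Return the SHA-256 hex digest of one file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_set_hash(read_paths):
    """Hash of read hashes; sorting makes the result independent of input order."""
    per_file = sorted(file_sha256(p) for p in read_paths)
    return hashlib.sha256("\n".join(per_file).encode()).hexdigest()


def assembly_matches_reads(recorded_hash, read_paths):
    """True if the hash recorded with the assembly matches the given reads."""
    return recorded_hash == read_set_hash(read_paths)
```

Workflow 1 would record `read_set_hash(reads)` with the assembly, and Workflow 2 would call `assembly_matches_reads` before running its rules.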

ning-y avatar May 30 '23 20:05 ning-y

So your idea would be to define the paths to the files in the config?

Something like:

config.yaml

read_file_format: "QC/qc_reads/{sample}_{fraction}.fastq.gz"
assembly_file_format: "Assembly/assemblies/{sample}.fasta.gz"
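As a rough sketch of how such patterns could serve both requirements (building paths in several rules, and inferring the samples for the target rule), here is a plain-Python illustration. The helper names are hypothetical and it uses only the standard library; inside Snakemake, `expand` and `glob_wildcards` would be the usual tools for this.

```python
import re

# The two patterns from the config above.
read_file_format = "QC/qc_reads/{sample}_{fraction}.fastq.gz"
assembly_file_format = "Assembly/assemblies/{sample}.fasta.gz"


def pattern_to_regex(pattern):
    """Turn a '{name}' pattern into a regex with one named group per field."""
    parts = re.split(r"\{(\w+)\}", pattern)
    # Even indices are literal text, odd indices are field names.
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{regex}$")


def infer_samples(pattern, paths):
    """Collect the distinct 'sample' values among paths matching the pattern."""
    regex = pattern_to_regex(pattern)
    matches = (regex.match(p) for p in paths)
    return sorted({m.group("sample") for m in matches if m})


# Building a concrete path from the same pattern:
example_reads = read_file_format.format(sample="S1", fraction="R1")
```

Note that sample names containing underscores would be split ambiguously by this simple matcher, so a real setup would likely need constraints on the fields.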

SilasK avatar May 31 '23 09:05 SilasK

One could also use a TSV file, with the names of its column headers specified in a config file.

Ideally this would use PEPs: https://snakemake.readthedocs.io/en/stable/snakefiles/configuration.html#configuring-scientific-experiments-via-peps
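A minimal sketch of the TSV variant, assuming hypothetical config keys (`sample_column`, `reads_column`, `assembly_column`) and one row per read file. This is plain Python rather than the PEP machinery, just to make the idea concrete.

```python
import csv
import io

# Hypothetical config entries naming the TSV column headers.
config = {
    "sample_column": "sample",
    "reads_column": "reads",
    "assembly_column": "assembly",
}


def load_sample_table(handle, config):
    """Map each sample to its read files and assembly, using the configured headers."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        sample = row[config["sample_column"]]
        entry = table.setdefault(sample, {"reads": [], "assembly": None})
        entry["reads"].append(row[config["reads_column"]])
        entry["assembly"] = row[config["assembly_column"]]
    return table


# Made-up example sheet; in practice this would be a file on disk.
example_tsv = (
    "sample\treads\tassembly\n"
    "S1\tQC/qc_reads/S1_R1.fastq.gz\tAssembly/assemblies/S1.fasta.gz\n"
    "S1\tQC/qc_reads/S1_R2.fastq.gz\tAssembly/assemblies/S1.fasta.gz\n"
    "S2\tQC/qc_reads/S2_R1.fastq.gz\tAssembly/assemblies/S2.fasta.gz\n"
)
samples = load_sample_table(io.StringIO(example_tsv), config)
```

The number of samples for the target rule is then `len(samples)`, and any rule can look up the reads or assembly of a sample from the same table.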

SilasK avatar May 31 '23 09:05 SilasK