
[Proposal] Adding a grouping column to the sample sheet

Open TheodoreMarkulin opened this issue 3 years ago • 0 comments

Description of feature

What is being proposed?

Currently, the base sample sheet template uses four columns: sample, fastq_1, fastq_2, and single_end. I would like to propose adding a fifth column, called group, to the sample sheet.

What does this solve?

A single sample may be run in the same experiment multiple times under different conditions. The current remedy (within the validate_unique_samples function) is to append a _T# suffix, where # is an incrementing number, to each sample name that appears more than once: https://github.com/nf-core/tools/blob/171127bd850040e76febd0945e6980b7afcaad69/nf_core/pipeline-template/bin/check_samplesheet.py#L128-L129

With a grouping column, samples that share a name could instead be made unique by appending their group name rather than a _T# suffix.
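To illustrate the idea, here is a minimal Python sketch of how the renaming could work. It is not the code in check_samplesheet.py: the function name rename_duplicates and the dict-per-row layout are assumptions made for this example, and the fallback to a _T# suffix when the same sample appears more than once within one group is my own guess at a sensible behaviour.

```python
from collections import Counter


def rename_duplicates(rows):
    """Return copies of rows with repeated sample names disambiguated by group.

    Hypothetical helper: each row is a dict with at least "sample" and "group".
    """
    # How often each sample name, and each (sample, group) pair, occurs.
    name_counts = Counter(row["sample"] for row in rows)
    pair_counts = Counter((row["sample"], row["group"]) for row in rows)
    seen = Counter()
    renamed = []
    for row in rows:
        sample, group = row["sample"], row["group"]
        new_name = sample
        if name_counts[sample] > 1:
            # Proposed behaviour: use the group name instead of _T#.
            new_name = f"{sample}_{group}"
            if pair_counts[(sample, group)] > 1:
                # Assumed fallback: the pair itself repeats, so keep a counter.
                seen[(sample, group)] += 1
                new_name = f"{new_name}_T{seen[(sample, group)]}"
        renamed.append({**row, "sample": new_name})
    return renamed


rows = [
    {"sample": "S1", "group": "control"},
    {"sample": "S1", "group": "treated"},
    {"sample": "S2", "group": "control"},
]
print([row["sample"] for row in rename_duplicates(rows)])
```

Sample names that occur only once are left untouched here, which matches the description above of only modifying names that appear more than once.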

The main reason I'm proposing a grouping column, though, is downstream analysis. Anyone who wants to integrate differential analysis into a copy of a pipeline currently has to come up with their own way of feeding in the grouping information, outside of the assets already provided to the pipeline. A group column would give a natural path to integrating such analysis. Furthermore, being able to group samples (group by group, then sample name) makes it easy to integrate other processes, such as FASTQ concatenation.
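As a rough sketch of the grouping step, the snippet below collects the FASTQ paths for each (group, sample) pair so that a later process could concatenate them. The function name collect_fastqs and the example file names are hypothetical, and the rows are assumed to be dicts keyed by the sample sheet columns plus the proposed group column.

```python
from collections import defaultdict


def collect_fastqs(rows):
    """Group FASTQ paths by (group, sample).

    Hypothetical helper: each row is a dict with "group", "sample",
    "fastq_1" and an optional (possibly empty) "fastq_2".
    """
    grouped = defaultdict(lambda: {"fastq_1": [], "fastq_2": []})
    for row in rows:
        key = (row["group"], row["sample"])
        grouped[key]["fastq_1"].append(row["fastq_1"])
        # Single-end rows have no second read file, so skip empty values.
        if row.get("fastq_2"):
            grouped[key]["fastq_2"].append(row["fastq_2"])
    return dict(grouped)


rows = [
    {"group": "control", "sample": "S1", "fastq_1": "a_R1.fq.gz", "fastq_2": "a_R2.fq.gz"},
    {"group": "control", "sample": "S1", "fastq_1": "b_R1.fq.gz", "fastq_2": "b_R2.fq.gz"},
    {"group": "treated", "sample": "S1", "fastq_1": "c_R1.fq.gz", "fastq_2": ""},
]
for key, files in collect_fastqs(rows).items():
    print(key, files)
```

The same (group, sample) keys could also be passed on to a differential analysis step as the condition labels, which is the main use case I have in mind.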

TheodoreMarkulin, May 19 '22 15:05