[Proposal] Adding a grouping column to the sample sheet
Description of feature
What is being proposed?
Currently, the base sample sheet template uses four columns: sample, fastq_1, fastq_2, and single_end.
I would like to propose adding a fifth column, called group, to the sample sheet.
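To make the proposal concrete, here is a rough sketch of what a sample sheet with the extra column might look like and how it could be read. The sample names, file names, and group values are invented for illustration, and `read_samplesheet` is a hypothetical helper, not something that exists in nf-core/tools.

```python
import csv
import io

# Hypothetical sample sheet with the proposed fifth column, `group`.
# All sample names, file names, and group values are made up.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,single_end,group
S1,S1_ctrl_R1.fastq.gz,S1_ctrl_R2.fastq.gz,False,control
S1,S1_trt_R1.fastq.gz,S1_trt_R2.fastq.gz,False,treated
S2,S2_ctrl_R1.fastq.gz,,True,control
"""


def read_samplesheet(text):
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_samplesheet(SAMPLESHEET)
for row in rows:
    print(row["sample"], row["group"])
```

Here S1 appears twice, once per condition, which is exactly the case the group column is meant to describe.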
What does this solve?
A single sample may be run multiple times in the same experiment under different conditions. The current workaround (within the validate_unique_samples function) is to append a _T# suffix, where # is an incrementing number, to each sample name that appears more than once.
https://github.com/nf-core/tools/blob/171127bd850040e76febd0945e6980b7afcaad69/nf_core/pipeline-template/bin/check_samplesheet.py#L128-L129
With a grouping column, rows that share a sample name could instead be told apart by appending each row's group name rather than a _T# increment.
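A minimal sketch of that renaming, assuming each row is a dictionary with `sample` and `group` keys. The function name `rename_duplicates_by_group` is hypothetical and this is not the code in check_samplesheet.py; it only illustrates the idea of using the group name as the suffix.

```python
from collections import Counter


def rename_duplicates_by_group(rows, sample_col="sample", group_col="group"):
    """Append the group name to every sample name that occurs more than once.

    Hypothetical helper for illustration only. Returns new row dicts and
    leaves the input rows unchanged.
    """
    counts = Counter(row[sample_col] for row in rows)
    renamed = []
    for row in rows:
        new_row = dict(row)
        if counts[row[sample_col]] > 1:
            new_row[sample_col] = f"{row[sample_col]}_{row[group_col]}"
        renamed.append(new_row)
    return renamed


# Invented example rows: S1 was run under two conditions.
rows = [
    {"sample": "S1", "group": "control"},
    {"sample": "S1", "group": "treated"},
    {"sample": "S2", "group": "control"},
]
renamed = rename_duplicates_by_group(rows)
print([row["sample"] for row in renamed])
```

Note that two rows with the same sample name and the same group would still collide after renaming, so some fallback (for example, keeping the _T# increment for that case) would probably still be needed.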
The main reason I'm proposing a grouping column, though, is downstream analysis. Anyone who wants to integrate differential analysis into a copy of a pipeline currently has to come up with their own way of feeding in the grouping information, outside of the assets already provided to the pipeline. A group column would offer a natural path to integrating such analysis. Furthermore, being able to group samples (group by group, then sample name) makes it easy to integrate other processes, such as FASTQ concatenation.
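As a sketch of the grouping idea, the snippet below collects the fastq_1 files that share a (group, sample) key, which is the list a concatenation step would need. The rows and file names are invented, and `collect_fastqs` is a hypothetical helper, not part of any existing pipeline.

```python
from collections import defaultdict


def collect_fastqs(rows):
    """Map each (group, sample) pair to the list of its fastq_1 files.

    Hypothetical helper for illustration; input order is preserved
    within each list.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["group"], row["sample"])].append(row["fastq_1"])
    return dict(grouped)


# Invented example: S1/control was sequenced on two lanes.
rows = [
    {"sample": "S1", "group": "control", "fastq_1": "S1_ctrl_L001_R1.fastq.gz"},
    {"sample": "S1", "group": "control", "fastq_1": "S1_ctrl_L002_R1.fastq.gz"},
    {"sample": "S1", "group": "treated", "fastq_1": "S1_trt_L001_R1.fastq.gz"},
]
to_concatenate = collect_fastqs(rows)
for key, files in to_concatenate.items():
    print(key, files)
```

The same (group, sample) key could just as well drive the design table for a differential analysis step, since the group is carried alongside every sample.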