chipseq
Provide replicate information explicitly in samplesheet
Description of feature
Currently, the pipeline treats samples as biological replicates when they share the same id in the sample
column of the samplesheet and differ only in a suffix separated by an underscore, e.g.:
sample1_r1
sample1_r2
sample2_r1
sample2_r2
This information is used by the pipeline in this code line to determine whether multiple groups are present (e.g. sample1 and sample2 in the example above) and whether replicates exist (r1 and r2 in the same example).
However, this approach relies entirely on the sample names, which can be problematic because it depends on the replicates being named correctly with the underscore; see this issue.
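To make the problem concrete, here is a minimal Python sketch of name-based inference. It is not the pipeline's actual code: it assumes the split happens on the last underscore, and the function names split_sample_name and summarise are made up for illustration.

```python
def split_sample_name(name):
    """Split 'sample1_r1' into ('sample1', 'r1') on the last underscore.

    Assumption: the replicate suffix is whatever follows the last underscore.
    """
    group, sep, replicate = name.rpartition("_")
    if not sep:
        # No underscore at all: no replicate can be inferred from the name.
        return name, None
    return group, replicate


def summarise(names):
    """Return (multiple_groups, replicates_exist) inferred from names only."""
    groups = {}
    for name in names:
        group, replicate = split_sample_name(name)
        groups.setdefault(group, set()).add(replicate)
    multiple_groups = len(groups) > 1
    replicates_exist = any(len(reps) > 1 for reps in groups.values())
    return multiple_groups, replicates_exist


# Names that follow the convention: two groups, each with two replicates.
print(summarise(["sample1_r1", "sample1_r2", "sample2_r1", "sample2_r2"]))

# Same experiment named with a hyphen: seen as two groups without replicates.
print(summarise(["sample1-r1", "sample1-r2"]))

# Two unrelated samples that happen to contain an underscore:
# seen as a single group "sample" with replicates "1" and "2".
print(summarise(["sample_1", "sample_2"]))
```

The last two calls show how a small naming difference silently changes what the inference reports, without any error being raised.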
I guess the solution to this problem would be to bring back the replicate column in the samplesheet, although this information is currently only used to enable DESEQ2_QC here and MACS2_CONSENSUS here.
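For comparison, a rough sketch of what reading an explicit replicate column could look like. This is only an illustration under assumptions: the column names sample, replicate and fastq_1, the file names, and the read_replicates helper are hypothetical and not taken from the pipeline.

```python
import csv
import io
from collections import defaultdict

# Hypothetical samplesheet; column and file names are assumptions.
SAMPLESHEET = """sample,replicate,fastq_1
WT,1,wt_rep1.fastq.gz
WT,2,wt_rep2.fastq.gz
KO_clone_A,1,ko_rep1.fastq.gz
KO_clone_A,2,ko_rep2.fastq.gz
"""


def read_replicates(handle):
    """Map each sample id to the set of replicate numbers listed for it."""
    replicates = defaultdict(set)
    for row in csv.DictReader(handle):
        sample = row["sample"].strip()
        rep = row["replicate"].strip()
        # Require a positive integer so that typos fail loudly.
        if not rep.isdigit() or int(rep) < 1:
            raise ValueError(f"Invalid replicate '{rep}' for sample '{sample}'")
        replicates[sample].add(int(rep))
    return dict(replicates)


replicates = read_replicates(io.StringIO(SAMPLESHEET))

# The same two questions as before, answered without parsing sample names.
multiple_groups = len(replicates) > 1
replicates_exist = any(len(reps) > 1 for reps in replicates.values())
print(replicates, multiple_groups, replicates_exist)
```

With this layout a sample id such as KO_clone_A can contain underscores freely, because the grouping no longer depends on how the name is written.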
I would like to know your opinion here @drpatelh, @bjlang, and anyone else willing to give feedback, of course.
Actually, I just remembered that the replicate information would also be needed for the IDR analysis if that feature is implemented; see #235 and #87.
I'm not sure if there has been any input on this, but we have several ChIP-Seq projects that have biological reps, so having some way to keep track of these and perform IDR would be great.
In the short term, couldn't replicates be captured when checking the sample sheet, then used downstream? Around this spot:
https://github.com/nf-core/chipseq/blob/51eba00b32885c4d0bec60db3cb0a45eb61e34c5/bin/check_samplesheet.py#L80
The check_samplesheet.py script has already been updated in dev to get this information in #349, but the IDR is not yet implemented.
Yep, missed that. I definitely like having the explicit column for this better than the _r1, _r2 convention.