
File name collision when pairing one treatment replicate with more than one control replicate, i.e. the same sample,replicate pair cannot be specified for each different control replicate

Open a1ultima opened this issue 7 months ago • 0 comments

Description of the bug

Background

This shows the same error behaviour and outcome as a related issue: https://github.com/nf-core/chipseq/issues/440#issue-2741780891

We have metadata from a large-scale plant ChIP-seq study called ChipHub. In some cases, the ChIP-seq pipeline is run for samples with a one-to-many relationship between a single treatment biological replicate and several control biological replicates:

sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,2
WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,

In this minimal example, we accommodate the case where the sample and replicate values must be flattened (repeated) for each control replicate that ChipHub specifies for peak calling, i.e. we want to compare each treatment replicate against each different control replicate.

For clarity, we focus on just the treatment rows, annotated in comments as repeat_i=0 and repeat_i=1 respectively:

sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1  // <-- repeat_i=0
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,2  // <-- repeat_i=1, but different control replicate

The second row is a repeat of the first, in the sense that we want to keep everything else equal (sample, replicate, control, antibody) and differ only in which control biological replicate we compare against (e.g. for peak calling):

Matching columns

sample,fastq_1,fastq_2,replicate,antibody,control
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT

Differing columns:

control_replicate
1
2
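
Only the control_replicate value differs between the two treatment rows, so any identifier built from sample and replicate alone is the same for both. The Python sketch below illustrates this on the samplesheet from this issue. The `output_id` function and its `<sample>_REP<replicate>` naming scheme are assumptions made for illustration; they are not taken from the pipeline source and the real naming logic may differ.

```python
import csv
import io
from collections import defaultdict

# Samplesheet rows copied from the minimal example in this issue.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,replicate,antibody,control,control_replicate
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,1
WT_BCATENIN_IP,BLA203A1_S27_L006_R1_001.fastq.gz,,1,BCATENIN,WT_INPUT,2
WT_INPUT,BLA203A6_S32_L006_R1_001.fastq.gz,,1,,,
WT_INPUT,BLA203A30_S21_L001_R1_001.fastq.gz,,2,,,
"""


def output_id(row):
    # ASSUMPTION: a hypothetical naming scheme that uses only sample and
    # replicate; the pipeline's actual scheme may differ.
    return f"{row['sample']}_REP{row['replicate']}"


def find_collisions(text):
    """Group rows by output_id and return the ids shared by several rows."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[output_id(row)].append(row)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


collisions = find_collisions(SAMPLESHEET)
for key, rows in collisions.items():
    control_reps = [r["control_replicate"] for r in rows]
    print(f"{key}: {len(rows)} rows, control_replicate values {control_reps}")
```

Under this assumed scheme, the two WT_BCATENIN_IP rows map to one identifier while the two WT_INPUT rows stay distinct, which is consistent with the collision described here.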


Command used and terminal output


Relevant files

No response

System information

No response

a1ultima • Jun 16 '25 13:06