atacseq
CONTROL description needs clarifications
Description of feature
The "CONTROL" option description in the "samplesheet input" section needs to be improved.
- What exactly does the "CONTROL" option do? From the description, it appears that "CONTROL" samples may be used as input, but an input is not the same thing as a control. This is especially confusing in the provided examples, where "TREATMENT" samples are designated as "CONTROL".
- The links under "Example sheets without controls and with controls" are broken.
- It appears that the "control" and "control_replicate" columns are only recognized when the "--with_control true" parameter is set, which is not clear from the documentation.
- The pipeline breaks if the samplesheet is created by following the example. I created the samplesheet as:
| sample | fastq_1 | fastq_2 | replicate | control | control_replicate |
|---|---|---|---|---|---|
| HSATACtr | 00_raw/HSATACtr1_S67_R1_001.fastq.gz | 00_raw/HSATACtr1_S67_R2_001.fastq.gz | 1 | CONTROL | 1 |
| HSATACtr | 00_raw/HSATACtr2_S68_R1_001.fastq.gz | 00_raw/HSATACtr2_S68_R2_001.fastq.gz | 2 | CONTROL | 2 |
| HSATACun | 00_raw/HSATACun1_S63_R1_001.fastq.gz | 00_raw/HSATACun1_S63_R2_001.fastq.gz | 1 | | |
| HSATACun | 00_raw/HSATACun2_S64_R1_001.fastq.gz | 00_raw/HSATACun2_S64_R2_001.fastq.gz | 2 | | |
The pipeline errors with "ERROR: Please check samplesheet -> Control identifier and replicate has to match a provided sample identifier and replicate!"
Setting the "control" column to the sample's own name, as below, works, but this contradicts the documentation:

| sample | fastq_1 | fastq_2 | replicate | control | control_replicate |
|---|---|---|---|---|---|
| HSATACtr | 00_raw/HSATACtr1_S67_R1_001.fastq.gz | 00_raw/HSATACtr1_S67_R2_001.fastq.gz | 1 | HSATACtr | 1 |
| HSATACtr | 00_raw/HSATACtr2_S68_R1_001.fastq.gz | 00_raw/HSATACtr2_S68_R2_001.fastq.gz | 2 | HSATACtr | 2 |
| HSATACun | 00_raw/HSATACun1_S63_R1_001.fastq.gz | 00_raw/HSATACun1_S63_R2_001.fastq.gz | 1 | | |
| HSATACun | 00_raw/HSATACun2_S64_R1_001.fastq.gz | 00_raw/HSATACun2_S64_R2_001.fastq.gz | 2 | | |
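
To make the mismatch concrete, here is a minimal Python sketch of the rule that the error message seems to imply: every non-empty ("control", "control_replicate") pair must equal the ("sample", "replicate") pair of some row in the same sheet. This is my reading of the message, not the pipeline's actual validation code. The function name `unmatched_controls`, the comma-separated layout, and the shortened FASTQ file names are assumptions made for illustration.

```python
import csv
import io


def unmatched_controls(sheet_text):
    """Return (sample, replicate) pairs whose control has no matching row.

    Hypothetical helper: it mirrors the rule implied by the error message,
    not the pipeline's own samplesheet check.
    """
    rows = list(csv.DictReader(io.StringIO(sheet_text)))
    # Every (sample, replicate) pair that actually exists in the sheet.
    known = {(r["sample"].strip(), r["replicate"].strip()) for r in rows}
    unmatched = []
    for r in rows:
        control = (r.get("control") or "").strip()
        if not control:
            continue  # rows without a control are not checked
        control_rep = (r.get("control_replicate") or "").strip()
        if (control, control_rep) not in known:
            unmatched.append((r["sample"].strip(), r["replicate"].strip()))
    return unmatched


# Placeholder file names; only the sample and control columns matter here.
HEADER = "sample,fastq_1,fastq_2,replicate,control,control_replicate\n"
UNTREATED = (
    "HSATACun,un1_R1.fastq.gz,un1_R2.fastq.gz,1,,\n"
    "HSATACun,un2_R1.fastq.gz,un2_R2.fastq.gz,2,,\n"
)

# First attempt: the literal string "CONTROL" in the control column.
FIRST_SHEET = HEADER + (
    "HSATACtr,tr1_R1.fastq.gz,tr1_R2.fastq.gz,1,CONTROL,1\n"
    "HSATACtr,tr2_R1.fastq.gz,tr2_R2.fastq.gz,2,CONTROL,2\n"
) + UNTREATED

# Second attempt: the control column names a sample present in the sheet.
SECOND_SHEET = HEADER + (
    "HSATACtr,tr1_R1.fastq.gz,tr1_R2.fastq.gz,1,HSATACtr,1\n"
    "HSATACtr,tr2_R1.fastq.gz,tr2_R2.fastq.gz,2,HSATACtr,2\n"
) + UNTREATED

print(unmatched_controls(FIRST_SHEET))   # both HSATACtr rows are flagged
print(unmatched_controls(SECOND_SHEET))  # nothing is flagged
```

Under this reading, the first sheet fails because no sample is named "CONTROL", while the second passes only because each row points at itself. If that is the intended rule, the documentation should say that "control" must hold the name of another sample in the sheet.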
As of now, it feels safer to run the pipeline without "controls" because it is unclear what the consequences are.