chip-nf
chip-nf copied to clipboard
An automated ChIP-seq pipeline using Nextfow
= ChIP-nf :nextflow: http://www.nextflow.io/ :nextflow-quickstart: http://www.nextflow.io/docs/latest/getstarted.html#get-started :macs2-outfiles: https://github.com/taoliu/MACS#output-files :pvalue: pass:q[[red]#-log_10(P)#] :nf-shield: https://img.shields.io/badge/nextflow-%E2%89%A50.23.1-blue.svg :travis-badge: https://travis-ci.org/guigolab/chip-nf.svg?branch=master
image:{nf-shield}["Nextflow", link="https://nextflow.io", window="_blank"] image:{travis-badge}["Build Status", link="https://travis-ci.org/guigolab/chip-nf"]
A {nextflow}[Nextflow^] pipeline for processing ChIP-seq data.
== Installing Nextflow
Nextflow can be installed by using the following command:
[source,bash]
curl -fsSL get.nextflow.io | bash
== Running the pipeline
First you need to pull the pipeline using Nextflow:
[source,bash]
$ nextflow pull guigolab/chip-nf Checking guigolab/chip-nf ... downloaded from https://github.com/guigolab/chip-nf.git
You can get the pipeline help with the following command:
[source,bash]
$ nextflow run chip-nf --help
N E X T F L O W ~ version 0.24.1
Launching guigolab/chip-nf
[nostalgic_franklin] - revision: 974a45c356 [master]
C H I P - N F ~ ChIP-seq Pipeline
Run ChIP-seq analyses on a set of data.
Usage: chipseq-pipeline.nf --index TSV_FILE --genome GENOME_FILE [OPTION]...
Options: --help Show this message and exit. --index TSV_FILE Tab separted file containing information about the data. --genome GENOME_FILE Reference genome file. --genome-index GENOME_INDEX_ FILE Reference genome index file. --genome-size GENOME_SIZE Reference genome size for MACS2 callpeaks. Must be one of MACS2 precomputed sizes: hs, mm, dm, ce. (Default: hs) --mismatches MISMATCHES Sets the maximum number/percentage of mismatches allowed for a read (Default: 2). --multimaps MULTIMAPS Sets the maximum number of mappings allowed for a read (Default: 10). --min-matched-bases BASES Sets the minimum number/percentage of bases that have to match with the reference (Default: 0.80). --quality-threshold THRESHOLD Sets the sequence quality threshold for a base to be considered as low-quality (Default: 26). --fragment-length LENGTH Sets the fragment length globally for all samples (Default: 200). --remove-duplicates Remove duplicate alignments instead of just flagging them (Default: false). --rescale Rescale peak scores to conform to the format supported by the UCSC genome browser (score must be <1000) (Default: false). --shift Move fragments ends and apply global extsize in peak calling. (Default: false).
== Input
The input data and metadata should be specified using a tab separated file and passing it to the pipeline command with the option --index
. Here is an example of the file format:
[source,bash]
sample1 sample1_run1 /path/to/sample1_run1.fastq.gz - H3 sample1 sample1_run2 /path/to/sample1_run2.fastq.gz - H3 sample1 sample1_run3 /path/to/sample1_run3.fastq.gz - H3 sample1 sample1_run4 /path/to/sample1_run4.fastq.gz - H3 sample2 sample2_run1 /path/to/sample2_run1.fastq.gz control1 H3K4me2 control1 control1_run1 /path/to/control1_run1.fastq.gz control1 input
The fields in the file correspond to:
- identifier used for merging the BAM files
- single run identifier
- path to the fastq file to be processed
- identifier of the input or
-
if no control is used - mark/histone or
input
if the line refers to a control - optional sample fragment length. If not specified the fragment length is estimated using SPP
== Output
The pipeline will produce the following output data:
-
Alignments
-
pileupSignal
, pileup signal tracks -
fcSignal
, fold enrichment signal tracks -
pvalueSignal
, {pvalue} signal tracks -
narrowPeak
, peak locations with peak summit, pvalue and qvalue (BED6+4
) -
broadPeak
, similar tonarrowPeak
(BED6+3
) -
gappedPeak
, both narrow and broad peaks (BED12+3
)
Check {macs2-outfiles}[MACS2 output files^] for details.
The output data information is written to a file called chipseq-pipeline.db
created in the folder from where the pipeline is run. Here is an example of the db file:
[source,bash]
sample1 /path/to/results/peakOut/sample1.pileup_signal.bw H3 255 pileupSignal 0.9960 0.4393 sample1 /path/to/results/peakOut/sample1_peaks.narrowPeak H3 255 narrowPeak 0.9960 0.4393 sample1 /path/to/results/sample1.bam H3 255 Alignments 0.9960 0.4393 sample1 /path/to/results/peakOut/sample1_peaks.gappedPeak H3 255 gappedPeak 0.9960 0.4393 sample1 /path/to/results/peakOut/sample1_peaks.broadPeak H3 255 broadPeak 0.9960 0.4393 sample2 /path/to/results/peakOut/sample2_peaks.gappedPeak H3K4me2 200 gappedPeak 0.9995 0.7216 sample2 /path/to/results/peakOut/sample2.fc_signal.bw H3K4me2 200 fcSignal 0.9995 0.7216 sample2 /path/to/results/peakOut/sample2.pval_signal.bw H3K4me2 200 pvalueSignal 0.9995 0.7216 sample2 /path/to/results/peakOut/sample2_peaks.broadPeak H3K4me2 200 broadPeak 0.9995 0.7216 sample2 /path/to/results/peakOut/sample2.pileup_signal.bw H3K4me2 200 pileupSignal 0.9995 0.7216 sample2 /path/to/results/peakOut/sample2_peaks.narrowPeak H3K4me2 200 narrowPeak 0.9995 0.7216 sample2 /path/to/results/sample2_GCCAAT_primary.bam H3K4me2 200 Alignments 0.9995 0.7216
The fields in the file correspond to:
- merge identifier
- path
- mark/histone
- (estimated) fragment length
- data type
- NRF (Nonredundant Fraction)
- FRiP (Fraction of Reads in Peaks)