guigolab/chip-nf: An automated ChIP-seq pipeline using Nextflow

= ChIP-nf
:nextflow: http://www.nextflow.io/
:nextflow-quickstart: http://www.nextflow.io/docs/latest/getstarted.html#get-started
:macs2-outfiles: https://github.com/taoliu/MACS#output-files
:pvalue: pass:q[[red]#-log_10(P)#]
:nf-shield: https://img.shields.io/badge/nextflow-%E2%89%A50.23.1-blue.svg
:travis-badge: https://travis-ci.org/guigolab/chip-nf.svg?branch=master

image:{nf-shield}["Nextflow", link="https://nextflow.io", window="_blank"] image:{travis-badge}["Build Status", link="https://travis-ci.org/guigolab/chip-nf"]

A {nextflow}[Nextflow^] pipeline for processing ChIP-seq data.

== Installing Nextflow

Nextflow can be installed by using the following command:

[source,bash]
----
curl -fsSL get.nextflow.io | bash
----

== Running the pipeline

First you need to pull the pipeline using Nextflow:

[source,bash]
----
$ nextflow pull guigolab/chip-nf
Checking guigolab/chip-nf ...
 downloaded from https://github.com/guigolab/chip-nf.git
----

You can get the pipeline help with the following command:

[source,bash]
----
$ nextflow run chip-nf --help

N E X T F L O W ~ version 0.24.1
Launching guigolab/chip-nf [nostalgic_franklin] - revision: 974a45c356 [master]

C H I P - N F ~ ChIP-seq Pipeline

Run ChIP-seq analyses on a set of data.

Usage:
    chipseq-pipeline.nf --index TSV_FILE --genome GENOME_FILE [OPTION]...

Options:
    --help                              Show this message and exit.
    --index TSV_FILE                    Tab separated file containing information about the data.
    --genome GENOME_FILE                Reference genome file.
    --genome-index GENOME_INDEX_FILE    Reference genome index file.
    --genome-size GENOME_SIZE           Reference genome size for MACS2 callpeaks. Must be one of MACS2 precomputed sizes: hs, mm, dm, ce. (Default: hs)
    --mismatches MISMATCHES             Sets the maximum number/percentage of mismatches allowed for a read (Default: 2).
    --multimaps MULTIMAPS               Sets the maximum number of mappings allowed for a read (Default: 10).
    --min-matched-bases BASES           Sets the minimum number/percentage of bases that have to match with the reference (Default: 0.80).
    --quality-threshold THRESHOLD       Sets the sequence quality threshold for a base to be considered as low-quality (Default: 26).
    --fragment-length LENGTH            Sets the fragment length globally for all samples (Default: 200).
    --remove-duplicates                 Remove duplicate alignments instead of just flagging them (Default: false).
    --rescale                           Rescale peak scores to conform to the format supported by the UCSC genome browser (score must be <1000) (Default: false).
    --shift                             Move fragments ends and apply global extsize in peak calling. (Default: false).
----
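When the pipeline is launched from a script, it can help to assemble the command line programmatically from the options listed above. The following is only a minimal sketch: the helper name `build_chip_nf_command` and the file names `index.tsv` and `genome.fa` are hypothetical, and it assumes that boolean switches such as `--remove-duplicates` are passed without a value, as the help text suggests. It only builds the argument list and does not run Nextflow.

```python
from typing import Dict, List, Optional, Union

OptionValue = Union[str, int, float, bool]


def build_chip_nf_command(index: str, genome: str,
                          options: Optional[Dict[str, OptionValue]] = None) -> List[str]:
    """Assemble an argument list for `nextflow run chip-nf` (hypothetical helper).

    `options` maps option names from the help text (without the leading
    dashes) to their values, e.g. {"genome-size": "mm"}.
    """
    argv = ["nextflow", "run", "chip-nf", "--index", index, "--genome", genome]
    for name, value in (options or {}).items():
        flag = "--" + name
        if value is True:
            # Boolean switch: assumed to be passed as a bare flag.
            argv.append(flag)
        elif value is False or value is None:
            # Switch left at its default; nothing to add.
            continue
        else:
            argv.extend([flag, str(value)])
    return argv


if __name__ == "__main__":
    cmd = build_chip_nf_command(
        "index.tsv",   # hypothetical index file
        "genome.fa",   # hypothetical reference genome
        {"genome-size": "hs", "mismatches": 2, "remove-duplicates": True},
    )
    print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run` as is, which avoids shell quoting issues with paths.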

== Input

Specify the input data and metadata in a tab-separated file and pass it to the pipeline with the `--index` option. Here is an example of the file format:

[source,bash]
----
sample1   sample1_run1   /path/to/sample1_run1.fastq.gz    -         H3
sample1   sample1_run2   /path/to/sample1_run2.fastq.gz    -         H3
sample1   sample1_run3   /path/to/sample1_run3.fastq.gz    -         H3
sample1   sample1_run4   /path/to/sample1_run4.fastq.gz    -         H3
sample2   sample2_run1   /path/to/sample2_run1.fastq.gz    control1  H3K4me2
control1  control1_run1  /path/to/control1_run1.fastq.gz   control1  input
----

The fields in the file correspond to:

. identifier used for merging the BAM files
. single run identifier
. path to the fastq file to be processed
. identifier of the input, or `-` if no control is used
. mark/histone, or `input` if the line refers to a control
. optional sample fragment length. If not specified, the fragment length is estimated using SPP.
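The column layout above can be checked before launching a run. The sketch below reads an index file and groups the fastq paths by merge identifier; it is an illustration only, not the pipeline's own parser, and the names `IndexRow`, `parse_index` and `runs_by_sample` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class IndexRow(NamedTuple):
    merge_id: str                    # identifier used for merging the BAM files
    run_id: str                      # single run identifier
    fastq: str                       # path to the fastq file
    control: Optional[str]           # None when the column holds "-"
    mark: str                        # mark/histone, or "input" for a control
    fragment_length: Optional[int]   # None when the optional column is absent


def parse_index(text: str) -> List[IndexRow]:
    """Parse the tab-separated index file content into rows."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        merge_id, run_id, fastq, control, mark = fields[:5]
        frag = int(fields[5]) if len(fields) > 5 and fields[5] else None
        rows.append(IndexRow(merge_id, run_id, fastq,
                             None if control == "-" else control, mark, frag))
    return rows


def runs_by_sample(rows: List[IndexRow]) -> Dict[str, List[str]]:
    """Group fastq paths by merge identifier."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.merge_id].append(row.fastq)
    return dict(groups)


# A subset of the example rows shown above, with an illustrative
# fragment length (150) added to one row.
EXAMPLE = (
    "sample1\tsample1_run1\t/path/to/sample1_run1.fastq.gz\t-\tH3\n"
    "sample1\tsample1_run2\t/path/to/sample1_run2.fastq.gz\t-\tH3\n"
    "sample2\tsample2_run1\t/path/to/sample2_run1.fastq.gz\tcontrol1\tH3K4me2\t150\n"
    "control1\tcontrol1_run1\t/path/to/control1_run1.fastq.gz\tcontrol1\tinput\n"
)

if __name__ == "__main__":
    for sample, fastqs in runs_by_sample(parse_index(EXAMPLE)).items():
        print(sample, len(fastqs))
```

Rows with fewer than five columns raise a `ValueError` here, which is a deliberate choice for a pre-flight check: a malformed index fails early rather than partway through a run.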

== Output

The pipeline will produce the following output data:

- Alignments
- pileupSignal, pileup signal tracks
- fcSignal, fold enrichment signal tracks
- pvalueSignal, {pvalue} signal tracks
- narrowPeak, peak locations with peak summit, pvalue and qvalue (BED6+4)
- broadPeak, similar to narrowPeak (BED6+3)
- gappedPeak, both narrow and broad peaks (BED12+3)

Check {macs2-outfiles}[MACS2 output files^] for details.

Information about the output data is written to a file called `chipseq-pipeline.db`, created in the folder from which the pipeline is run. Here is an example of the db file:

[source,bash]
----
sample1  /path/to/results/peakOut/sample1.pileup_signal.bw  H3       255  pileupSignal  0.9960  0.4393
sample1  /path/to/results/peakOut/sample1_peaks.narrowPeak  H3       255  narrowPeak    0.9960  0.4393
sample1  /path/to/results/sample1.bam                       H3       255  Alignments    0.9960  0.4393
sample1  /path/to/results/peakOut/sample1_peaks.gappedPeak  H3       255  gappedPeak    0.9960  0.4393
sample1  /path/to/results/peakOut/sample1_peaks.broadPeak   H3       255  broadPeak     0.9960  0.4393
sample2  /path/to/results/peakOut/sample2_peaks.gappedPeak  H3K4me2  200  gappedPeak    0.9995  0.7216
sample2  /path/to/results/peakOut/sample2.fc_signal.bw      H3K4me2  200  fcSignal      0.9995  0.7216
sample2  /path/to/results/peakOut/sample2.pval_signal.bw    H3K4me2  200  pvalueSignal  0.9995  0.7216
sample2  /path/to/results/peakOut/sample2_peaks.broadPeak   H3K4me2  200  broadPeak     0.9995  0.7216
sample2  /path/to/results/peakOut/sample2.pileup_signal.bw  H3K4me2  200  pileupSignal  0.9995  0.7216
sample2  /path/to/results/peakOut/sample2_peaks.narrowPeak  H3K4me2  200  narrowPeak    0.9995  0.7216
sample2  /path/to/results/sample2_GCCAAT_primary.bam        H3K4me2  200  Alignments    0.9995  0.7216
----

The fields in the file correspond to:

. merge identifier
. path
. mark/histone
. (estimated) fragment length
. data type
. NRF (Nonredundant Fraction)
. FRiP (Fraction of Reads in Peaks)
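Downstream scripts can use the db file to locate outputs of a given type. The following sketch loads the seven columns described above and selects the paths for one data type. It assumes the columns are separated by whitespace and that no field contains spaces; the names `DbRecord`, `parse_db` and `paths_of_type` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class DbRecord(NamedTuple):
    merge_id: str
    path: str
    mark: str
    fragment_length: int   # (estimated) fragment length
    data_type: str
    nrf: float             # Nonredundant Fraction
    frip: float            # Fraction of Reads in Peaks


def parse_db(text: str) -> List[DbRecord]:
    """Parse db file content; assumes whitespace-separated columns."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        merge_id, path, mark, frag, data_type, nrf, frip = line.split()
        records.append(DbRecord(merge_id, path, mark, int(frag),
                                data_type, float(nrf), float(frip)))
    return records


def paths_of_type(records: List[DbRecord], data_type: str) -> Dict[str, str]:
    """Map merge identifier to output path for one data type."""
    return {r.merge_id: r.path for r in records if r.data_type == data_type}


# A subset of the example db rows shown above.
DB_EXAMPLE = """\
sample1\t/path/to/results/peakOut/sample1_peaks.narrowPeak\tH3\t255\tnarrowPeak\t0.9960\t0.4393
sample1\t/path/to/results/sample1.bam\tH3\t255\tAlignments\t0.9960\t0.4393
sample2\t/path/to/results/peakOut/sample2_peaks.narrowPeak\tH3K4me2\t200\tnarrowPeak\t0.9995\t0.7216
sample2\t/path/to/results/sample2_GCCAAT_primary.bam\tH3K4me2\t200\tAlignments\t0.9995\t0.7216
"""

if __name__ == "__main__":
    for sample, path in paths_of_type(parse_db(DB_EXAMPLE), "narrowPeak").items():
        print(sample, path)
```

Because NRF and FRiP are repeated on every row of a sample, any single record for that sample is enough to read its quality metrics.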
