Re-analysis of a combined ChIP-Seq & RNA-Seq data set

This is the code for a re-analysis of a GEO dataset that I originally analyzed for this paper. The re-analysis uses statistical methods that were not yet available at the time, such as the csaw Bioconductor package, which provides a principled way to normalize windowed counts of ChIP-Seq reads and test them for differential binding. The original paper only analyzed binding within pre-defined promoter regions. Some improvements have also been made to the RNA-seq analysis using newer features of limma, such as quality weights.
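
As a rough illustration of what "windowed counts" means here, the following Python sketch bins read start positions into fixed-width windows and scales the counts by library size. It is not csaw's implementation: csaw counts reads in sliding windows and offers more careful normalization. The function names, the 150 bp window width, the read positions, and the counts-per-million scaling are all assumptions chosen for the example.

```python
from collections import Counter


def window_counts(read_starts, window_width):
    """Count reads in non-overlapping windows, keyed by window start."""
    counts = Counter()
    for pos in read_starts:
        # Integer division maps each position to the start of its window.
        counts[(pos // window_width) * window_width] += 1
    return dict(counts)


def counts_per_million(counts, library_size):
    """Scale raw window counts to counts per million reads in the library."""
    return {start: n * 1_000_000 / library_size for start, n in counts.items()}


# Hypothetical read start positions on a single chromosome.
reads = [0, 5, 10, 149, 150, 299, 300]
raw = window_counts(reads, window_width=150)
cpm = counts_per_million(raw, library_size=len(reads))
print(raw)
print(cpm)
```

Scaling by library size only makes counts comparable across samples sequenced to different depths; it does not address composition or efficiency biases, which is where a dedicated package becomes useful.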

This workflow downloads the sequence data and sample metadata from the public GEO/SRA release, so anyone can download and run this code to reproduce the full analysis.

Workflow

Rule Graph

Completed components

ChIP-seq
- Mapping with bowtie2
- Peak calling with MACS2 and Epic
- Fetching of blacklists from UCSC
- Generation of greylists from ChIP-Seq input samples
- IDR analysis of blacklist-filtered peak calls
- Computation of cross-correlation function for ChIP-Seq samples, excluding blacklisted regions
- Counting in windows across the genome
RNA-seq
- Mapping with STAR & HISAT2
- Counting reads aligned to genes
- Alignment-free bias-corrected transcript quantification using Salmon & Kallisto
- Differential gene expression
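
Several of the ChIP-seq steps above operate on blacklist-filtered data. The sketch below shows the basic idea in plain Python: drop any peak that overlaps a blacklisted interval. This README does not describe how the workflow implements the filtering, so the `Interval` type, the function names, and the example coordinates are hypothetical; intervals are assumed to be half-open, as in BED files.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a, b):
    """Return True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_blacklisted(peaks, blacklist):
    """Keep only the peaks that overlap no blacklisted interval."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Hypothetical peak calls and a single blacklisted region.
peaks = [
    Interval("chr1", 100, 200),
    Interval("chr1", 500, 600),
    Interval("chr2", 100, 200),
]
blacklist = [Interval("chr1", 150, 300)]
kept = filter_blacklisted(peaks, blacklist)
print(kept)
```

The nested loop is quadratic in the number of intervals, which is fine for a toy example; for genome-scale peak sets a sorted or indexed approach, such as the one Bedtools provides, would be the practical choice.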

Possible TODO components

Integrating RNA-seq and ChIP-seq
- hiAnnotator: http://bioconductor.org/packages/devel/bioc/html/hiAnnotator.html
- ChIPseeker: http://bioconductor.org/packages/devel/bioc/html/ChIPseeker.html
- mogsa: http://bioconductor.org/packages/release/bioc/html/mogsa.html
Gene set tests
- ToPASeq: http://bioconductor.org/packages/devel/bioc/html/ToPASeq.html
- mvGST: http://bioconductor.org/packages/devel/bioc/html/mvGST.html
- mgsa: http://bioconductor.org/packages/release/bioc/html/mgsa.html
QC Stuff
- ChIPQC: http://bioconductor.org/packages/release/bioc/html/ChIPQC.html
- MultiQC: http://multiqc.info/
- Rqc: http://www.bioconductor.org/packages/devel/bioc/html/Rqc.html
mixOmics: http://mixomics.org/
ica: https://cran.rstudio.com/web/packages/ica/index.html
Motif enrichment
pcaExplorer: https://bioconductor.org/packages/release/bioc/html/pcaExplorer.html

TODO Code cleanup

Remove unnecessary library() calls
Put spaces around equals signs

TODO Other

Document how to run the pipeline
Provide install script for R & Python packages

Dependencies

Command-line tools

ascp Aspera download client for downloading SRA files
Bedtools
Bowtie2 aligner
Epic peak caller
fastq-tools
HISAT2 aligner
IDR Python script
Kallisto RNA-seq quantifier
MACS2 peak caller
Picard tools for various file manipulation utilities
Salmon RNA-seq quantifier (devel version 0.7.3)
Shoal
Snakemake for running the workflow
SRA toolkit for extracting reads from SRA files
STAR aligner
UCSC command-line tools (e.g. liftOver)

Programming languages and packages

R, Bioconductor, and the following R packages:
- From CRAN: assertthat, doParallel, dplyr, future, getopt, GGally, ggforce, ggfortify, ggplot2, ks, lazyeval, lubridate, magrittr, MASS, Matrix, openxlsx, optparse, parallel, purrr, RColorBrewer, readr, reshape2, rex, scales, stringi, stringr
- From Bioconductor: annotate, Biobase, BiocParallel, BSgenome.Hsapiens.UCSC.hg19, BSgenome.Hsapiens.UCSC.hg38, ChIPQC, csaw, edgeR, GenomicFeatures, GenomicRanges, GEOquery, limma, org.Hs.eg.db, Rsamtools, Rsubread, rtracklayer, S4Vectors, SRAdb, SummarizedExperiment, TxDb.Hsapiens.UCSC.hg19.knownGene, tximport
- Installed manually: sleuth, wasabi
Python 3 and the following Python packages: biopython, atomicwrites, numpy, pandas, plac, pysam, rpy2, snakemake
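
Until an install script exists, a quick check for missing command-line tools can save a failed run. The following is a minimal sketch, not part of the workflow: the executable names in `TOOLS` are assumptions about how each tool is typically installed and may differ on your system, and the list covers only some of the dependencies above.

```python
import shutil

# Assumed executable names for some of the tools listed above.
TOOLS = [
    "ascp", "bedtools", "bowtie2", "hisat2", "kallisto",
    "macs2", "salmon", "snakemake", "STAR",
]


def missing_tools(names, which=shutil.which):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


missing = missing_tools(TOOLS)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All checked tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out, for example in a test; by default the function uses `shutil.which` from the standard library.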
