modulome-workflow
Workflow to download, process, and explore microbial RNA-seq data from NCBI SRA
:warning: This repository is now deprecated. Please see https://github.com/SBRG/modulome-workflow for the actively maintained repository.
This repository presents a computational workflow to compute and characterize all iModulons for a selected organism. The workflow has five steps:
- Gather all publicly available RNA-seq data for the organism (Step 1)
- Process the RNA-seq data (Step 2)
- Inspect data to identify high-quality datasets (Step 3)
- Compute iModulons (Step 4)
- Characterize iModulons using PyModulon (Step 5)
Background
iModulons are independently modulated groups of genes that are computed through Independent Component Analysis (ICA) of a gene expression dataset. To learn more about iModulons or explore published iModulons, visit iModulonDB or see our publications for Escherichia coli, Staphylococcus aureus, or Bacillus subtilis.
Here, we introduce the concept of the Modulome for an organism, which is the set of all iModulons that can be computed for the organism based on publicly available RNA-seq data. The computational pipeline provides a step-by-step workflow to compute the Modulome for Bacillus subtilis.
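To make the ICA decomposition mentioned above concrete, here is a minimal sketch on synthetic data. It is not the code used in Step 4 of this workflow. It assumes NumPy and scikit-learn are installed, uses scikit-learn's `FastICA` as a stand-in ICA implementation, and all variable names and matrix sizes are made up for illustration. The idea is that a genes-by-samples expression matrix is approximated as the product of a gene-weight matrix (one column per component) and an activity matrix (one row per component).

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_genes, n_samples, n_components = 200, 40, 3

# Synthetic "ground truth": non-Gaussian gene weights and per-sample activities.
true_weights = rng.laplace(size=(n_genes, n_components))
true_activities = rng.normal(size=(n_components, n_samples))

# Synthetic expression matrix (genes x samples) with a little noise.
expression = true_weights @ true_activities
expression += 0.01 * rng.normal(size=expression.shape)

# Fit ICA. Rows (genes) are the observations, so the recovered independent
# sources are gene-weight vectors, one column per component.
ica = FastICA(n_components=n_components, random_state=0, max_iter=1000)
gene_weights = ica.fit_transform(expression)  # shape: genes x components
activities = ica.mixing_.T                    # shape: components x samples

# The product of the two matrices (plus the per-sample mean that FastICA
# removed) approximately reconstructs the original expression matrix.
reconstruction = gene_weights @ activities + ica.mean_
mean_abs_error = float(np.mean(np.abs(reconstruction - expression)))
print(gene_weights.shape, activities.shape, mean_abs_error)
```

In this sketch, each column of `gene_weights` plays the role of one candidate component and the matching row of `activities` describes how strongly that component is expressed in each sample. Deciding which genes belong to a component requires a further step that is not shown here.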
Setup
Docker
We have provided pre-built Docker containers with all necessary software.
To begin, install Docker and Nextflow.
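Before launching any step, it can help to confirm that both tools are reachable from your shell. The following is a small sketch using only the Python standard library. The helper name `find_tools` is hypothetical and not part of this repository, and the executable names `docker` and `nextflow` are assumed to be the default ones on your system.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumed executable names; adjust if your installation differs.
    for tool, path in find_tools(["docker", "nextflow"]).items():
        status = path if path is not None else "NOT FOUND on PATH"
        print(f"{tool}: {status}")
```

A tool reported as not found usually means it is not installed or its install directory is missing from `PATH`.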
Local installation
You can also run each program locally, with all requirements listed in the conda environment.yaml file. For Step 5 (Characterize iModulons), additionally install pymodulon.
Cite
Please cite the following pre-print: Mining all publicly available expression data to compute dynamic microbial transcriptional regulatory networks