CEMBA_wmb_snATAC icon indicating copy to clipboard operation
CEMBA_wmb_snATAC copied to clipboard

Whole mouse brain snATAC seq analysis

  • CEMBA_wmb_snATAC This repository is used for the whole mouse brain (wmb) snATAC-seq data analysis of Center for Epigenomics of the Mouse Brain Atlas (CEMBA), which is now accepted by Nature 2023.

[[./repo_figures/GraphAbstract.jpg]]

  • Change Log
  1. 2024-09-25: add explanation in Discusssion for 4D and 4E dissections.
  2. 2024-10-19: add google drive link for 2.3 million cell meta data.
  • Important Note All the analysis and the h5ad data generated are from SnapATAC2 under <= 2.4.0 There are some [[https://kzhang.org/SnapATAC2/changelog.html][break changes]] later after SnapATAC2 >= 2.5.0

** Data

  • Our dissection 4D annotated in the meta data, should be 4E; and 4E should be 4D.
    • This might be a label issue during experiment record. We are not that sure.
    • Check the disccusion for details info: https://github.com/beyondpie/CEMBA_wmb_snATAC/discussions/20 .
    • In our repository, we keep everything now unchanged. So if you need 4D region, you should give 4E a look and vice versa.
  • Demultiplexed data can be accessed via the NEMO archive (NEMO, RRID:SCR_016152) at https://assets.nemoarchive.org/dat-bej4ymm (the raw directory under Source Data URL in this archive).
  • We also uploaded our demultiplexed fastq files and processed files under the GEO accession number GSE246791:
    • https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE246791
  • Processed data is also available on our web portal and can be explored here: http://www.catlas.org.
    • A Google drive link for bigwig files and SnapATAC2 files just for backup:
      • https://drive.google.com/drive/folders/1mule-CkxN939mxhYAqpOIxjPpq0j8qVP?usp=sharing
  • All the cellmeta information:
    • Google drive link: https://drive.google.com/drive/folders/1UNeOar5UuBNrIjB_z53s25nrm81arXPW?usp=share_link

** Pipeline - We now have 234 samples and 2.3 million cells in total. So most of the analysis are depend on Snakefile to organize the pipeline and submit them to high-performance cluster (HPC) in order to use hundreds of CPUs at the same time. - R, Shell and Python (>= 3.10) are mainly used, especially R (>= 4.2). - Under the directory [[./package][package]], we put lots of common functions there. - We mainly use [[https://github.com/kaizhang/SnapATAC2][SnapATAC2]] to analyze the single-nucleus ATAC-seq data - Comparation between Scrublet and AMULET: https://github.com/yuelaiwang/CEMBA_AMULET_Scrublet - The deep learning related codes now in the repo: https://github.com/yal054/mba_dl_model - sa2 is short for SnapATAC2 in this repo.

[[./repo_figures/snATAC-seq_analysis_pipeline.jpg]]
** Codes *** Clustering In total, we have implemented four-round iterative clustering. See details in [[file:01.clustering][01.clustering]] *** Integration and annotation We use Allen's scRNAseq data and their annotations for our data annotation. See details in [[file:02.integration][02.integration]] *** Peak calling We use macs2 with multiple stage filtering, especially use SPM >= 5 for filtering peaks. See details in [[file:03.peakcalling][03.peakcalling]] *** Comments on some scripts:

  1. [[file:package/R/cembav2env.R][cembav2env.R]]: R env to store the metadata during analysis. |----------------+-------------------------------------------------------| | Enviorment | Description | |----------------+-------------------------------------------------------| | cembav2env | meta data of SnapATAC and SnapATAC2 | |----------------+-------------------------------------------------------| | cluSumBySa2 | clustering meta data, such as resolution, | | | barcode to L4 Ids, L4 major regions and so on | |----------------+-------------------------------------------------------| | Sa2Integration | Integration meta data, like Allen's data descriptions | |----------------+-------------------------------------------------------| | Sa2PeakCalling | Peak calling meta data | |----------------+-------------------------------------------------------|