insct
scRNAseq integration with triplet neural networks
insct ("Insight")
INtegration of millions of Single Cells using batch-aware Triplet networks
INSCT is a deep learning algorithm which calculates an integrated embedding for scRNA-seq data. With INSCT, you can:
- Integrate scRNA-seq datasets across batches with/without labels.
- Generate a low-dimensional representation of the scRNA-seq data.
- Integrate millions of cells on personal computers.
For more info check out our manuscript.
How does it work?
- INSCT learns a data representation that integrates cells across batches. The goal of the network is to minimize the distance between Anchor and Positive while maximizing the distance between Anchor and Negative. Anchor and Positive pairs consist of transcriptionally similar cells from different batches. The Negative is a transcriptionally dissimilar cell sampled from the same batch as the Anchor.
- Principal components of three data points corresponding to Anchor, Positive and Negative are fed into three identical neural networks, which share weights. The triplet loss function is used to train the network weights, and the two-dimensional embedding layer activations represent the integrated embedding.
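As a rough illustration of the objective described above, a standard margin-based triplet loss can be sketched in plain Python. The function name `triplet_loss`, the Euclidean distance and the margin of 1.0 are assumptions made for this example; the loss actually used by INSCT may differ.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss for one (Anchor, Positive, Negative) triplet.

    Illustrative only: distance metric and margin are assumptions.
    """
    d_pos = math.dist(anchor, positive)  # Anchor-Positive distance (to be minimized)
    d_neg = math.dist(anchor, negative)  # Anchor-Negative distance (to be maximized)
    # The loss is zero once the Negative is at least `margin` farther away
    # from the Anchor than the Positive is.
    return max(d_pos - d_neg + margin, 0.0)

# Toy two-dimensional embeddings.
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
badly_separated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In this toy case the first triplet already satisfies the margin and contributes no loss, while the second one, where the Positive is farther from the Anchor than the Negative, is penalized.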
To learn an integrated embedding that overcomes batch effects, INSCT samples triplets in a batch-aware manner:
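One simplified way to picture batch-aware sampling is the sketch below. It is not the package's implementation: the helper `sample_triplet`, the brute-force neighbour search, the parameter `k` and the toy data are all hypothetical, and the real sampler presumably relies on approximate nearest-neighbour search (hnswlib is listed as a dependency). The Positive is drawn from the most similar cells in a different batch, and the Negative from same-batch cells that are not among the Anchor's nearest neighbours.

```python
import math
import random

def sample_triplet(anchor, pcs, batches, k=2, rng=random):
    """Return (anchor, positive, negative) cell indices for one anchor cell.

    Hypothetical helper: brute-force distances on principal components.
    Assumes the anchor's batch contains more than k other cells.
    """
    def dist_to_anchor(i):
        return math.dist(pcs[anchor], pcs[i])

    others = [i for i in range(len(pcs)) if i != anchor]

    # Positive: one of the k most similar cells from a different batch.
    other_batch = sorted(
        (i for i in others if batches[i] != batches[anchor]), key=dist_to_anchor
    )
    positive = rng.choice(other_batch[:k])

    # Negative: a same-batch cell outside the anchor's k nearest neighbours.
    same_batch = sorted(
        (i for i in others if batches[i] == batches[anchor]), key=dist_to_anchor
    )
    negative = rng.choice(same_batch[k:])
    return anchor, positive, negative

# Toy data: two cell groups (A, B) measured in two batches (0, 1);
# batch 1 is shifted to mimic a batch effect.
pcs = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),        # batch 0, group A
    (10.0, 10.0), (10.2, 9.9), (9.8, 10.1),    # batch 0, group B
    (1.0, 1.0), (1.2, 0.9),                    # batch 1, group A
    (11.0, 11.0), (11.1, 10.8),                # batch 1, group B
]
batches = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
groups = ["A", "A", "A", "B", "B", "B", "A", "A", "B", "B"]

triplet = sample_triplet(0, pcs, batches, k=2, rng=random.Random(0))
```

For the anchor cell 0, the Positive comes from batch 1 and the Negative from batch 0, so pulling Anchor and Positive together works against the batch shift while pushing the Negative away preserves the separation between groups.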
What does it do?
For example, we simulated scRNAseq data, where batch effects dominate the embedding:
However, INSCT learns an integrated embedding where cells cluster by group instead of batch:
Check out our interactive tutorials!
The following notebooks can be run within your web browser and allow you to interactively explore INSCT. We have prepared the following analysis examples:
Notebooks to reproduce the analyses described in our preprint can be found in the reproducibility folder.
Installation
INSCT depends on the following Python packages. These need to be installed separately:
ivis==1.7.2
scanpy
hnswlib
To install INSCT, follow these instructions:
GitHub
Install directly from GitHub using pip:
pip install git+https://github.com/lkmklsmn/insct.git
Alternatively, download the package from GitHub and install it locally:
git clone https://github.com/lkmklsmn/insct
cd insct
pip install .
Usage
Unsupervised model
Triplets sampled based on transcriptional similarity
- AnnData object with PCs
- Batch vector
from insct.tnn import TNN
model = TNN()
model.fit(X=adata, batch_name='batch')
Supervised model
Triplets sampled based on both transcriptional similarity and known labels
- AnnData object with PCs
- Batch vector
- Celltype vector
model = TNN()
model.fit(X=adata, batch_name='batch', celltype_name='Celltypes')
Semi-supervised model
Triplets sampled based on both transcriptional similarity and known labels
- AnnData object with PCs
- Batch vector
- Celltype vector
- Masking vector (which labels to ignore)
model = TNN()
model.fit(X=adata, batch_name='batch', celltype_name='Celltypes', mask_batch=batch_name)
Output
- Coordinates for the integrated embedding