n_proteins Parameter in MultiVI Class
Hello! I am currently conducting research on single-cell multimodal data. I accessed the code for the paper "MultiVI: deep generative model for the integration of multimodal data" published on Zenodo. In the file Protein_update_3_TESTING, I found the following code:
# ######################################################################################################################
# TRAIN ALL 3 MODALITIES
adata = anndata.read("dogma_all_genes_cells_dig_ctrl_annotated.h5ad.gz")
adata = adata.copy()
scvi.data.setup_anndata(adata, protein_expression_obsm_key='protein_expression')
n_genes = (adata.var.modality == 'Gene Expression').sum()
n_regions = (adata.var.modality == 'Peaks').sum()
n_proteins = adata.obsm['protein_expression'].shape[1]
os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
mvi = scvi.model.MULTIVI(adata, n_genes=n_genes, n_regions=n_regions, n_proteins=n_proteins)
testing(mvi, save_path="trained_models/Test3Mod_DIGCTRL75b_211020", pdf_path="Test3Mod_DIGCTRL75B_")
# ######################################################################################################################
In this code, I noticed that a parameter called 'n_proteins' is set when creating the MULTIVI model.
However, in version 1.1.6 of scvi-tools, when I input similar code attempting to specify the n_proteins parameter, such as:
model = scvi.model.MULTIVI(
    adata_mvi,
    n_genes=(adata_mvi.var["modality"] == "Gene Expression").sum(),
    n_regions=(adata_mvi.var["modality"] == "Peaks").sum(),
    n_proteins=0,
)
model.view_anndata_setup()
It results in an error: TypeError: MULTIVAE.__init__() got an unexpected keyword argument 'n_proteins'.
Upon inspecting the _multivi.py source file, I indeed found that the class does not have an n_proteins parameter:
"""Integration of multi-modal and single-modality data :cite:p:`AshuachGabitto21`.

MultiVI is used to integrate multiomic datasets with single-modality (expression
or accessibility) datasets.

Parameters
----------
adata
    AnnData object that has been registered via :meth:`~scvi.model.MULTIVI.setup_anndata`.
n_genes
    The number of gene expression features (genes).
n_regions
    The number of accessibility features (genomic regions).
modality_weights
    Weighting scheme across modalities. One of the following:
    * ``"equal"``: Equal weight in each modality
    * ``"universal"``: Learn weights across modalities w_m.
    * ``"cell"``: Learn weights across modalities and cells. w_{m,c}
modality_penalty
    Training Penalty across modalities. One of the following:
    * ``"Jeffreys"``: Jeffreys penalty to align modalities
    * ``"MMD"``: MMD penalty to align modalities
    * ``"None"``: No penalty
n_hidden
    Number of nodes per hidden layer. If `None`, defaults to square root
    of number of regions.
n_latent
    Dimensionality of the latent space. If `None`, defaults to square root
    of `n_hidden`.
n_layers_encoder
    Number of hidden layers used for encoder NNs.
n_layers_decoder
    Number of hidden layers used for decoder NNs.
dropout_rate
    Dropout rate for neural networks.
model_depth
    Model sequencing depth / library size.
region_factors
    Include region-specific factors in the model.
gene_dispersion
    One of the following
    * ``'gene'`` - genes_dispersion parameter of NB is constant per gene across cells
    * ``'gene-batch'`` - genes_dispersion can differ between different batches
    * ``'gene-label'`` - genes_dispersion can differ between different labels
protein_dispersion
    One of the following
    * ``'protein'`` - protein_dispersion parameter is constant per protein across cells
    * ``'protein-batch'`` - protein_dispersion can differ between different batches NOT TESTED
    * ``'protein-label'`` - protein_dispersion can differ between different labels NOT TESTED
latent_distribution
    One of
    * ``'normal'`` - Normal distribution
    * ``'ln'`` - Logistic normal distribution (Normal(0, I) transformed by softmax)
deeply_inject_covariates
    Whether to deeply inject covariates into all layers of the decoder. If False,
    covariates will only be included in the input layer.
fully_paired
    allows the simplification of the model if the data is fully paired. Currently ignored.
**model_kwargs
    Keyword args for :class:`~scvi.module.MULTIVAE`
...
What’s going on here? Was this parameter removed in a new version of scvi-tools, or is this a BUG? Looking forward to your reply! Thx!
Versions:
scvi-tools 1.1.6
Yes, we changed the MultiVI code after the initial release. The version of MultiVI needed to reproduce the paper should be specified in the Zenodo code. @marianogabitto, if it is not, can you suggest the correct version?
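For context on the error message: it names MULTIVAE rather than MULTIVI, which is consistent with the `**model_kwargs` entry in the docstring quoted above ("Keyword args for MULTIVAE"). The sketch below reproduces that pattern with two stand-in classes. `Model` and `Module` and their parameter names are hypothetical and are not the scvi-tools source; they only illustrate how an unrecognized keyword passes through the outer class and is rejected by the inner one.

```python
class Module:
    """Stand-in for the inner network class (hypothetical, not scvi.module.MULTIVAE)."""

    def __init__(self, n_input_genes=0, n_input_regions=0, n_hidden=None):
        self.n_input_genes = n_input_genes
        self.n_input_regions = n_input_regions
        self.n_hidden = n_hidden


class Model:
    """Stand-in for the outer model class (hypothetical, not scvi.model.MULTIVI)."""

    def __init__(self, n_genes, n_regions, **model_kwargs):
        # Any keyword the outer class does not name itself is forwarded
        # unchanged, so the inner class is the one that rejects it.
        self.module = Module(
            n_input_genes=n_genes,
            n_input_regions=n_regions,
            **model_kwargs,
        )


def try_build(**kwargs):
    """Return the TypeError message raised while building, or None on success."""
    try:
        Model(n_genes=10, n_regions=20, **kwargs)
    except TypeError as err:
        return str(err)
    return None


# An unknown keyword fails inside Module.__init__, not Model.__init__.
error_message = try_build(n_proteins=0)
# A keyword that Module does accept is forwarded without error.
ok_message = try_build(n_hidden=32)
print(error_message)
```

Under this reading, the TypeError does not by itself indicate a bug: it only shows that the inner class in the installed version has no parameter with that name.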
Thanks. Does this mean that the current version of MultiVI is designed specifically for multiome datasets and no longer supports protein data? If that's not the case, could you please provide a corresponding tutorial? 🧐
Hi, to use the protein data you have to pass protein_expression_obsm_key to setup_anndata when setting up the model. MultiVI can handle RNA+protein+ATAC and any combination of these. There is currently no tutorial beyond RNA+ATAC.
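A minimal sketch of what that setup might look like, assuming the protein counts are stored in `adata.obsm["protein_expression"]` as in the code from the first post, and assuming the model picks up the number of proteins from the registered field instead of an `n_proteins` argument. The helper names `count_features` and `build_multivi_with_proteins` are made up for illustration, and this has not been checked against every scvi-tools release.

```python
from collections import Counter


def count_features(modality_labels):
    """Count gene and peak features from an iterable of modality labels."""
    counts = Counter(modality_labels)
    return counts["Gene Expression"], counts["Peaks"]


def build_multivi_with_proteins(adata, protein_key="protein_expression"):
    """Register RNA + ATAC + protein data and construct a MULTIVI model.

    Assumes adata.var["modality"] labels each feature as "Gene Expression"
    or "Peaks", and that adata.obsm[protein_key] holds the protein counts.
    """
    # Imported lazily so count_features works without scvi-tools installed.
    import scvi

    # The protein matrix is registered here, not passed to the constructor.
    scvi.model.MULTIVI.setup_anndata(
        adata,
        protein_expression_obsm_key=protein_key,
    )
    n_genes, n_regions = count_features(adata.var["modality"])
    # No n_proteins argument: in this sketch the protein count is assumed
    # to be inferred from the field registered above.
    return scvi.model.MULTIVI(adata, n_genes=n_genes, n_regions=n_regions)


# Small check of the counting helper on made-up labels.
example_counts = count_features(["Gene Expression", "Peaks", "Peaks"])
```

Calling `model.view_anndata_setup()` on the returned model, as in the first post, is one way to check whether the protein field was actually registered.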