MultiVI for trimodal integration

Open tt2190 opened this issue 1 year ago • 8 comments

Thanks so much for developing the scVI tools.

I have CITE-seq, 10x multiome and unimodal scATAC data from the corresponding tissue of multiple donors and hope to put them together and perform an integrated analysis of the trimodal (RNA, protein, ATAC) data.

I was considering Multigrate from the Theis Lab, but I have just realised that MultiVI has been adding support for protein information (#1712) and that _multivi.py has been updated to accept protein input.

I would be grateful if you could enlighten me on the current recommendation regarding the use of MultiVI for this trimodal integration scenario.

tt2190 avatar Apr 04 '23 21:04 tt2190

Hello tt (not sure your name), is there anything that you would like to know? MultiVI works for protein information.

Cheers, Mariano

marianogabitto avatar Apr 04 '23 22:04 marianogabitto

Hi Mariano,

Thank you very much for your reply. My apologies for not being clear in my original post.

I wanted to ask if MultiVI works for trimodal integration scenarios, as the MultiVI paper and documentation do not mention its applicability for protein data or for integration of three modalities.

Have you released any tutorial for trimodal integration (e.g. CITE-seq + 10x multiome)?

Many thanks, Tomo

tt2190 avatar Apr 04 '23 22:04 tt2190

Hi Tomo, protein information has to enter as in TotalVI, in the .obsm field of the anndata. Can you give it a try and let me know?
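For readers following along, here is a minimal sketch of what "protein in .obsm" could look like when only some cells have protein measurements. It assumes that cells without protein counts are represented as all-zero rows (the same zero-padding convention used for missing RNA/ATAC features), and that the key name "protein_expression" is the one passed to setup_anndata later in this thread. The helper pad_protein_counts and the toy cell and protein names are hypothetical, not part of scvi-tools.

```python
import pandas as pd


def pad_protein_counts(protein_df, all_cells):
    """Return a cells x proteins table covering every cell in `all_cells`.

    Cells that are absent from `protein_df` (e.g. multiome or scATAC cells)
    get all-zero rows. Zero-padding is an assumption of this sketch.
    """
    return protein_df.reindex(all_cells, fill_value=0)


# Toy data: two CITE-seq cells with protein counts, two multiome cells without.
cite_protein = pd.DataFrame(
    {"CD3": [5, 0], "CD19": [0, 7]},
    index=["cite_cell_1", "cite_cell_2"],
)
all_cells = ["cite_cell_1", "cite_cell_2", "multiome_cell_1", "multiome_cell_2"]

protein_expression = pad_protein_counts(cite_protein, all_cells)

# With a real AnnData object `adata` whose obs_names match `all_cells`,
# the table would then be stored as:
#     adata.obsm["protein_expression"] = protein_expression
```

The row order of the table has to match adata.obs_names, which is why the sketch reindexes against the full cell list rather than concatenating blocks by hand.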

M

marianogabitto avatar Apr 04 '23 22:04 marianogabitto

Hi Mariano,

Thank you for clarifying this point. I will let you know once I get the results!

Many thanks, Tomo

tt2190 avatar Apr 04 '23 23:04 tt2190

Hi Mariano,

I have tried integration of CITE-seq + multiome using the datasets used in the totalVI and MultiVI tutorials:

CITE-seq: scvi.data.pbmcs_10x_cite_seq()

multiome: https://cf.10xgenomics.com/samples/cell-arc/2.0.0/pbmc_unsorted_10k/pbmc_unsorted_10k_filtered_feature_bc_matrix.tar.gz

I specified protein_expression_obsm_key="protein_expression" when I set up the anndata with scvi.model.MULTIVI.setup_anndata. However, it seems that the protein_foreground_probability has not been computed. Could you please give me some additional guidance on how to integrate protein information into the MultiVI workflow?
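The setup step described above might be sketched roughly as follows. This is not the official tutorial: the wrapper functions multivi_setup_kwargs and setup_and_train are hypothetical, and batch_key="modality" assumes an obs column of that name exists (as in the MultiVI tutorial). The sketch only covers registration and training and does not address the protein_foreground_probability question.

```python
def multivi_setup_kwargs(batch_key="modality", protein_obsm_key="protein_expression"):
    """Collect the keyword arguments for MULTIVI.setup_anndata.

    Both default key names are assumptions of this sketch; use whatever
    column and .obsm key your AnnData object actually contains.
    """
    return {
        "batch_key": batch_key,
        "protein_expression_obsm_key": protein_obsm_key,
    }


def setup_and_train(adata, n_genes, n_regions, max_epochs):
    """Register `adata` with MultiVI and train a model (hypothetical wrapper).

    `adata.X` is expected to hold genes first, then ATAC regions, and
    `adata.obsm` to hold the protein counts under the registered key.
    """
    # Imported lazily so the kwargs helper is usable without scvi-tools installed.
    import scvi

    scvi.model.MULTIVI.setup_anndata(adata, **multivi_setup_kwargs())
    model = scvi.model.MULTIVI(adata, n_genes=n_genes, n_regions=n_regions)
    model.train(max_epochs=max_epochs)
    return model
```

A call such as setup_and_train(adata, n_genes, n_regions, max_epochs=150) would correspond to the CPU test run mentioned later in this thread.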

My analysis notebook is available from the following link: https://www.dropbox.com/s/n8uzw2wfzoqnlnu/trimodal_MultiVI_test.ipynb?dl=0

I have also realised that the organize_multiome_anndatas function does not work when protein information is included in obsm.

Many thanks, Tomo

tt2190 avatar Apr 14 '23 11:04 tt2190

Hi Tomo, sorry for the delay. We are making the last edits to the manuscript and I will work on the tutorial soon. I will actually base it on your notebook! Thanks for sharing it.

At first pass, you did everything right. Let me work on the example myself and get back to you in 2 days.

Thanks, Mariano

marianogabitto avatar Apr 18 '23 17:04 marianogabitto

Hi Mariano,

Thank you for your reply.

Performance-wise, as shown in my notebook, the CITE-seq and multiome data were completely separated (not integrated) after model training. This may be because I don't currently have a GPU available and tested with max_epochs=150 to save runtime on CPU, but I was also concerned that I might be doing something wrong besides the number of epochs. It would therefore be very helpful if I could learn from your tutorial.

Thank you for your time and help and I look forward to hearing from you soon!

Thanks, Tomo

tt2190 avatar Apr 18 '23 17:04 tt2190

Tomo, I will work on this and I should cite you as a reference in the tutorial for helping with this. We should connect outside this thread to make the tutorial happen!

marianogabitto avatar Apr 18 '23 19:04 marianogabitto