make scarches work with proteins, not only genes, in totalvi/anvi
I'd second this. This seems like a critical feature, alongside getting scArches to work with newer versions of Python so it can be used with rapids-singlecell and other modern packages. TotalVI has also been generating issues when I use proteins. It's related to OpenMP and LLVM packages being loaded at the same time: ThreadPooler crashes and the model refuses to initialize.
I just switched to MultiVI.
@ori-kron-wis any update on this?
I was able to train TotalVI on a MuData object with RNA and protein, then use scvi.model.TOTALVI.load_query_data() for scArches and train the query model.
Here are some relevant packages versions in my environment:
# Name         Version    Build                           Channel
python         3.11.13    h9e4cc4f_0_cpython              conda-forge
pytorch        2.4.0      py3.11_cuda12.4_cudnn9.1.0_0    pytorch
scanpy         1.11.4     pyhd8ed1ab_0                    conda-forge
scvi-tools     1.3.2      pyhd8ed1ab_0                    conda-forge
muon           0.1.6      pyhd8ed1ab_0                    conda-forge
mudata         0.3.1      pyhd8ed1ab_1                    conda-forge
@racng that's great! Can you share your code? Did you have any additions to contribute? However, are you sure scArches worked for both modalities, given that it works for genes only right now? Perhaps your query had the exact same protein list, so it didn't affect anything? Please elaborate.
Hi Ori, scArches also doesn't work if you add new genes. It's just that the prepare function removes those. Adding new proteins is no longer surgery (a small adaptation) but requires actually updating the network (which we don't allow, though I'm not sure we should). So we need to fix the prepare query function to remove additional proteins?
Yes, I always need to make the set of genes and proteins the same as the original reference model, fill in zeros for missing genes/proteins, and make the protein matrix dense.
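For anyone doing the same thing by hand, here is a rough sketch of the alignment step described above, written with plain NumPy rather than any scvi-tools API. The function name `align_to_reference` and the toy marker names are made up for illustration. It assumes the reference feature order is known, drops features the reference never saw, fills missing ones with zeros, and returns a dense matrix.

```python
import numpy as np


def align_to_reference(X, query_names, ref_names):
    """Return a dense (n_cells, len(ref_names)) array in reference feature order.

    Hypothetical helper, not part of scvi-tools or scArches.
    """
    # Densify scipy-style sparse input if it exposes .toarray()
    if hasattr(X, "toarray"):
        X = X.toarray()
    X = np.asarray(X)
    query_names = list(query_names)
    if X.ndim != 2 or X.shape[1] != len(query_names):
        raise ValueError("X must be 2D with one column per query feature")

    # Map each query feature name to its column index
    col_of = {name: j for j, name in enumerate(query_names)}

    # Start from zeros so features missing in the query stay zero-filled
    out = np.zeros((X.shape[0], len(ref_names)), dtype=X.dtype)
    for i, name in enumerate(ref_names):
        j = col_of.get(name)
        if j is not None:
            out[:, i] = X[:, j]
    # Query-only features (not in ref_names) are never copied, i.e. dropped
    return out


# Toy example: the query lacks CD4 and has an extra marker, CD19
ref_proteins = ["CD3", "CD4", "CD8"]
query_proteins = ["CD8", "CD19", "CD3"]
query_counts = np.array([[5, 7, 1],
                         [0, 2, 9]])

aligned = align_to_reference(query_counts, query_proteins, ref_proteins)
print(aligned)
```

The same helper would be applied once per modality (genes, then proteins) before handing the query to the model. Whether zero-filling missing proteins is statistically appropriate is a separate question that this sketch does not address.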
Yes, in that case that's exactly what we want the prepare query function to handle automatically (for each modality in the MuData). In TotalVI you need to measure all your reference markers; redundant markers will be removed. MultiVI is more accepting of other markers, so it works.