scvi-tools
Multimodal reference mapping with MuData objects
In the current (1.0.4) version of scvi-tools, multimodal models (e.g. totalVI) can be set up and trained with either AnnData or MuData objects. It would therefore be nice to be able to perform reference atlas mapping with either workflow (i.e., with an AnnData-based or a MuData-based totalVI model). However, it appears that the ArchesMixin API was designed to handle only AnnData objects. Indeed, when I naively passed MuData objects into the ArchesMixin functions to perform reference mapping with totalVI, I quickly ran into errors (a notebook reproducing my results is available here: https://colab.research.google.com/drive/1lL1JJ3bdG6UU3XVjA0tLvp1V2QLWihuI?usp=sharing).
@martinkim0 I'd be happy to take this on, since I don't think it should be too much work to handle the MuData case. Would you prefer to keep the current two-function user-facing API (i.e., load_query_data and prepare_query_anndata) and handle the MuData case internally, or to have explicit separate functions for MuData objects (e.g. prepare_query_mudata)?
Thanks for bringing up this issue! Given how we've named our ArchesMixin API, I think it makes sense to implement a separate prepare_query_mudata, which I anticipate will have substantially different code from prepare_query_anndata, and to add MuData support to the existing load_query_data, since that method mostly deals with freezing network layers.
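To make that split concrete, here is a minimal sketch of the shape it could take. It is not the scvi-tools implementation. AnnDataLike and MuDataLike are hypothetical stand-ins for the real containers, the `_sketch` functions and their simplified signatures are assumptions made for illustration, and the sketch assumes that preparing a query amounts to reordering its features to the reference order and zero-filling the missing ones.

```python
from dataclasses import dataclass


# Stand-ins for anndata.AnnData and mudata.MuData (assumptions, not the real
# classes) so the sketch runs without either package installed.
@dataclass
class AnnDataLike:
    var_names: list  # feature names, one per column of X
    X: list          # cells x features, as nested lists


@dataclass
class MuDataLike:
    mod: dict  # modality name -> AnnDataLike


def prepare_query_anndata_sketch(adata, reference_var_names):
    """Reorder query features to the reference order, zero-filling missing ones."""
    column = {name: j for j, name in enumerate(adata.var_names)}
    X = [
        [row[column[name]] if name in column else 0.0 for name in reference_var_names]
        for row in adata.X
    ]
    return AnnDataLike(list(reference_var_names), X)


def prepare_query_mudata_sketch(mdata, reference_var_names_by_mod):
    """Apply the per-modality preparation to each modality the reference uses."""
    return MuDataLike(
        {
            mod: prepare_query_anndata_sketch(mdata.mod[mod], names)
            for mod, names in reference_var_names_by_mod.items()
        }
    )


def load_query_data_sketch(data):
    """Single entry point: branch on the container type, then share the rest."""
    if isinstance(data, MuDataLike):
        container = "mudata"
    elif isinstance(data, AnnDataLike):
        container = "anndata"
    else:
        raise TypeError(f"unsupported query container: {type(data).__name__}")
    # In a real implementation, container-independent work such as freezing
    # network layers would follow here; this sketch only reports the branch taken.
    return container
```

In this shape the MuData preparation is a thin loop over modalities, while the loading entry point keeps one name and differs only in how it reads the container. The real prepare_query_mudata would likely need more than this, for example handling modalities that are absent from the query.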
Let me know what you think, and we can go forward from there.
Makes sense to me! I've added this to my to-do list, and should hopefully have an initial implementation ready for review within the next week or two.
Hey @martinkim0! Apologies for the (large) delay here. I just opened an initial PR addressing this (#2578), though I think my current approach is too hacky, and I would love to hear your thoughts on a more elegant way forward.