Fix custom dataloader registry

Open canergen opened this issue 1 year ago • 2 comments

Custom dataloaders currently don't support advanced capabilities such as scArches or cell-type prediction in scANVI. To support them, we have to create a registry, without calling setup_anndata, that contains the same elements (see below). The first working example is https://github.com/chanzuckerberg/cellxgene-census/blob/222efddf2ce82f93f76329aa353962c1dc2400ac/api/python/notebooks/experimental/pytorch_loader_scvi.ipynb. It currently uses the following code to save the model:

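# Collect the model's attributes and keep only those whose names end in "_"
user_attributes = model._get_user_attributes()
user_attributes = {a[0]: a[1] for a in user_attributes if a[0][-1] == "_"}

# Add by hand the summary statistics that setup_anndata would normally register
user_attributes.update(
    {
        "n_batch": datamodule.n_batch,
        "n_extra_categorical_covs": 0,
        "n_extra_continuous_covs": 0,
        "n_labels": 1,
        "n_vars": datamodule.n_vars,
    }
)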

We want to create a new function that fills out the registry and passes it to the model where it is constructed: model = scvi.model.SCVI(n_layers=n_layers, n_latent=n_latent, gene_likelihood="nb", encode_covariates=False). You can see all necessary entries and their structure with: scvi.adata_manager.get_state_registry(scvi.REGISTRY_KEYS.X_KEY).to_dict(). Once this is fixed, all uses of _module_init_on_train throughout the codebase should be removed, as they will no longer be necessary.
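A minimal sketch of what such a registry-filling function could look like, in plain Python with no scvi-tools dependency. The function names (build_minimal_registry, collect_summary_stats) are hypothetical, and the nested key names ("field_registries", "state_registry", "summary_stats", "categorical_mapping", "column_names") are assumptions modeled on what setup_anndata appears to produce; they should be checked against the output of a real registry before being relied on.

```python
from typing import Any, Dict, Sequence


def build_minimal_registry(
    var_names: Sequence[str],
    batch_categories: Sequence[str],
    label_categories: Sequence[str] = ("unlabeled",),
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a registry-like dict without setup_anndata.

    All key names below are assumptions, not a confirmed scvi-tools schema.
    """
    return {
        "model_name": "SCVI",
        "field_registries": {
            "X": {
                "state_registry": {
                    "n_vars": len(var_names),
                    "column_names": list(var_names),
                },
                "summary_stats": {"n_vars": len(var_names)},
            },
            "batch": {
                "state_registry": {"categorical_mapping": list(batch_categories)},
                "summary_stats": {"n_batch": len(batch_categories)},
            },
            "labels": {
                "state_registry": {"categorical_mapping": list(label_categories)},
                "summary_stats": {"n_labels": len(label_categories)},
            },
        },
    }


def collect_summary_stats(registry: Dict[str, Any]) -> Dict[str, int]:
    """Flatten per-field summary stats into the values set by hand in the issue."""
    stats = {"n_extra_categorical_covs": 0, "n_extra_continuous_covs": 0}
    for field in registry["field_registries"].values():
        stats.update(field["summary_stats"])
    return stats


registry = build_minimal_registry(["g1", "g2", "g3"], ["b0", "b1"])
stats = collect_summary_stats(registry)
```

Here stats carries the same five entries (n_batch, n_extra_categorical_covs, n_extra_continuous_covs, n_labels, n_vars) that the snippet above writes into user_attributes manually, but derives them from one dictionary that could also hold the category mappings needed for scArches or label prediction.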

canergen, Jul 23 '24 17:07

Is there any documentation on what is expected of the custom dataloader's collate function? Judging from the different exceptions I am getting, I can imagine a dict with keys like X, batch and labels. But for poor souls like us who are not familiar with the codebase, it'd be amazing to have some documentation of which keys a collate function should return in the dictionary for it to work.
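For illustration only, a collate function of the kind guessed at above might look like the sketch below. The key names X, batch and labels come from this comment, not from any documented contract; the per-cell record fields (counts, batch_index, label_index) and the one-column layout for batch and labels are assumptions. Plain Python lists stand in for the tensors a real dataloader would return.

```python
from typing import Any, Dict, List


def collate_cells(cells: List[Dict[str, Any]]) -> Dict[str, list]:
    """Hypothetical collate function: stack per-cell records into one minibatch dict.

    The output keys are inferred from the exceptions mentioned in the thread,
    not from a documented scvi-tools requirement.
    """
    return {
        # One row of gene counts per cell: shape (n_cells, n_vars)
        "X": [list(cell["counts"]) for cell in cells],
        # Integer batch code per cell, kept as a single column: shape (n_cells, 1)
        "batch": [[cell["batch_index"]] for cell in cells],
        # Integer label code per cell, kept as a single column: shape (n_cells, 1)
        "labels": [[cell["label_index"]] for cell in cells],
    }


minibatch = collate_cells(
    [
        {"counts": [0, 3, 1], "batch_index": 0, "label_index": 0},
        {"counts": [2, 0, 5], "batch_index": 1, "label_index": 0},
    ]
)
```

In a real PyTorch dataloader each value would typically be converted to a tensor before being returned.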

gokceneraslan, Aug 26 '24 17:08

Hi, we are still exchanging ideas with lamin and CZI to improve the implementation (and hopefully to work towards support across all models; currently scVI works). Overall, the final requirement will be that a registry is created as a dictionary, similar to https://colab.research.google.com/drive/10sXec_TicMKtLA6hMcgfkado-FgoNKxw#scrollTo=e8vZgceklGdH. We are using https://github.com/laminlabs/lamindb/issues/1826 as the discussion channel for working together on a better implementation. Happy to connect offline (ideally on the scverse Zulip) to see how we can support your work.

canergen, Aug 26 '24 18:08

https://github.com/scverse/scvi-tools/pull/2932

ori-kron-wis, Apr 10 '25 08:04