How to use data with categories unseen by a trained model

Open majpark21 opened this issue 1 year ago • 1 comments

Hello, I am having trouble loading data and passing it through a trained model when the data contains categories that the model did not see at training time. Specifically, I am training on certain tissues and want to use the model to predict on other tissues. The data for these unseen tissues are stored in a separate file from the training data.

The code to do this would look like:

import scanpy
import scgen
# Training
column_tissue = 'celltype'
train_adata = scanpy.read('train_file.h5ad')
scgen.SCGEN.setup_anndata(train_adata, batch_key=None, labels_key=column_tissue)
model = scgen.SCGEN(train_adata, **model_kwargs)  # model_kwargs: model hyperparameters, defined elsewhere
model.train(...)
# Testing: test_file.h5ad contains tissues that are absent from train_file.h5ad
test_adata = scanpy.read('test_file.h5ad')
model.get_decoded_expression(adata=test_adata, indices=...)

This outputs:

INFO Received view of anndata, making copy.
INFO Input AnnData not setup with scvi-tools. attempting to transfer AnnData setup

And ends on:

ValueError: Category XXXX not found in source registry. Cannot transfer setup without extend_categories = True.

Where XXXX is a tissue that was absent from the training file.
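
To see up front which tissues would trigger the error above, the labels of the two files can be compared before calling the model. The following is a minimal sketch in plain Python; the helper name `find_unseen_categories` and the toy tissue names are hypothetical, and in practice the two inputs would be the label columns of the training and test data (for example `train_adata.obs[column_tissue]` and `test_adata.obs[column_tissue]`).

```python
def find_unseen_categories(train_labels, test_labels):
    """Return the sorted labels that occur in test_labels but not in train_labels."""
    seen = set(train_labels)
    return sorted(set(test_labels) - seen)


# Toy labels (made up for illustration) standing in for the 'celltype' columns
# of the training and test files.
train_labels = ["liver", "lung", "liver", "kidney"]
test_labels = ["lung", "brain", "heart", "brain"]

unseen = find_unseen_categories(train_labels, test_labels)
print(unseen)  # the labels that the trained model has never seen
```

Any label returned here is one that the setup transfer would reject unless categories are extended.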

What would be the correct way to do this? I cannot find any way to pass the extend_categories kwarg.

What I tried

After digging into the source code, I imagine this would involve something like:

model.register_manager(model.adata_manager.transfer_fields(adata_target=test_adata, extend_categories=True))

However, I cannot find how to make the model use this new manager.

For now, a workaround is to overwrite the categories in the test data with a category that was present in the training data. For example, after keeping the original labels under a new column name, the tissue column in the test data can be set to the first tissue in the model's registry:

test_adata.obs = test_adata.obs.rename(columns={column_tissue: 'test_celltype'})
test_adata.obs[column_tissue] = model.adata_manager.registry['field_registries']['labels']['state_registry']['categorical_mapping'][0]

However, this is quite an unsatisfactory solution, and there is surely a cleaner way of doing this.
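
A slightly more careful variant of the workaround above can be sketched in plain Python. It is only an illustration, not a scgen or scvi-tools API: the helper name `mask_unseen_labels` and the toy tissue names are hypothetical. Unlike the two lines above, which overwrite the whole column, this sketch replaces only the labels that are missing from the known categories and returns the original labels alongside so they are not lost.

```python
def mask_unseen_labels(labels, known_categories, placeholder=None):
    """Replace labels missing from known_categories with a known placeholder.

    Returns (masked, original) so that the true labels are preserved.
    """
    known = list(known_categories)
    if not known:
        raise ValueError("known_categories must not be empty")
    if placeholder is None:
        # Same choice as the workaround: the first registered category.
        placeholder = known[0]
    elif placeholder not in known:
        raise ValueError(f"placeholder {placeholder!r} is not a known category")
    known_set = set(known)
    original = list(labels)
    masked = [label if label in known_set else placeholder for label in original]
    return masked, original


# Toy values (made up for illustration). In the setting above, `known` would be
# the model's 'categorical_mapping' and `test_labels` the test tissue column.
known = ["kidney", "liver", "lung"]
test_labels = ["lung", "brain", "heart"]

masked, original = mask_unseen_labels(test_labels, known)
print(masked)
print(original)
```

The masked list could then be written to the tissue column and the original list to a separate column such as 'test_celltype', mirroring the rename in the workaround. This still assigns placeholder labels to cells from unseen tissues, so it shares the limitation described above.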

Thank you!

majpark21, Oct 07 '22 10:10