scgen
How to use data with categories unseen by a trained model
Hello, I am having trouble loading data and passing it through a trained model when the data contains categories that the model did not see at training time. Specifically, I am training on certain tissues and want to use the model's predictions on other tissues. The data for these unseen tissues are stored in a separate file from the training data.
The code to do this would look like:
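import scanpy
import scgen
# Training
column_tissue = 'celltype'
train_adata = scanpy.read('train_file.h5ad')
# Register the label column; the tissues present here become the model's known categories
scgen.SCGEN.setup_anndata(train_adata, batch_key=None, labels_key=column_tissue)
# model_kwargs holds the model hyperparameters (not shown)
model = scgen.SCGEN(train_adata, **model_kwargs)
model.train(...)
# Testing: test_file.h5ad contains tissues that are absent from train_file.h5ad
test_adata = scanpy.read('test_file.h5ad')
model.get_decoded_expression(adata=test_adata, indices=...)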
This outputs:
INFO Received view of anndata, making copy.
INFO Input AnnData not setup with scvi-tools. attempting to transfer AnnData setup
And ends on:
ValueError: Category XXXX not found in source registry. Cannot transfer setup without extend_categories = True.
Where XXXX is a tissue that was absent from the training file.
What would be the correct way to do this? I cannot find any way to pass the extend_categories kwarg.
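To make the error concrete, here is a toy model of what transferring a categorical field seems to involve. It is a minimal sketch in plain Python, assuming only that each registered categorical column keeps an ordered list of training categories (the categorical_mapping that the workaround further down reads from the registry). The names transfer_categorical_mapping and encode are hypothetical, and this is not scvi-tools' implementation; in particular, appending unseen categories after the known ones is an assumption of the sketch.

```python
from typing import Dict, List, Sequence


def transfer_categorical_mapping(
    source_mapping: Sequence[str],
    target_values: Sequence[str],
    extend_categories: bool = False,
) -> List[str]:
    """Return the category order to use for the target data.

    Hypothetical toy model (not scvi-tools code): categories known from
    training keep their original position; unseen categories raise an
    error unless extend_categories is True, in which case they are
    appended after the training categories.
    """
    mapping = list(source_mapping)
    known = set(mapping)
    # Unseen categories in order of first appearance, without duplicates.
    unseen = [c for c in dict.fromkeys(target_values) if c not in known]
    if unseen and not extend_categories:
        raise ValueError(
            f"Category {unseen[0]} not found in source registry. "
            "Cannot transfer setup without extend_categories = True."
        )
    return mapping + unseen


def encode(values: Sequence[str], mapping: Sequence[str]) -> List[int]:
    """Convert category labels to integer codes using the given mapping."""
    index: Dict[str, int] = {cat: i for i, cat in enumerate(mapping)}
    return [index[v] for v in values]


train_mapping = ["liver", "lung"]          # made-up training tissues
test_labels = ["lung", "kidney", "kidney"]  # "kidney" was never seen

try:
    transfer_categorical_mapping(train_mapping, test_labels)
except ValueError as err:
    print(err)

extended = transfer_categorical_mapping(train_mapping, test_labels, extend_categories=True)
print(extended)                       # ['liver', 'lung', 'kidney']
print(encode(test_labels, extended))  # [1, 2, 2]
```

In this sketch, appending rather than re-sorting keeps the codes of the training categories unchanged, so anything already learned for "liver" and "lung" still lines up with the same integer codes.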
What I tried
After digging into the source code, I imagine this would involve something like:
model.register_manager(model.adata_manager.transfer_fields(adata_target=test_adata, extend_categories=True))
But I cannot find a way to make the model use this new manager.
For now, my workaround is to relabel the test data with a category that was present in the training data. For example, I set the tissue column in the test data to the first tissue in the model's registry:
test_adata.obs = test_adata.obs.rename(columns={column_tissue: 'test_celltype'})
test_adata.obs[column_tissue] = model.adata_manager.registry['field_registries']['labels']['state_registry']['categorical_mapping'][0]
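The workaround above can be sketched without scgen to show which nested keys it reads and that the real tissue labels are kept. This is a minimal sketch under stated assumptions: the registry dict is a hypothetical stand-in that reproduces only the keys used in the code above, obs is modelled as a plain dict of columns rather than test_adata.obs, and the helper names placeholder_label and apply_workaround are made up for illustration.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for model.adata_manager.registry: only the nested
# keys used by the workaround are reproduced, with made-up tissue names.
registry: Dict[str, Any] = {
    "field_registries": {
        "labels": {
            "state_registry": {"categorical_mapping": ["liver", "lung"]},
        },
    },
}


def placeholder_label(registry: Dict[str, Any]) -> str:
    """Return the first category the model saw at training time."""
    labels = registry["field_registries"]["labels"]
    return labels["state_registry"]["categorical_mapping"][0]


def apply_workaround(
    obs: Dict[str, List[str]], column: str, registry: Dict[str, Any]
) -> Dict[str, List[str]]:
    """Move the real labels to 'test_<column>' and fill column with a known category.

    obs is a dict of columns (a simplification of test_adata.obs);
    the input dict is not modified.
    """
    new_obs = {k: v for k, v in obs.items() if k != column}
    new_obs["test_" + column] = list(obs[column])  # keep the real tissues
    new_obs[column] = [placeholder_label(registry)] * len(obs[column])
    return new_obs


obs = {"celltype": ["kidney", "kidney", "heart"]}
print(apply_workaround(obs, "celltype", registry))
# {'test_celltype': ['kidney', 'kidney', 'heart'], 'celltype': ['liver', 'liver', 'liver']}
```

Keeping the original labels under test_celltype means the outputs can still be matched back to the true tissues, even though every test cell is registered under the placeholder tissue.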
However, this is quite an unsatisfactory solution, and surely there is a cleaner way of doing this.
Thank you!