Predicting using trained model.
Hi, I've successfully trained a model from scratch by following the tutorial at https://cpa-tools.readthedocs.io/en/latest/tutorials/combosciplex_Rdkit_embeddings.html
However, I'm lost on how to use the trained model to predict on an unseen dataset. I tried creating a new AnnData object with an unseen perturbation, but the following error occurred.
INFO Input AnnData not setup with scvi-tools. attempting to transfer AnnData setup
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[48], line 1
----> 1 model.predict(ood_adata, batch_size=1024)
File c:\Users\Ardo\.conda\envs\env.cpa\lib\site-packages\torch\autograd\grad_mode.py:27, in _DecoratorContextManager.__call__..decorate_context(*args, **kwargs)
24 @functools.wraps(func)
25 def decorate_context(*args, **kwargs):
26 with self.clone():
---> 27 return func(*args, **kwargs)
File c:\Users\Ardo\.conda\envs\env.cpa\lib\site-packages\cpa\_model.py:679, in CPA.predict(self, adata, indices, batch_size, n_samples, return_mean)
676 assert self.module.recon_loss in ["gauss", "nb", "zinb"]
677 self.module.eval()
--> 679 adata = self._validate_anndata(adata)
680 if indices is None:
681 indices = np.arange(adata.n_obs)
File c:\Users\Ardo\.conda\envs\env.cpa\lib\site-packages\scvi\model\base\_base_model.py:415, in BaseModelClass._validate_anndata(self, adata, copy_if_view)
409 if adata_manager is None:
410 logger.info(
411 "Input AnnData not setup with scvi-tools. "
412 + "attempting to transfer AnnData setup"
413 )
414 self._register_manager_for_instance(
...
230 self.attr_key,
231 categorical_dtype=cat_dtype,
232 )
ValueError: Category CHEMBL1213492+CHEMBL491473 not found in source registry. Cannot transfer setup without `extend_categories = True`.
Any help would be appreciated.
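The ValueError above says that the label CHEMBL1213492+CHEMBL491473 is not among the categories registered at training time. Before calling model.predict, it can help to list which labels in the new data were never seen in training, and whether their component drugs were seen individually. Below is a minimal sketch in plain Python. It assumes that combination labels join drug IDs with "+", as the label in the error suggests. The function names and the DRUG_X / DRUG_Y placeholders are hypothetical and not part of cpa.

```python
def split_combo(label, sep="+"):
    """Split a combination label such as 'A+B' into its drug IDs."""
    return [part for part in label.split(sep) if part]


def find_unseen(train_labels, new_labels, sep="+"):
    """Map each label absent from training to the component drugs
    that never appeared in any training label (alone or combined)."""
    train_combos = set(train_labels)
    train_drugs = {d for lab in train_combos for d in split_combo(lab, sep)}
    report = {}
    for lab in sorted(set(new_labels)):
        if lab in train_combos:
            continue  # this exact label was registered during training
        report[lab] = [d for d in split_combo(lab, sep) if d not in train_drugs]
    return report


# Hypothetical training labels; DRUG_X and DRUG_Y are placeholders.
train = ["CHEMBL1213492", "CHEMBL491473+DRUG_X"]
new = ["CHEMBL1213492", "CHEMBL1213492+CHEMBL491473", "DRUG_Y+CHEMBL491473"]

report = find_unseen(train, new)
for label, missing in report.items():
    print(label, "-> unseen drugs:", missing or "none (only the combination is new)")
```

With real data, the two lists would come from the perturbation column of the training and new AnnData objects (for example `adata.obs[<perturbation column>]`; the column name depends on how the model was set up). An empty list for a label means every drug was seen and only the combination itself is new.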
Hi, same question here. The authors seem to treat data with a known combination but a different dosage as OOD data, as shown in the default tutorial. This should work because dosage is encoded by an independent encoder. However, as users, we would take OOD to mean samples whose drug perturbation, cell type, or dosage is unseen, and the authors have another tutorial that handles this case.
I just noticed that they have a version that uses a database of drug embeddings, which would at least allow us to predict the contributions of the drugs in this database: https://colab.research.google.com/github/theislab/cpa/blob/master/docs/tutorials/combosciplex_Rdkit_embeddings.ipynb#scrollTo=79062e65-3de9-4916-8999-449ef2df3edf
Hi, you can use these embeddings as an example, or any other gene or drug embeddings, to generalize to unseen perturbations.
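To illustrate why precomputed embeddings help with unseen combinations: if every drug has a fixed vector in a lookup table, a combination that never appeared in training can still be represented by combining the vectors of its components. The sketch below is a toy illustration of that idea, not CPA's actual code. The linear dose weighting, the function name `combo_embedding`, and the two-dimensional vectors are all assumptions made for the example.

```python
def combo_embedding(drugs, doses, table):
    """Dose-weighted sum of per-drug embedding vectors.

    `table` maps drug ID -> list of floats (hypothetical embeddings).
    Raises KeyError if any drug has no embedding, since such a drug
    cannot be represented at all.
    """
    missing = [d for d in drugs if d not in table]
    if missing:
        raise KeyError(f"no embedding for: {missing}")
    dim = len(next(iter(table.values())))
    out = [0.0] * dim
    for drug, dose in zip(drugs, doses):
        for i, value in enumerate(table[drug]):
            out[i] += dose * value  # assumed linear dose scaling
    return out


# Toy 2-dimensional embeddings; real ones (e.g. RDKit features) are much larger.
table = {
    "CHEMBL1213492": [1.0, 0.0],
    "CHEMBL491473": [0.0, 2.0],
}

vec = combo_embedding(["CHEMBL1213492", "CHEMBL491473"], [1.0, 0.5], table)
print(vec)
```

A drug that is missing from the table raises an error here, which mirrors the practical limit mentioned above: predictions are only possible for drugs that have an entry in the embedding database.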
Hi, same question here. I think the authors and users may define OOD differently. The authors seem to treat data with a known combination but a different dosage as OOD data. This should work because dosage is encoded by an independent encoder. However, as users, we would take OOD to mean samples whose drug perturbation, cell type, or dosage is unseen. Therefore, I don't think CPA has a function that precisely matches our definition.
I suggest reading the tutorials; we cover all sorts of scenarios: unseen dosages, cell types, drugs, combinations, genes, etc.
Thanks for your notes; I've just clarified my wording.