
Pull or push `.obsm` or `.varm` annotations.

Status: Open. Marius1311 opened this issue 1 year ago (2 comments).

Thank you for this amazing package!

Is your feature request related to a problem? Please describe. Currently, there does not seem to be a built-in method to pull `.obsm` or `.varm` annotations from the local tables to the global table, or to push them in the other direction.

Describe the solution you'd like. Given that the v3 rc can already do this for `.obs` and `.var` annotations, I suppose `.obsm` and `.varm` annotations could be moved using the same logic. This would be very convenient!

Marius1311 · Jul 25 '24 13:07

Hey @Marius1311, thanks for the great point!

Are there use cases you could share? It would help to understand the context better and see how we could potentially implement this.

Moving multimodal annotations adds complexity, e.g. due to the additional dimensions and sparsity. For CITE-seq, for instance, pulling the prot embedding to the multimodal level might produce a huge dense matrix of size (rna.n_obs + prot.n_obs) x embedding_size that is mostly empty. This seems like an inconvenience rather than a blocker, since it should be manageable with flags to sparsify the result and/or with user warnings.
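To make the size argument concrete, here is a minimal sketch in plain Python, with lists standing in for NumPy/SciPy arrays and assuming the rna and prot modalities have disjoint observation names, so the global axis has rna.n_obs + prot.n_obs rows. The helpers `pull_obsm_dense`, `pull_obsm_sparse` and `empty_fraction` are hypothetical and not part of the mudata API.

```python
import math

def pull_obsm_dense(global_obs, mod_obs, embedding):
    """Hypothetical helper: align a per-modality embedding to the global obs axis.

    Rows for observations the modality lacks are filled with NaN, so the
    result is dense but may be mostly empty.
    """
    rows = dict(zip(mod_obs, embedding))  # obs name -> embedding row
    width = len(embedding[0]) if embedding else 0
    return [list(rows[name]) if name in rows else [math.nan] * width
            for name in global_obs]

def pull_obsm_sparse(global_obs, mod_obs, embedding):
    """Hypothetical sparse alternative: keep only rows the modality has,
    keyed by their position on the global axis."""
    rows = dict(zip(mod_obs, embedding))
    return {i: list(rows[name]) for i, name in enumerate(global_obs) if name in rows}

def empty_fraction(matrix):
    """Fraction of rows that are entirely NaN."""
    empty = sum(all(math.isnan(v) for v in row) for row in matrix)
    return empty / len(matrix) if matrix else 0.0

rna_obs = ["r1", "r2", "r3"]
prot_obs = ["p1", "p2"]
global_obs = rna_obs + prot_obs  # (rna.n_obs + prot.n_obs) rows
prot_emb = [[0.1, 0.2], [0.3, 0.4]]

dense = pull_obsm_dense(global_obs, prot_obs, prot_emb)
print(len(dense), empty_fraction(dense))  # 5 0.6
print(pull_obsm_sparse(global_obs, prot_obs, prot_emb))  # {3: [0.1, 0.2], 4: [0.3, 0.4]}
```

The sparse variant only stores the two prot rows, which is the kind of saving a "sparsify" flag would aim for.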

One more concern is mixing different entities in one multimodal annotation. Using the same CITE-seq assay as an example, pulling X_umap from both prot and rna would add an (rna.n_obs + prot.n_obs) x 2 matrix as a multimodal embedding. However, the rna and prot values in that matrix should not be compared or mixed, and their origin (i.e. the fact that they come from two separate embeddings) might become hard to track down the line.
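One way to keep that provenance, sketched below, is to never stack embeddings under a shared key and instead write one global entry per modality, prefixed with the modality name. This assumes each modality's `.obsm` is a plain dict of row lists; the `"<modality>:<key>"` naming and the `pull_obsm_namespaced` helper are hypothetical illustrations, not existing mudata behavior.

```python
import math

def pull_obsm_namespaced(global_obs, modalities, key):
    """Hypothetical helper: pull `key` from each modality into its own global entry.

    Instead of one mixed global `key` matrix, each modality gets an entry
    named "<modality>:<key>", so values from different embeddings are never
    combined and their origin stays visible.
    """
    pulled = {}
    for mod_name, (mod_obs, obsm) in modalities.items():
        if key not in obsm:
            continue  # this modality has no such embedding
        rows = dict(zip(mod_obs, obsm[key]))
        width = len(obsm[key][0]) if obsm[key] else 0
        pulled[f"{mod_name}:{key}"] = [
            list(rows[n]) if n in rows else [math.nan] * width
            for n in global_obs
        ]
    return pulled

modalities = {
    "rna": (["r1", "r2"], {"X_umap": [[1.0, 2.0], [3.0, 4.0]]}),
    "prot": (["p1"], {"X_umap": [[5.0, 6.0]]}),
}
global_obs = ["r1", "r2", "p1"]
result = pull_obsm_namespaced(global_obs, modalities, "X_umap")
print(sorted(result))  # ['prot:X_umap', 'rna:X_umap']
```

The trade-off is more keys in the global `.obsm`, in exchange for never having to guess which rows came from which embedding.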

gtca · Aug 01 '24 01:08

Hi @gtca, in my case I was dealing with spatial data, where `.obsm` contained spatial coordinates. My MuData object held one AnnData with the genes measured in space and another with imputed expression for many more genes. The observations were the same, except for a few that I think were filtered out of one assay for technical reasons. So this was a case of axis=-1, which is probably a bit niche. I had initialized the object with spatial coordinates at the global level in `.obsm`, and I wanted to propagate them to the individual AnnData objects for further analysis with tools that currently require an AnnData object.
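For this spatial case, pushing a global `.obsm` entry down amounts to selecting, for each modality, the global rows that match its observation names. A minimal sketch follows, assuming plain Python lists in place of NumPy arrays and dicts in place of AnnData `.obsm`; `push_obsm` is a hypothetical helper, not part of mudata.

```python
def push_obsm(global_obs, global_obsm, modalities, key):
    """Hypothetical helper: copy a global `.obsm` entry into each modality.

    Each modality receives only the rows for its own observation names, so
    cells filtered out of one assay are simply skipped. A modality observation
    missing from the global axis raises KeyError instead of being filled in.
    """
    index = {name: i for i, name in enumerate(global_obs)}  # obs name -> global row
    matrix = global_obsm[key]
    for mod_obs, mod_obsm in modalities.values():
        mod_obsm[key] = [list(matrix[index[name]]) for name in mod_obs]

global_obs = ["c1", "c2", "c3"]
global_obsm = {"spatial": [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]}
modalities = {
    "measured": (["c1", "c2", "c3"], {}),
    "imputed": (["c1", "c3"], {}),  # c2 filtered out in this assay
}
push_obsm(global_obs, global_obsm, modalities, "spatial")
print(modalities["imputed"][1]["spatial"])  # [[0.0, 0.0], [2.0, 1.0]]
```

Raising on unknown observations rather than padding with NaN is a design choice here: when pushing down, every modality cell is expected to exist globally.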

Marius1311 · Aug 02 '24 09:08