Unexpected (?) output from MuData.copy()

Open emdann opened this issue 3 years ago • 1 comments

Hi there, I'm not sure whether this is really a bug, but if I make certain changes to a MuData object's .obs (e.g. removing duplicate columns), the .obs of the copy differs from the original.

Example

import mudata
import scanpy as sc

adata = sc.datasets.pbmc3k_processed()
adata_highQ = adata[adata.obs['n_counts'] > 2000].copy()
mdata = mudata.MuData({'full': adata, 'highQ': adata_highQ}, axis=0)

## Change obs
mdata.obs = mdata['full'].obs.copy()
mdata.obs.columns
Index(['n_genes', 'percent_mito', 'n_counts', 'louvain'], dtype='object')
mdata_copy = mdata.copy()
mdata_copy.obs.columns
Index(['full:n_genes', 'full:percent_mito', 'full:n_counts', 'full:louvain',
       'highQ:n_genes', 'highQ:percent_mito', 'highQ:n_counts',
       'highQ:louvain', 'n_genes', 'percent_mito', 'n_counts', 'louvain'],
      dtype='object')

I understand this comes from the copy method re-initializing the MuData object, but it breaks code that expects an exact copy.

System

  • Python v3.10
  • MuData v0.2.1

emdann, Dec 13 '22 09:12

Hey @emdann,

This stems from the need to call .update(), combined with the fact that, by default, columns are copied from the individual modalities. We might change this behaviour in v0.3 so that the columns are not copied automatically.

Currently, the expected behaviour is that the columns are the same if you call .update() first and then .copy().

gtca, Jun 01 '23 15:06

This should be fixed by the new API in v0.3 (.update(pull=False)), which will become the default in upcoming versions.
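
To make the mechanism discussed in this thread concrete, here is a minimal toy sketch in plain Python. It is not MuData's implementation: ToyContainer, its obs_columns attribute, and the pull argument on copy() are hypothetical names made up for illustration, and only the idea of "pulling" prefixed columns from the modalities is taken from the comments above.

```python
class ToyContainer:
    """Toy stand-in for a multimodal container (hypothetical, NOT MuData itself)."""

    def __init__(self, mods, pull=True):
        # mods maps a modality name to the list of its column names.
        self.mods = {name: list(cols) for name, cols in mods.items()}
        self.obs_columns = []
        self.update(pull=pull)

    def update(self, pull=True):
        """With pull=True, add prefixed modality columns to the global columns."""
        if not pull:
            return
        pulled = [f"{name}:{col}" for name, cols in self.mods.items() for col in cols]
        # Keep any existing global columns that were not pulled from a modality.
        extra = [col for col in self.obs_columns if col not in pulled]
        self.obs_columns = pulled + extra

    def copy(self, pull=True):
        """Copy by re-initializing, carrying over the global columns, then updating."""
        new = ToyContainer(self.mods, pull=False)
        new.obs_columns = list(self.obs_columns)
        new.update(pull=pull)  # assumption: the pull happens as part of copying
        return new


cols = ["n_genes", "percent_mito", "n_counts", "louvain"]
toy = ToyContainer({"full": cols, "highQ": cols})

# Overwrite the global columns, as in the report, then copy.
toy.obs_columns = list(cols)
drifted = toy.copy()
print(drifted.obs_columns == toy.obs_columns)  # False: the copy gained prefixed columns

# Updating first brings the original in line with what the copy will contain.
toy.update()
synced = toy.copy()
print(synced.obs_columns == toy.obs_columns)  # True

# Without pulling, the copy keeps exactly the columns that were set.
toy.obs_columns = list(cols)
exact = toy.copy(pull=False)
print(exact.obs_columns == toy.obs_columns)  # True
```

In this toy model, the copy differs from the original only when the original's global columns are out of sync with what a pull would produce, which mirrors the behaviour described in the report.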

gtca, Jul 02 '24 01:07