anndataR icon indicating copy to clipboard operation
anndataR copied to clipboard

Improve guessing of the mapping of dimred slots

Open rcannood opened this issue 2 months ago • 10 comments

Related to:

Description

Checklist

Before review

  • [x] Update and regenerate man pages
  • [x] Add/update tests
  • [x] Add/update examples in vignettes
  • [ ] Pass CI checks

Before merge

  • [x] Update NEWS
  • [ ] Bump devel version

rcannood avatar Sep 09 '25 03:09 rcannood

I'm not sure how I feel about this. I can see that it's nice to do this automatically but I don't like the idea of making up rules for specific cases. We already have a mechanism to let the user set how things are names so do we need this as well (also, I should check how those two interact)?

@LouiseDck What do you think?

lazappi avatar Sep 16 '25 05:09 lazappi

Indeed -- this code only applies to when we are guessing the mapping (i.e. when the user didn't provide a manual mapping).

There was already some hardcoded code for PCA to make sure the right fields end up in Seurat in the right place; this PR just abstracts it out a little for the most common DRs.

rcannood avatar Sep 16 '25 05:09 rcannood

I dislike that this is a special case where we guess the mapping, we do not do this for other cases (such as nearest neighbors graphs)? However, I do think it might really make users life easier as for example, it is still not easy to just plot any dimred using scanpy if it does not have the "X_" prepended.

So I'm also a bit conflicted.

Was there a specific reason why you initiated this PR @rcannood?

LouiseDck avatar Sep 16 '25 07:09 LouiseDck

Was there a specific reason why you initiated this PR @rcannood?

Good question!

Yes, because I was trying to fix a known issue related to SCE conversion ^^

→ https://github.com/scverse/anndataR/pull/333/files#diff-1589871b950e902a7f2ead4bd87e8c6f2c6979f50fd334851f521b1d180c6ca5

rcannood avatar Sep 16 '25 12:09 rcannood

I dislike that this is a special case where we guess the mapping, we do not do this for other cases (such as nearest neighbors graphs)? However, I do think it might really make users life easier as for example, it is still not easy to just plot any dimred using scanpy if it does not have the "X_" prepended.

Can't you just do scanpy.pl.embedding(adata, basis="whatever"). I don't think it checks for "X_*".

Yes, because I was trying to fix a known issue related to SCE conversion

Isn't this related to the dimnames though, not the key?

lazappi avatar Sep 17 '25 06:09 lazappi

I dislike that this is a special case where we guess the mapping, we do not do this for other cases (such as nearest neighbors graphs)? However, I do think it might really make users life easier as for example, it is still not easy to just plot any dimred using scanpy if it does not have the "X_" prepended.

Can't you just do scanpy.pl.embedding(adata, basis="whatever"). I don't think it checks for "X_*".

Huh indeed, thanks! It's the basis argument in scanpy.pl.scatter that prepends the X_.

LouiseDck avatar Sep 17 '25 12:09 LouiseDck

Huh indeed, thanks! It's the basis argument in scanpy.pl.scatter that prepends the X_.

You're right. That doesn't seem ideal so I've opened an issue https://github.com/scverse/scanpy/issues/3803. I think that's more a scanpy problem though so I don't think it should affect how we name things.

lazappi avatar Sep 18 '25 05:09 lazappi

Huh indeed, thanks! It's the basis argument in scanpy.pl.scatter that prepends the X_.

You're right. That doesn't seem ideal so I've opened an issue scverse/scanpy#3803. I think that's more a scanpy problem though so I don't think it should affect how we name things.

Thanks, I probably should've done this a while ago 😅

This discussion hinges on how we want to approach conversion, and how much semantics we want to guess, right?

One the one hand I do think it makes sense: reducing the barriers for people to work with different data structures is in general a good idea, and this allows people to be less familiar with certain intricacies and conventions of different data structures. But, people might be surprised to see that their dimreds changed name? On the other hand, is it desirable that people are unaware of how their data is stored? Also, with more of this "magic" conversion, it gets increasingly difficult for the user to keep a mental model of what happens during the conversion?

I think I might be slightly in favor of doing the automatic conversion? But I can see upsides and downsides for sure.

LouiseDck avatar Sep 18 '25 09:09 LouiseDck

I can also see both sides but I guess I lean the other way. My feeling is that once you start doing this you have to make a lot of decisions about how things get mapped which I would prefer to avoid. You also have to make sure that everything is obvious to the user, it can be overridden and it works in both directions.

Probably the only thing that would convince me is if a core package required specific names so that users always have to set the mapping.

lazappi avatar Sep 19 '25 06:09 lazappi

To be clear, this PR only affects the code that tries to "guess" how AnnData's should be mapped to SCE/Seurat and back. If the user manually chooses how DRs should be mapped to obsm, the code in this PR doesn't trigger.

Maybe it's somewhat related to the feedback we got related to the Bioconductor submission (#342) -- the most important part is probably that the user knows how certain things are mapped to each other and how to override it

rcannood avatar Sep 22 '25 10:09 rcannood