anndataR
anndataR copied to clipboard
Improve guessing of the mapping of dimred slots
Related to:
Description
Checklist
Before review
- [x] Update and regenerate man pages
- [x] Add/update tests
- [x] Add/update examples in vignettes
- [ ] Pass CI checks
Before merge
- [x] Update
NEWS - [ ] Bump devel version
I'm not sure how I feel about this. I can see that it's nice to do this automatically but I don't like the idea of making up rules for specific cases. We already have a mechanism to let the user set how things are names so do we need this as well (also, I should check how those two interact)?
@LouiseDck What do you think?
Indeed -- this code only applies to when we are guessing the mapping (i.e. when the user didn't provide a manual mapping).
There was already some hardcoded code for PCA to make sure the right fields end up in Seurat in the right place; this PR just abstracts it out a little for the most common DRs.
I dislike that this is a special case where we guess the mapping, we do not do this for other cases (such as nearest neighbors graphs)? However, I do think it might really make users life easier as for example, it is still not easy to just plot any dimred using scanpy if it does not have the "X_" prepended.
So I'm also a bit conflicted.
Was there a specific reason why you initiated this PR @rcannood?
Was there a specific reason why you initiated this PR @rcannood?
Good question!
Yes, because I was trying to fix a known issue related to SCE conversion ^^
→ https://github.com/scverse/anndataR/pull/333/files#diff-1589871b950e902a7f2ead4bd87e8c6f2c6979f50fd334851f521b1d180c6ca5
I dislike that this is a special case where we guess the mapping, we do not do this for other cases (such as nearest neighbors graphs)? However, I do think it might really make users life easier as for example, it is still not easy to just plot any dimred using scanpy if it does not have the "X_" prepended.
Can't you just do scanpy.pl.embedding(adata, basis="whatever"). I don't think it checks for "X_*".
Yes, because I was trying to fix a known issue related to SCE conversion
Isn't this related to the dimnames though, not the key?
I dislike that this is a special case where we guess the mapping, we do not do this for other cases (such as nearest neighbors graphs)? However, I do think it might really make users life easier as for example, it is still not easy to just plot any dimred using scanpy if it does not have the "X_" prepended.
Can't you just do
scanpy.pl.embedding(adata, basis="whatever"). I don't think it checks for"X_*".
Huh indeed, thanks! It's the basis argument in scanpy.pl.scatter that prepends the X_.
Huh indeed, thanks! It's the
basisargument inscanpy.pl.scatterthat prepends theX_.
You're right. That doesn't seem ideal so I've opened an issue https://github.com/scverse/scanpy/issues/3803. I think that's more a scanpy problem though so I don't think it should affect how we name things.
Huh indeed, thanks! It's the
basisargument inscanpy.pl.scatterthat prepends theX_.You're right. That doesn't seem ideal so I've opened an issue scverse/scanpy#3803. I think that's more a scanpy problem though so I don't think it should affect how we name things.
Thanks, I probably should've done this a while ago 😅
This discussion hinges on how we want to approach conversion, and how much semantics we want to guess, right?
One the one hand I do think it makes sense: reducing the barriers for people to work with different data structures is in general a good idea, and this allows people to be less familiar with certain intricacies and conventions of different data structures. But, people might be surprised to see that their dimreds changed name? On the other hand, is it desirable that people are unaware of how their data is stored? Also, with more of this "magic" conversion, it gets increasingly difficult for the user to keep a mental model of what happens during the conversion?
I think I might be slightly in favor of doing the automatic conversion? But I can see upsides and downsides for sure.
I can also see both sides but I guess I lean the other way. My feeling is that once you start doing this you have to make a lot of decisions about how things get mapped which I would prefer to avoid. You also have to make sure that everything is obvious to the user, it can be overridden and it works in both directions.
Probably the only thing that would convince me is if a core package required specific names so that users always have to set the mapping.
To be clear, this PR only affects the code that tries to "guess" how AnnData's should be mapped to SCE/Seurat and back. If the user manually chooses how DRs should be mapped to obsm, the code in this PR doesn't trigger.
Maybe it's somewhat related to the feedback we got related to the Bioconductor submission (#342) -- the most important part is probably that the user knows how certain things are mapped to each other and how to override it