anndata
anndata copied to clipboard
Fix view behavior for AwkwardArrays
- [x] Closes #1035
- [x] Follow up of #1040
- [ ] Release note added (or unnecessary)
I think this does the trick
Codecov Report
Merging #1070 (fdeff90) into main (2c8759d) will decrease coverage by
2.20%. The diff coverage is100.00%.
Additional details and impacted files
@@ Coverage Diff @@
## main #1070 +/- ##
==========================================
- Coverage 84.75% 82.56% -2.20%
==========================================
Files 36 36
Lines 5149 5132 -17
==========================================
- Hits 4364 4237 -127
- Misses 785 895 +110
| Flag | Coverage Δ | |
|---|---|---|
| gpu-tests | ? |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Files Changed | Coverage Δ | |
|---|---|---|
| anndata/_io/specs/methods.py | 87.40% <ø> (-0.31%) |
:arrow_down: |
| anndata/_core/aligned_mapping.py | 93.99% <100.00%> (+0.32%) |
:arrow_up: |
| anndata/_core/views.py | 82.58% <100.00%> (-7.18%) |
:arrow_down: |
Shouldn’t this have some tests?
This is already covered by existing tests in test_views and test_awkward. The current version is not broken, but will break at some point in the future.
@grst I get the sense that it might be helpful to have a meeting to discuss this. Would you be able to find time for a zoom at some point?
Agreed, I sent an email with a poll to find a timeslot.
@ivirshup, @ilan-gold, @agoose77, ready for re-review. Feels good to get rid of all that code :boom:
Currently the tests fail for uns, but that seems to be a more general problem, see #1118.
@grst it is possible to modify the original array if you don't perform any row-based slices, i.e array["x"] = x, then this would modify array.
I can't recall to what extent we touched on this on the call. If the process of pulling out the Awkward Array always returns a shallow copy, this wouldn't be a problem.
Hi @agoose77,
returning the shallow copy (or slice) works as expected. The remaining problem is that there's no handler implemented for the uns slot while subsetting AnnData -- regardless of whether it's an awkward array or a numpy array. I don't think we want to always return a shallow copy on __getitem__ -- I think there are legitimate use-cases where one wants to modify records.
The remaining problem is that there's no handler implemented for the uns slot while subsetting AnnData
Is that blocking for this PR, or tangential? I'd like to make a release candidate by the end of the week, so would like to merge soon if possible.
If you don't mind, I can mark the respective tests as xfail. I think it's ok as
- the behavior is not a regression of this PR, it was like that before
- the problem is tracked in #1118 and
- I don't think it affects many users
I can mark the respective tests as xfail
Which tests are these?
@ivirshup, these: https://github.com/scverse/anndata/pull/1070/files#diff-1534693aa7219116bb36568d007d5f2116414d0c86148cfa15d859b861abafc7R143-R166
So just to clarify what this PR does. For this block:
import anndata as ad, awkward as ak, numpy as np
a = ad.AnnData(
np.ones((3, 3)),
obsm={"awk": ak.Array([{"a": 1}, {"a": 2}, {"a": 3}])}
)
v = a[:2]
v.obsm["awk"]["b"] = [5, 6]
display(v)
display(v.obsm["awk"])
On main:
<ipython-input-10-0f89ad3994c2>:8: ImplicitModificationWarning: Trying to modify attribute `.obsm` of view, initializing view as actual.
v.obsm["awk"]["b"] = [5, 6]
AnnData object with n_obs × n_vars = 2 × 3
obsm: 'awk'
<Array [{a: 1, b: 5}, {a: 2, b: 6}] type='2 * {a: int64, b: int64}'>
On this PR:
View of AnnData object with n_obs × n_vars = 2 × 3
obsm: 'awk'
<Array [{a: 1}, {a: 2}] type='2 * {a: int64}'>
This is the desired behavior?
Is uns failed for other view tests? I don't know that I really see the need to specially include, then xfail these cases.
Thanks for catching this! I thought my tests cover this, but it turns out I retrieved the awkward array from the view only once and did all tests on that copy. Instead, retreiving v.obsm["awk"] multiple times always returns a fresh copy of the sliced awkward array which makes it impossible to modify it.
To solve this, it is necessary to perform a shallow copy of the awkward array on view creation (which is a cheap operation). I added this behavior here: https://github.com/scverse/anndata/pull/1070/files#diff-af8b8ed3c983f1ed77b81030eb9d80df8136e0640324f180354e6e0d3b49e7e7R139-R159
Maybe this could also be useful for pandas data frames with copy-on-write behavior in the future?
I think caching the "view"s could be a good idea. I may need to think for a bit on how this effects my current thoughts on how to fix the garbage collection problem:
- https://github.com/scverse/anndata/pull/1119
Currently, this won't work with the solution proposed there, since the AlignedMappings are created on the fly to prevent circular references from being formed.