anndata icon indicating copy to clipboard operation
anndata copied to clipboard

Fix view behavior for AwkwardArrays

Open grst opened this issue 2 years ago • 15 comments

  • [x] Closes #1035
  • [x] Follow up of #1040
  • [ ] Release note added (or unnecessary)

grst avatar Jul 24 '23 18:07 grst

I think this does the trick

grst avatar Jul 24 '23 18:07 grst

Codecov Report

Merging #1070 (fdeff90) into main (2c8759d) will decrease coverage by 2.20%. The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1070      +/-   ##
==========================================
- Coverage   84.75%   82.56%   -2.20%     
==========================================
  Files          36       36              
  Lines        5149     5132      -17     
==========================================
- Hits         4364     4237     -127     
- Misses        785      895     +110     
Flag Coverage Δ
gpu-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
anndata/_io/specs/methods.py 87.40% <ø> (-0.31%) :arrow_down:
anndata/_core/aligned_mapping.py 93.99% <100.00%> (+0.32%) :arrow_up:
anndata/_core/views.py 82.58% <100.00%> (-7.18%) :arrow_down:

... and 5 files with indirect coverage changes

:loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scverse).

codecov[bot] avatar Jul 24 '23 18:07 codecov[bot]

Shouldn’t this have some tests?

This is already covered by existing tests in test_views and test_awkward. The current version is not broken, but will break at some point in the future.

grst avatar Jul 25 '23 16:07 grst

@grst I get the sense that it might be helpful to have a meeting to discuss this. Would you be able to find time for a zoom at some point?

agoose77 avatar Jul 25 '23 17:07 agoose77

Agreed, I sent an email with a poll to find a timeslot.

grst avatar Jul 27 '23 14:07 grst

@ivirshup, @ilan-gold, @agoose77, ready for re-review. Feels good to get rid of all that code :boom:

Currently the tests fail for uns, but that seems to be a more general problem, see #1118.

grst avatar Sep 01 '23 07:09 grst

@grst it is possible to modify the original array if you don't perform any row-based slices, i.e array["x"] = x, then this would modify array.

I can't recall to what extent we touched on this on the call. If the process of pulling out the Awkward Array always returns a shallow copy, this wouldn't be a problem.

agoose77 avatar Sep 01 '23 20:09 agoose77

Hi @agoose77,

returning the shallow copy (or slice) works as expected. The remaining problem is that there's no handler implemented for the uns slot while subsetting AnnData -- regardless of whether it's an awkward array or a numpy array. I don't think we want to always return a shallow copy on __getitem__ -- I think there are legitimate use-cases where one wants to modify records.

grst avatar Sep 04 '23 06:09 grst

The remaining problem is that there's no handler implemented for the uns slot while subsetting AnnData

Is that blocking for this PR, or tangential? I'd like to make a release candidate by the end of the week, so would like to merge soon if possible.

ivirshup avatar Sep 05 '23 18:09 ivirshup

If you don't mind, I can mark the respective tests as xfail. I think it's ok as

  • the behavior is not a regression of this PR, it was like that before
  • the problem is tracked in #1118 and
  • I don't think it affects many users

grst avatar Sep 06 '23 06:09 grst

I can mark the respective tests as xfail

Which tests are these?

ivirshup avatar Sep 07 '23 11:09 ivirshup

@ivirshup, these: https://github.com/scverse/anndata/pull/1070/files#diff-1534693aa7219116bb36568d007d5f2116414d0c86148cfa15d859b861abafc7R143-R166

grst avatar Sep 07 '23 11:09 grst

So just to clarify what this PR does. For this block:

import anndata as ad, awkward as ak, numpy as np

a = ad.AnnData(
    np.ones((3, 3)),
    obsm={"awk": ak.Array([{"a": 1}, {"a": 2}, {"a": 3}])}
)
v = a[:2]
v.obsm["awk"]["b"] = [5, 6]

display(v)
display(v.obsm["awk"])

On main:

<ipython-input-10-0f89ad3994c2>:8: ImplicitModificationWarning: Trying to modify attribute `.obsm` of view, initializing view as actual.
  v.obsm["awk"]["b"] = [5, 6]
AnnData object with n_obs × n_vars = 2 × 3
    obsm: 'awk'
<Array [{a: 1, b: 5}, {a: 2, b: 6}] type='2 * {a: int64, b: int64}'>

On this PR:

View of AnnData object with n_obs × n_vars = 2 × 3
    obsm: 'awk'
<Array [{a: 1}, {a: 2}] type='2 * {a: int64}'>

This is the desired behavior?


Is uns failed for other view tests? I don't know that I really see the need to specially include, then xfail these cases.

ivirshup avatar Sep 07 '23 13:09 ivirshup

Thanks for catching this! I thought my tests cover this, but it turns out I retrieved the awkward array from the view only once and did all tests on that copy. Instead, retreiving v.obsm["awk"] multiple times always returns a fresh copy of the sliced awkward array which makes it impossible to modify it.

To solve this, it is necessary to perform a shallow copy of the awkward array on view creation (which is a cheap operation). I added this behavior here: https://github.com/scverse/anndata/pull/1070/files#diff-af8b8ed3c983f1ed77b81030eb9d80df8136e0640324f180354e6e0d3b49e7e7R139-R159

Maybe this could also be useful for pandas data frames with copy-on-write behavior in the future?

grst avatar Sep 08 '23 07:09 grst

I think caching the "view"s could be a good idea. I may need to think for a bit on how this effects my current thoughts on how to fix the garbage collection problem:

  • https://github.com/scverse/anndata/pull/1119

Currently, this won't work with the solution proposed there, since the AlignedMappings are created on the fly to prevent circular references from being formed.

ivirshup avatar Sep 08 '23 16:09 ivirshup