zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

fix: check for non-int 0 fill values

Open ilan-gold opened this issue 5 months ago • 4 comments

A bit of an "oops" from me, but hopefully this is more robust (than #3082)! It worked for my one example but testing this without using a spy object seems impossible (happy to contribute that though, or some other test).

TODO:

  • [ ] Add unit tests and/or doctests in docstrings
  • [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
  • [ ] New/modified features documented in docs/user-guide/*.rst
  • [ ] Changes documented as a new file in changes/
  • [ ] GitHub Actions have all passed
  • [ ] Test coverage is 100% (Codecov passes)

ilan-gold avatar Aug 05 '25 17:08 ilan-gold

Codecov Report

:x: Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review. :white_check_mark: Project coverage is 61.85%. Comparing base (14b372c) to head (8ba90ce). :warning: Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/zarr/core/buffer/cpu.py 50.00% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3345      +/-   ##
==========================================
- Coverage   61.86%   61.85%   -0.01%     
==========================================
  Files          85       85              
  Lines       10111    10112       +1     
==========================================
  Hits         6255     6255              
- Misses       3856     3857       +1     
Files with missing lines Coverage Δ
src/zarr/core/buffer/cpu.py 37.50% <50.00%> (-0.69%) :arrow_down:
:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Aug 06 '25 04:08 codecov[bot]

to test this, could we perhaps patch np.full and np.zeros to raise exceptions, that you would then catch in the test? I'm thinking there would be 2 test functions, 1 testing the various inputs that should hit the np.full branch, and another testing the various inputs that should hit the np.zeros branch. This could also work for a single test function that internally checks which numpy routine the input scalar should trigger

d-v-b avatar Aug 06 '25 09:08 d-v-b

I've done some digging into this topic, I'll post in a bit but I would like to hold off on merging for a bit.

ilan-gold avatar Aug 12 '25 22:08 ilan-gold

Ok apologies, I know that Tom's PR supercedes this in many ways but for anyone curious, my worry was about how memory is allocated.

I learned that zeros is only faster than full because it doesn't allocate memory when requested, lazily doing so later when needed (see https://stackoverflow.com/questions/70055063/how-is-memory-handled-once-touched-for-the-first-time-in-numpy-zeros/70056188#70056188). In benchmarks, I see the times of subsequent operations on the matrix, whether something like sum that should touch all elements or __setitem__ to be identical independent of initialization so I definitely think this should be merged!

import numpy as np
buf = np.arange(100_000, dtype=np.float64)
# cell 1
%%timeit full = np.full((1_000_000,), 0, dtype=np.float64)

full[900_000:] = buf
# cell 2
%%timeit zeros = np.zeros((1_000_000,), dtype=np.float64)
    
zeros[900_000:] = buf
# cell 3
%%timeit full = np.full((1_000_000,), 0, dtype=np.float64)

full.sum()
# cell 4
%%timeit zeros = np.zeros((1_000_000,), dtype=np.float64)

zeros.sum()

Screenshot 2025-08-25 at 16 46 16

ilan-gold avatar Aug 25 '25 14:08 ilan-gold