fix: check for non-int 0 fill values
A bit of an "oops" from me, but hopefully this is more robust (than #3082)! It worked for my one example but testing this without using a spy object seems impossible (happy to contribute that though, or some other test).
TODO:
- [ ] Add unit tests and/or doctests in docstrings
- [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
- [ ] New/modified features documented in
docs/user-guide/*.rst - [ ] Changes documented as a new file in
changes/ - [ ] GitHub Actions have all passed
- [ ] Test coverage is 100% (Codecov passes)
Codecov Report
:x: Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 61.85%. Comparing base (14b372c) to head (8ba90ce).
:warning: Report is 1 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/zarr/core/buffer/cpu.py | 50.00% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #3345 +/- ##
==========================================
- Coverage 61.86% 61.85% -0.01%
==========================================
Files 85 85
Lines 10111 10112 +1
==========================================
Hits 6255 6255
- Misses 3856 3857 +1
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/zarr/core/buffer/cpu.py | 37.50% <50.00%> (-0.69%) |
:arrow_down: |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
to test this, could we perhaps patch np.full and np.zeros to raise exceptions, that you would then catch in the test? I'm thinking there would be 2 test functions, 1 testing the various inputs that should hit the np.full branch, and another testing the various inputs that should hit the np.zeros branch. This could also work for a single test function that internally checks which numpy routine the input scalar should trigger
I've done some digging into this topic, I'll post in a bit but I would like to hold off on merging for a bit.
Ok apologies, I know that Tom's PR supercedes this in many ways but for anyone curious, my worry was about how memory is allocated.
I learned that zeros is only faster than full because it doesn't allocate memory when requested, lazily doing so later when needed (see https://stackoverflow.com/questions/70055063/how-is-memory-handled-once-touched-for-the-first-time-in-numpy-zeros/70056188#70056188). In benchmarks, I see the times of subsequent operations on the matrix, whether something like sum that should touch all elements or __setitem__ to be identical independent of initialization so I definitely think this should be merged!
import numpy as np
buf = np.arange(100_000, dtype=np.float64)
# cell 1
%%timeit full = np.full((1_000_000,), 0, dtype=np.float64)
full[900_000:] = buf
# cell 2
%%timeit zeros = np.zeros((1_000_000,), dtype=np.float64)
zeros[900_000:] = buf
# cell 3
%%timeit full = np.full((1_000_000,), 0, dtype=np.float64)
full.sum()
# cell 4
%%timeit zeros = np.zeros((1_000_000,), dtype=np.float64)
zeros.sum()