
sharding codec use of lru caching fails with numpy void scalars

Open d-v-b opened this issue 7 months ago • 1 comments

numpy void scalars are not hashable, which means that a class with a np.void attribute cannot be hashed by lru_cache, which is used by the sharding codec class here.

I ran into this in #2874. Any ideas for fixing this would be appreciated! IMO the ideal fix would be to resolve the basic performance problems we are solving with this caching layer.

d-v-b, May 13 '25 11:05

It seems like _get_chunk_spec (the problem function) is internal to the module in question and is only called a handful of times. As a practical short-term solution, could we add another function that parses the inputs to _get_chunk_spec before calling it? This parser would check for a void scalar and convert it to something hashable, e.g. a (dtype signature, raw bytes) pair. The original value would then be recovered from the hashable object before returning from _get_chunk_spec.

I agree that the ideal fix (which likely requires a larger, longer-term refactor) would be to resolve the basic performance problems that this caching layer is working around.

nenb, May 19 '25 17:05