sharding codec use of lru caching fails with numpy void scalars

d-v-b opened this issue.

NumPy void scalars are not hashable, so any object whose hash includes an np.void attribute cannot be hashed either. That breaks functools.lru_cache, which hashes its arguments and is used by the sharding codec class here.
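A minimal reproduction, assuming a frozen dataclass as a stand-in for the spec object the codec caches on. `ChunkSpec` and `get_chunk_spec` are hypothetical names, not the codec's real ones. A frozen dataclass generates a `__hash__` over all of its fields, so an np.void fill value makes the lru_cache key unhashable:

```python
import dataclasses
import functools

import numpy as np


# Hypothetical stand-in for the spec object passed to the cached method.
# frozen=True (with the default eq=True) makes dataclasses generate a
# __hash__ that hashes every field, including fill_value.
@dataclasses.dataclass(frozen=True)
class ChunkSpec:
    shape: tuple
    fill_value: object


@functools.lru_cache(maxsize=None)
def get_chunk_spec(spec: ChunkSpec) -> ChunkSpec:
    # lru_cache hashes its arguments to build the cache key.
    return spec


structured = np.dtype([("a", "i4"), ("b", "f8")])
fill = np.zeros(1, dtype=structured)[0]  # indexing a structured array yields np.void

error = None
try:
    get_chunk_spec(ChunkSpec(shape=(10,), fill_value=fill))
except TypeError as exc:
    error = exc  # raised while lru_cache hashes the ChunkSpec argument

print(type(error).__name__, error)
```

Swapping `fill` for a hashable value such as `0` lets the same call succeed.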

I ran into this in #2874. Any ideas for fixing this would be appreciated! IMO, the ideal fix would be to resolve the underlying performance problems that this caching layer works around.

May 13 '25 11:05 d-v-b

It seems like _get_chunk_spec (the problem function) is internal to the module in question and is only called a handful of times. Might a practical short-term solution be to add another function that parses the inputs to _get_chunk_spec before calling it? This parser could check for a void scalar, convert it to something hashable (e.g. a (dtype signature, raw bytes) tuple), and then recover the original value from that hashable object before returning from _get_chunk_spec.

I agree that the ideal fix (likely a larger, longer-term refactor) would be to resolve the underlying performance problems that this caching layer works around.

May 19 '25 17:05 nenb