Deepak Cherian

Results: 1084 comments by Deepak Cherian

This is snakeviz: https://jiffyclub.github.io/snakeviz/
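snakeviz is an interactive viewer for profiles written by Python's built-in `cProfile`. Here is a minimal sketch of producing a file it can open. The `slow_sum` workload and the `example.prof` filename are made up for illustration.

```python
import cProfile
import pstats


def slow_sum(n: int) -> int:
    # Deliberately naive loop so the profile has something to show
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_to_file(path: str) -> int:
    profiler = cProfile.Profile()
    profiler.enable()
    result = slow_sum(200_000)
    profiler.disable()
    # Write the binary stats file that snakeviz (and pstats) can read
    profiler.dump_stats(path)
    return result


if __name__ == "__main__":
    out = "example.prof"
    profile_to_file(out)
    # Quick text summary; run `snakeviz example.prof` for the interactive view
    pstats.Stats(out).sort_stats("cumulative").print_stats(5)
```

The same file can also be produced without code changes via `python -m cProfile -o example.prof your_script.py`.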

Yup, saving 30s with #9808. The cache is quite effective: `CacheInfo(hits=826, misses=4, maxsize=None, currsize=4)`
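That `CacheInfo(...)` repr is what `functools.lru_cache(maxsize=None)` reports through `cache_info()`. Below is a minimal sketch of why the hit rate can be so high, assuming many variables share the same few chunk specs. The `expand_chunks` function is a hypothetical stand-in, not the code from #9808.

```python
import functools


@functools.lru_cache(maxsize=None)
def expand_chunks(chunksize: int, dim_len: int) -> tuple[int, ...]:
    # Hypothetical stand-in for an expensive chunk normalization step.
    # lru_cache needs hashable arguments, so we key on plain ints.
    full, rem = divmod(dim_len, chunksize)
    return (chunksize,) * full + ((rem,) if rem else ())


# Many variables sharing the same chunking only pay the cost once.
for _ in range(3):
    expand_chunks(100, 1_050)
expand_chunks(10, 95)

print(expand_chunks.cache_info())
# CacheInfo(hits=2, misses=2, maxsize=None, currsize=2)
```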

Belatedly realizing that Xarray's call to `normalize_chunks` is a major time waster here given that `chunks` contains a tuple with O(1 million) elements hehe.
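To see how an innocent int chunk size turns into a huge tuple: normalizing means spelling out every block length along each dimension. A hypothetical one-dimensional sketch follows (not dask's `normalize_chunks`, which also handles `"auto"`, dicts, multiple dims, etc.):

```python
def normalize_int_chunks(chunksize: int, dim_len: int) -> tuple[int, ...]:
    """Hypothetical sketch: expand an int chunk size along one dimension
    (dim_len > 0) into explicit block sizes, with a partial last block."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    full, rem = divmod(dim_len, chunksize)
    return (chunksize,) * full + ((rem,) if rem else ())


print(normalize_int_chunks(3, 10))  # (3, 3, 3, 1)

# A dimension of 1e9 elements stored in chunks of 1000 -> a million entries
big = normalize_int_chunks(1_000, 1_000_000_000)
print(len(big))  # 1000000
```

Every later step that walks, hashes, or compares `big` now pays O(1 million) instead of O(1).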

This repository has consolidated metadata, so this ticket has always been about dask, and specifically how Xarray calls `tokenize`, and how Xarray checks for chunk alignment.
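A chunk-alignment check has the same scaling problem, since it has to look at every block boundary. Here is a simplified sketch, assuming the rule is "every internal dask chunk boundary must fall on a Zarr chunk boundary". This is an assumption for illustration, not Xarray's exact logic (which also deals with regions and partial last chunks).

```python
from itertools import accumulate


def chunks_aligned(dask_chunks: tuple[int, ...], zarr_chunk: int) -> bool:
    """Simplified (assumed) alignment rule: every internal dask chunk
    boundary must be a multiple of the Zarr chunk size. Walks the whole
    tuple, so the cost grows with the number of chunks."""
    # Offsets where one dask block ends and the next begins; the final
    # offset is the array end, which may be a partial Zarr chunk.
    boundaries = list(accumulate(dask_chunks))[:-1]
    return all(b % zarr_chunk == 0 for b in boundaries)


print(chunks_aligned((100, 100, 50), 50))  # True
print(chunks_aligned((75, 75, 100), 50))   # False: boundary at 75
```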

This is partly an Xarray issue: we pass a normalized chunks tuple that is very, very large, and tokenizing it takes ages: https://github.com/pydata/xarray/pull/9897
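Why tokenizing the expanded tuple hurts: any content hash has to visit every element. Below is a minimal sketch with a hypothetical `toy_tokenize` built on `hashlib` (dask's real `tokenize` normalizes objects differently, but its cost on a plain tuple is still linear in the length).

```python
import hashlib


def toy_tokenize(*args) -> str:
    """Hypothetical stand-in for a tokenize function: hash the repr of the
    arguments. The work is proportional to the size of what is hashed."""
    return hashlib.sha256(repr(args).encode()).hexdigest()


shape = (1_000_000_000,)
compact = 1_000                     # the int chunk size as stored on disk
expanded = ((1_000,) * 1_000_000,)  # the same chunking, fully normalized

# Both describe identical chunking, but the expanded form is far larger to hash.
print(len(repr((shape, compact))))
print(len(repr((shape, expanded))))
print(toy_tokenize(shape, compact))
```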

One thought for anyone interested: we might skip "normalizing" chunks for int `chunks` (like all Zarrs & netCDFs in existence today) and pass them straight through; dask can handle them...
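A rough sketch of that idea, with a hypothetical `maybe_normalize` helper (not an existing Xarray function): plain ints pass straight through and only other chunk specs get handled here. Real `"auto"` handling depends on dtype and memory limits, so the sketch deliberately leaves it unimplemented.

```python
from typing import Union

Chunks = Union[int, str, tuple]


def maybe_normalize(chunks: Chunks, dim_len: int) -> Chunks:
    """Hypothetical sketch: leave a plain int chunk size alone and let the
    downstream library expand it; only validate or expand other specs."""
    if isinstance(chunks, int) and not isinstance(chunks, bool):
        return chunks  # O(1) to store, hash and compare
    if isinstance(chunks, tuple):
        if sum(chunks) != dim_len:
            raise ValueError("explicit chunks must sum to the dimension length")
        return chunks
    # e.g. "auto" needs real logic (dtype, memory limits); out of scope here
    raise NotImplementedError(f"unsupported chunk spec: {chunks!r}")


print(maybe_normalize(1_000, 1_000_000_000))  # 1000
print(maybe_normalize((3, 3, 3, 1), 10))      # (3, 3, 3, 1)
```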

I can't remember why, but I think it has to do with handling `"auto"`?

Nice. The other one that bugs me when profiling is https://github.com/zarr-developers/zarr-python/blob/b3e9aed305092236c5db70deee0b26dad648d3b0/src/zarr/codecs/zstd.py#L79

Can you use `xr.full_like`? https://docs.xarray.dev/en/stable/generated/xarray.full_like.html

I've run into this before. The underlying variable object is `IndexVariable`, which has a dummy `chunk` method: https://github.com/pydata/xarray/blob/95bb9ae4233c16639682a532c14b26a3ea2728f3/xarray/core/variable.py#L2707-L2709
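A toy model of that behavior, using hypothetical `ToyVariable` / `ToyIndexVariable` classes (not Xarray's real ones): the subclass overrides `chunk` with a method that does no chunking, so calling `.chunk()` on an index coordinate leaves it as it was. In this sketch the dummy just returns a shallow copy.

```python
import copy


class ToyVariable:
    """Toy data container whose chunk() marks the data as chunked."""

    def __init__(self, data):
        self.data = data
        self.chunked = False

    def copy(self):
        return copy.copy(self)  # shallow copy: the data object is shared

    def chunk(self, chunks=None):
        new = self.copy()
        new.chunked = True
        return new


class ToyIndexVariable(ToyVariable):
    def chunk(self, chunks=None):
        # Dummy: index coordinates stay in memory, so chunking is a no-op
        return self.copy()


time = ToyIndexVariable([0, 1, 2])
temp = ToyVariable([10.0, 11.0, 12.0])
print(temp.chunk(1).chunked, time.chunk(1).chunked)  # True False
```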