cosima-cookbook
Should querying.getvar() implement automatic chunking?
xarray v0.16.0 includes a feature, .chunk(chunks='auto') (see xarray docs). I'm wondering whether it would be useful to apply automatic chunking to getvar()'s output before it is returned to the user.
@angus-g, @aidanheerdegen
This defers to dask's auto chunking, which tries to get 128MiB chunks by default, whereas getvar() returns chunks aligned with those on-disk to minimise data shuffling. This is something that would need profiling. Is it best to work with large chunks, which may reduce the number of nodes in the task graph by some factor, at the cost of possible inter-worker communication to coalesce chunks? Or is it best to leave the on-disk chunks as the unit of computation?
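To make the trade-off concrete, here is a minimal sketch that compares the two chunkings on a toy array. The array shape, chunk sizes, dimension names and the `describe` helper are all made up for illustration; the dask array only stands in for what getvar() would return and is not read from any file.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Hypothetical stand-in for getvar() output: one small chunk per time step,
# tiled in space, as if aligned with the chunks stored on disk.
data = da.zeros((240, 600, 720), chunks=(1, 150, 180), dtype="float32")
var = xr.DataArray(data, dims=("time", "yt_ocean", "xt_ocean"), name="temp")

# Let dask choose the chunk sizes (it aims for its configured target size).
auto = var.chunk(chunks="auto")


def describe(label, arr):
    """Print the chunk count and the size of the largest chunk (hypothetical helper)."""
    largest = tuple(max(c) for c in arr.chunks)
    mib = np.prod(largest) * arr.dtype.itemsize / 2**20
    print(f"{label}: {arr.data.npartitions} chunks, largest {largest} (~{mib:.1f} MiB)")


describe("file-aligned", var)
describe("auto", auto)
```

Fewer chunks means fewer tasks for any later computation, but the rechunking step itself adds tasks and may move data between workers, so this says nothing about which option is faster. That still needs profiling on a real workload.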
Well these are the sort of "academic" questions I have no idea nor intuition about... ;)
Yep ... but we still need our high-level dask tutorial, right?
I know @AndyHoggANU; sorry :( Unfortunately I haven't yet reached the point at which I can teach people anything...
I did automatic chunking on some 0.1° data today (that had been loaded via getvar and thus already had file-aligned chunks). The resulting chunk size was doubled along each dimension. For the computation I was doing, it didn't seem to make a big difference, but I don't think chunking was the prime concern there anyway.
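For a rough sense of what "doubled along each dimension" means for the task count, here is a back-of-the-envelope sketch in plain Python. The array shape and file chunk sizes are hypothetical, not taken from any actual model output.

```python
from math import ceil, prod


def n_chunks(shape, chunks):
    """Number of chunks needed to tile an array of `shape` with blocks of `chunks`."""
    return prod(ceil(n / c) for n, c in zip(shape, chunks))


# Hypothetical 4D variable (time, depth, y, x) and hypothetical on-disk chunks
shape = (12, 75, 2700, 3600)
file_chunks = (1, 7, 300, 400)

# Chunk size doubled along every dimension, as observed after auto chunking
doubled = tuple(2 * c for c in file_chunks)

before = n_chunks(shape, file_chunks)
after = n_chunks(shape, doubled)
print(f"file-aligned: {before} chunks, doubled: {after} chunks")
print(f"reduction factor: {before / after:.1f} (at most 2**{len(shape)} = {2 ** len(shape)})")
```

Each doubled chunk holds up to 2**ndim times as many elements, so in four dimensions the chunk count drops by at most a factor of 16, and by less where the chunks do not divide the array evenly.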
Should we close this?