Consider omitting unchunked dimensions from Key objects created with DatasetToChunks

Open: shoyer opened this issue 2 years ago • 1 comment

Currently we have (from https://xarray-beam.readthedocs.io/en/latest/read-write.html):

with beam.Pipeline() as p:
    p | xbeam.DatasetToChunks(ds, chunks={'time': 1000}) | beam.MapTuple(print_summary)
Key(offsets={'lat': 0, 'lon': 0, 'time': 0}, vars=None)
  with <xarray.Dataset data_vars=['air'] dims={'lat': 25, 'time': 1000, 'lon': 53}>
Key(offsets={'lat': 0, 'lon': 0, 'time': 1000}, vars=None)
  with <xarray.Dataset data_vars=['air'] dims={'lat': 25, 'time': 1000, 'lon': 53}>
Key(offsets={'lat': 0, 'lon': 0, 'time': 2000}, vars=None)
  with <xarray.Dataset data_vars=['air'] dims={'lat': 25, 'time': 920, 'lon': 53}>

Should we instead omit lat and lon from these keys? This is less explicit but also more flexible: for example, if you replace these dimensions entirely with different dimensions, you don't need to update the keys.
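To make the proposal concrete, here is a minimal sketch of how the offsets would differ. It uses plain dicts as stand-ins for `Key.offsets`, and `drop_unchunked_offsets` is a hypothetical helper for illustration, not part of Xarray-Beam:

```python
def drop_unchunked_offsets(offsets, chunks):
    """Keep only the offsets for dimensions explicitly listed in `chunks`.

    `offsets` and `chunks` are plain dicts standing in for Key.offsets and
    the `chunks` argument of DatasetToChunks (hypothetical helper).
    """
    return {dim: offset for dim, offset in offsets.items() if dim in chunks}


# Offsets as emitted today for the second chunk in the example above.
current = {'lat': 0, 'lon': 0, 'time': 1000}

# Offsets under the proposal: only the chunked dimension remains.
proposed = drop_unchunked_offsets(current, chunks={'time': 1000})
print(proposed)  # {'time': 1000}
```

With the shorter form, a downstream transform that swaps lat/lon for other dimensions would leave the key untouched, since the key never mentioned them.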

shoyer, May 06 '22 17:05

One of my original motivations for this is obviated by #50, which now allows us to handle variables in DatasetToChunks even if they don't include "chunked" dimensions.

It's still an open question whether this change would make Xarray-Beam more usable or not.

If we do not make this change, we could potentially enforce the invariant that key.offsets.keys() == dataset.dims.keys(), which might be convenient for writing new transforms.
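A rough sketch of what enforcing that invariant might look like, again with plain dicts standing in for `key.offsets` and `dataset.dims`. The function name `check_offsets_match_dims` is hypothetical, and the comparison is done on sets of dimension names so that ordering does not matter:

```python
def check_offsets_match_dims(offsets, dims):
    """Raise ValueError unless offsets and dims name exactly the same dimensions.

    `offsets` stands in for key.offsets and `dims` for dataset.dims
    (hypothetical helper, not an Xarray-Beam API).
    """
    missing = set(dims) - set(offsets)
    extra = set(offsets) - set(dims)
    if missing or extra:
        raise ValueError(
            f'offsets and dims disagree: missing={sorted(missing)}, '
            f'extra={sorted(extra)}'
        )


# Passes: every dataset dimension has an offset, as in the output above.
check_offsets_match_dims(
    {'lat': 0, 'lon': 0, 'time': 0},
    {'lat': 25, 'time': 1000, 'lon': 53},
)

# Would raise: a key that omits the unchunked dimensions violates the invariant.
try:
    check_offsets_match_dims({'time': 0}, {'lat': 25, 'time': 1000, 'lon': 53})
    invariant_violated = False
except ValueError:
    invariant_violated = True
```

Note that this invariant and the proposal to omit unchunked dimensions are mutually exclusive, which is why it only applies if the change is not made.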

shoyer, Sep 03 '22 04:09