
Dimension name change with `concat_input_dims` is a side effect

Open cmdupuis3 opened this issue 2 years ago • 3 comments

What is your issue?

See title. Changing dimension names makes it difficult to index into batched arrays inside a batch loop. This is particularly annoying because the renaming depends on the value of `concat_input_dims`: sometimes `_input` is appended, sometimes not, which makes debugging and experimentation difficult. I view this as an unwelcome side effect, and I'd prefer that the non-batched dimensions keep their original names.

cmdupuis3 · Jan 20 '23 20:01

Hey @maxrjones, does this serve any purpose? It's incredibly annoying to get through batch generation only to crash because I forgot to rename the dimensions I'm subsetting.

cmdupuis3 · Jan 30 '23 20:01

Partial example:


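    import xbatcher as xb

    # ds, nlons, nlats and halo_size are defined elsewhere
    bgen = xb.BatchGenerator(
        ds,
        {'nlon': nlons, 'nlat': nlats},
        concat_input_dims=True,
    )

    # Positional indexers that trim the halo from each patch
    sub = {'nlon': range(halo_size, nlons - halo_size),
           'nlat': range(halo_size, nlats - halo_size)}

    for batch in bgen:
        # Indexing by the original dim names is what fails here
        batch_input = [batch[x][sub] for x in ['SSH', 'SST']]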

This will crash because the dimensions of each batch variable are now named `nlon_input` and `nlat_input`, so the keys in `sub` no longer match. With `concat_input_dims=False` the dimension names stay the same and the same loop works.
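For anyone hitting the same thing, here is a rough workaround sketch: remap the keys of `sub` onto whichever names the batch actually carries before indexing. `resolve_indexers` is a hypothetical helper name, not part of xbatcher or xarray, and it assumes the only renaming that happens is an appended `_input` suffix.

```python
def resolve_indexers(dims, indexers, suffix="_input"):
    """Return a copy of `indexers` keyed by the dim names present in `dims`.

    Hypothetical helper (not an xbatcher API). Assumes a renamed dimension
    is the original name plus `suffix`.
    """
    resolved = {}
    for name, index in indexers.items():
        if name in dims:
            # Name unchanged, e.g. with concat_input_dims=False
            resolved[name] = index
        elif name + suffix in dims:
            # Name was renamed, e.g. with concat_input_dims=True
            resolved[name + suffix] = index
        else:
            raise KeyError(f"no dimension matching {name!r} in {tuple(dims)}")
    return resolved
```

In the loop above this would be used roughly as `batch[x].isel(resolve_indexers(batch[x].dims, sub))`, so the same code runs for either value of `concat_input_dims`. It only papers over the renaming; it doesn't change what xbatcher does.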

cmdupuis3 · Jan 31 '23 01:01

Just learned that xarray's rolling also adds "_input" (or something similar), where it's used to distinguish between the original dimensions (which may still exist) and the new stencil dims.

I'm thinking that this looks superfluous in xbatcher because (at least in my case) the original dimensions are always stacked. Maybe "_input" makes sense if they aren't stacked?

cmdupuis3 · Feb 13 '23 22:02