NCDatasets.jl

Use aggdim along a dimension that doesn't exist in data

Open lsterzinger opened this issue 4 years ago • 2 comments

The model I use outputs 3D variables at each output time, in HDF5 format. However, since HDF5 does not support named dimensions, the netCDF engine used to open it adds dimensions in the form "phony_dim_[0,1,2...]". The files also do not have a "time" variable, as the timestamp is recorded in the file name.

I'd like to use NCDatasets.jl to open the files as a multi-file dataset (similar to xarray's mfdataset in Python). Using xarray, I was able to specify concat_dim="time" when opening an mfdataset. While that dimension/variable did not exist in the data, xarray would create it and then concat each data file along that axis.

Is this possible in NCDatasets? I've tried poking around a bit but can't seem to find anything that works. The docs specify that the dimension in the aggdim argument needs to exist or the files will be assumed constant.

Is the functionality I'm describing possible, or implementable? I'm relatively new to Julia but I would be willing to help implement it if it's not possible in the current version.

To Reproduce

I uploaded some small datasets here (on an x/y/z = 16/16/256 grid). Using NCDatasets:

ds = Dataset(["control-A-2019-07-02-000000-g1.h5", "control-A-2019-07-02-000500-g1.h5", "control-A-2019-07-02-001000-g1.h5"]; aggdim="time")

Expected behavior

Expected dataset with added dimension along new time axis
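
Since the files carry no "time" variable, the values for such a new time axis would have to come from the file names. The following is a minimal sketch of that step, not part of NCDatasets: `parse_timestamp` is a hypothetical helper, and it assumes the names follow the pattern seen in the sample files, with the date followed by six digits read as HHMMSS.

```julia
using Dates

# Hypothetical helper (not an NCDatasets function): extract the timestamp
# embedded in a file name such as "control-A-2019-07-02-000500-g1.h5".
# Assumption: the name contains yyyy-mm-dd followed by six digits that
# encode hours, minutes and seconds.
function parse_timestamp(fname::AbstractString)
    m = match(r"(\d{4})-(\d{2})-(\d{2})-(\d{2})(\d{2})(\d{2})", fname)
    m === nothing && error("no timestamp found in $fname")
    y, mo, d, h, mi, s = parse.(Int, m.captures)
    return DateTime(y, mo, d, h, mi, s)
end

files = ["control-A-2019-07-02-000000-g1.h5",
         "control-A-2019-07-02-000500-g1.h5",
         "control-A-2019-07-02-001000-g1.h5"]

# One DateTime per file, in the same order as the file list.
times = parse_timestamp.(files)
```

Sorting the file list by these values before opening it would keep the aggregated axis in chronological order.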

Environment

  • operating system: Fedora 33/Ubuntu 18.04.4 LTS
  • Julia version: 1.5.3
  • NCDatasets version: 0.11.3

Thank you!

lsterzinger • Feb 26 '21 18:02

As I understand your issue, this is currently not possible. This is what I get with ncdump -h:

netcdf control-A-2019-07-02-000500-g1 {
dimensions:
	phony_dim_0 = 16 ;
	phony_dim_1 = 16 ;
	phony_dim_2 = 256 ;
	phony_dim_3 = 2 ;
	phony_dim_4 = 1 ;
	phony_dim_5 = 5 ;
variables:
	float ACCPA(phony_dim_0, phony_dim_1) ;
	float ACCPD(phony_dim_0, phony_dim_1) ;
	float ACCPG(phony_dim_0, phony_dim_1) ;
	float ACCPH(phony_dim_0, phony_dim_1) ;
	float ACCPP(phony_dim_0, phony_dim_1) ;
	float ACCPR(phony_dim_0, phony_dim_1) ;
	float ACCPS(phony_dim_0, phony_dim_1) ;
	float AGGREGATET(phony_dim_2, phony_dim_0, phony_dim_1) ;
[...]

It seems that phony_dim_2, phony_dim_0, phony_dim_1 are the x, y and z coordinates and you would need to have a 4th coordinate with time. If the HDF5 files could actually be 4 dimensional (e.g. with a phony time dimension) then you would be able to concatenate over this one.

If somebody with an interest in this feature would make a PR, I would be happy to review it.

Alexander-Barth • Mar 01 '21 08:03

Rather than creating an aggdim directly from filenames, a good first step would probably be to support creating it from scalar variables. So if you have one time step per file and a variable "time" that holds just 2006-12-31 and is not a dimension, it could use that too.

Edit: came up in https://discourse.julialang.org/t/read-multi-nc-files-along-new-dimension/70696

visr • Oct 31 '21 21:10

I have implemented this feature in the master version. There will be a new release soon:

https://github.com/Alexander-Barth/NCDatasets.jl/blob/master/src/multifile.jl#L79

This is what I get with your sample data:

julia> ds = NCDataset(["control-A-2019-07-02-000000-g1.h5", "control-A-2019-07-02-000500-g1.h5", "control-A-2019-07-02-001000-g1.h5"]; aggdim="time",isnewdim=true);

julia> ds["ACCPA"]
ACCPA (16 × 16 × 3)
  Datatype:    Float32
  Dimensions:  phony_dim_1 × phony_dim_0 × time
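
Conceptually, the 16 × 16 × 3 shape above is what you get by stacking the three per-file 16 × 16 fields along a new trailing dimension. The sketch below illustrates that idea with plain in-memory arrays only; it is not how NCDatasets implements the aggregation, and the array contents are made-up stand-ins for the ACCPA field.

```julia
# Stand-ins for the 16 × 16 ACCPA field read from each of the three files
# (made-up values: every element of file i is set to i).
fields = [fill(Float32(i), 16, 16) for i in 1:3]

# Stack along a new third dimension, which plays the role of "time" above.
stacked = cat(fields...; dims=3)

# Each slice along the new dimension is the field from one file.
second_file = stacked[:, :, 2]
```

Here `size(stacked)` is `(16, 16, 3)`, matching the dimensions reported for `ds["ACCPA"]`.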

Alexander-Barth • Sep 07 '22 09:09

I am closing this issue. Feel free to re-open it should the issue persist. Thank you for taking the time to make this issue easily reproducible!

Alexander-Barth • Sep 28 '22 09:09

Sorry I didn't get a chance to test things out when you pushed the update. This works great! It's super useful to me and, I'm guessing, to many others who are used to how this works in Python/xarray but want to move to Julia.

lsterzinger • Sep 28 '22 11:09