Will there ever be a convergence of NetCDF.jl and NCDatasets?

Open alex-s-gardner opened this issue 2 years ago • 6 comments

In my limited understanding, it seems like NetCDF.jl has a more robust backend with DiskArrays, while NCDatasets has a more friendly CommonDataModel.jl syntax. It would be fantastic if the two projects could join forces before.

alex-s-gardner, Jul 14 '23 19:07

Maybe NetCDF.jl is intended to be more full featured?

alex-s-gardner, Jul 14 '23 19:07

There is this previous thread on NCDatasets.jl: https://github.com/Alexander-Barth/NCDatasets.jl/issues/57

Regarding DiskArrays support for NCDatasets, that has been a long time coming but https://github.com/Alexander-Barth/NCDatasets.jl/issues/79 seems to be actively worked on in https://github.com/Alexander-Barth/NCDatasets.jl/pull/205.

> Maybe NetCDF.jl is intended to be more full featured?

What makes you think so? NCDatasets.jl is being more actively developed, and has built-in support for CF conventions.

I'm not sure how feasible it is, since it will be a bunch of work, but it would be nice if NCDatasets.jl could depend on NetCDF.jl for using the netCDF C API, and then add CommonDataModel.jl, CF conventions and other useful things on top. Then NetCDF.jl would be domain agnostic and could perhaps live in JuliaIO, with NCDatasets possibly moving to JuliaGeo.

But I'm just sketching out a possible path, and not committing to the work haha. Likely they will continue to live side by side for a while.

visr, Jul 14 '23 20:07

cc @Alexander-Barth

visr, Jul 14 '23 20:07

@visr thanks for the links, looks like this is an outstanding issue in search of a resolution.

alex-s-gardner, Jul 14 '23 21:07

@alex-s-gardner I'll add a bit from my perspective. NCDatasets.jl implements a bunch of things not available in NetCDF.jl, like bounds variable handling, datetime, CF standards etc. That's why Rasters.jl currently uses it instead of NetCDF.jl.

But DiskArrayTools.jl has a CF standards implementation that can wrap any DiskArray, and I think that should become the standard.
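As a rough illustration of what "a CF implementation that wraps any array" means, here is a language-neutral sketch in Python. It is not the DiskArrayTools.jl API: `CFView` and its fields are hypothetical names, and the only things taken from the CF conventions are the `scale_factor`, `add_offset` and `_FillValue` attributes. The point is that decoding happens lazily at index time, so the wrapper does not care which backend holds the raw values.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CFView:
    """Hypothetical lazy wrapper: raw values are decoded only when indexed."""

    raw: Sequence[float]                 # any indexable backend (stand-in for a disk array)
    scale_factor: float = 1.0            # CF packing attribute
    add_offset: float = 0.0              # CF packing attribute
    fill_value: Optional[float] = None   # stand-in for the CF _FillValue attribute

    def _decode(self, v):
        # Missing values are reported as None instead of being unpacked.
        if self.fill_value is not None and v == self.fill_value:
            return None
        return v * self.scale_factor + self.add_offset

    def __len__(self):
        return len(self.raw)

    def __getitem__(self, idx):
        item = self.raw[idx]  # only the requested part of the backend is read
        if isinstance(idx, slice):
            return [self._decode(v) for v in item]
        return self._decode(item)


# Usage: packed integers on "disk", decoded values on access.
packed = [0, 10, -999, 25]
view = CFView(packed, scale_factor=0.5, add_offset=100.0, fill_value=-999)
first = view[0]
middle = view[1:3]
```

Because the wrapper only relies on indexing, the same decoding layer could sit on top of a NetCDF, GRIB or in-memory backend, which is the property the comment above is asking for.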

CommonDataModel.jl is a nice idea, but it is not based on DiskArrays.jl, and it's turned out to be a struggle to get it there.

One major stumbling block, for years now with NCDatasets.jl, is that @Alexander-Barth needs setindex! to be able to grow an array along a dimension. This breaks the architecture of DiskArrays.jl and Rasters.jl in a fairly fundamental way: we can't grow chunk sizes or dimension lookups in a setindex! call because these are immutable objects. It also breaks pretty strongly with the Base Julia AbstractArray interface, in a way we can't dispatch on to work around.

My idea to use a grow! method instead (where you explicitly need to use the returned object for a DiskArray or Raster) as an API has been rejected from NCDatasets.jl, so we are pretty much at an impasse.
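To make the difference concrete, here is a minimal language-neutral sketch in Python, not the DiskArrays.jl or NCDatasets.jl API. `DiskArraySketch`, `check_index` and `grow` are hypothetical names; the frozen dataclass stands in for the immutable shape and chunk metadata described above. An in-place write cannot enlarge the object, so growing has to return a new one that the caller must keep.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class DiskArraySketch:
    """Hypothetical array handle whose shape and chunk metadata are immutable."""

    shape: Tuple[int, ...]
    chunks: Tuple[int, ...]

    def check_index(self, index: Tuple[int, ...]) -> None:
        # An in-place write may only touch indices that already exist.
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError(f"index {index} is outside shape {self.shape}")


def grow(arr: DiskArraySketch, axis: int, new_len: int) -> DiskArraySketch:
    """Return a NEW handle with a longer `axis`; `arr` itself is left unchanged."""
    if new_len < arr.shape[axis]:
        raise ValueError("grow cannot shrink a dimension")
    new_shape = arr.shape[:axis] + (new_len,) + arr.shape[axis + 1:]
    return replace(arr, shape=new_shape)


# Usage: the caller has to rebind to the returned object.
a = DiskArraySketch(shape=(10, 3), chunks=(5, 3))
b = grow(a, axis=1, new_len=4)

try:
    a.check_index((0, 3))   # column 3 does not exist in the old handle
    old_accepts = True
except IndexError:
    old_accepts = False

b.check_index((0, 3))       # valid on the grown handle
```

The trade-off is visible in the usage lines: the explicit call keeps every handle internally consistent, at the cost of the caller having to use `b` rather than `a` afterwards, which is the API change under dispute in this thread.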

@tcarion has been putting a lot of work into getting GRIB/NetCDF working with DiskArrays.jl: https://github.com/JuliaGeo/CommonDataModel.jl/pull/9 and https://github.com/Alexander-Barth/NCDatasets.jl/pull/205 and https://github.com/rafaqz/Rasters.jl/pull/416 for Rasters.jl

But we just hit the setindex! problem and have to implement a CF disk array in Rasters.jl anyway, because CommonDataModel won't provide that.

In the end, what we need the most is both DiskArrays.jl and CF standards for GRIB and NetCDF files, and nothing currently provides that except DiskArrayTools and the Rasters.jl PR. But neither of these is really a long-term solution.

To me, adding comprehensive DiskArrays.jl support to CommonDataModel.jl (e.g. also for views and everything else) and taking care of GRIBDatasets.jl at the same time is the obvious solution to the feature mismatch. But it's not clear that that will happen.

CommonDataModel.jl needs to commit to using DiskArrays.jl - or not - for us to really proceed further with this part of the ecosystem.

If it does, NCDatasets.jl will have all the functionality NetCDF.jl has. If it doesn't, then we probably need to add a DiskArrayTools.jl dependency here for CFDiskArray, and fill out the functionality missing from NCDatasets.jl. I would then swap out the backend in Rasters.jl to NetCDF.jl.

rafaqz, Jul 30 '23 13:07

@rafaqz I totally agree with your sentiment. It would be great if we could get some clarity so that we can chart a path forward.

alex-s-gardner, Aug 08 '23 02:08

It seems to me that this discussion was also touched on in #197, and the conclusion is that NCDatasets.jl is the go-to package for most users accessing NetCDF files, while NetCDF.jl will still be around, mainly as a playground for @meggart to test certain access patterns.

felixcremer, Jul 04 '25 10:07

NCDatasets.jl is now the go-to package for general use.

alex-s-gardner, Jul 04 '25 18:07