Will there ever be a convergence of NetCDF.jl and NCDatasets?
In my limited understanding, it seems like NetCDF.jl has a more robust backend with DiskArrays, while NCDatasets has a friendlier CommonDataModel.jl syntax. It would be fantastic if the two projects could join forces.
Maybe NetCDF.jl is intended to be more full featured?
There is this previous thread on NCDatasets.jl: https://github.com/Alexander-Barth/NCDatasets.jl/issues/57
Regarding DiskArrays support for NCDatasets: that has been a long time coming, but https://github.com/Alexander-Barth/NCDatasets.jl/issues/79 seems to be actively worked on in https://github.com/Alexander-Barth/NCDatasets.jl/pull/205.
Maybe NetCDF.jl is intended to be more full featured?
What makes you think so? NCDatasets.jl is more actively developed and has built-in support for CF conventions.
I'm not sure how feasible it is, since it would be a lot of work, but it would be nice if NCDatasets.jl could depend on NetCDF.jl for access to the netCDF C API, and then add CommonDataModel.jl, CF conventions and other useful things on top. Then NetCDF.jl would be domain-agnostic and could perhaps live in JuliaIO, with NCDatasets.jl possibly moving to JuliaGeo.
But I'm just sketching out a possible path, and not committing to the work haha. Likely they will continue to live side by side for a while.
cc @Alexander-Barth
@visr thanks for the links; it looks like this is an outstanding issue in search of a resolution.
@alex-s-gardner I'll add a bit from my perspective. NCDatasets.jl implements a bunch of things not available in NetCDF.jl, like bounds variable handling, datetimes, CF standards, etc. That's why Rasters.jl currently uses it instead of NetCDF.jl.
But DiskArrayTools.jl has a CF standards implementation that can wrap any DiskArray, and I think that should become the standard.
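To illustrate what "a CF standards implementation that can wrap any DiskArray" means in practice, here is a minimal, language-neutral sketch in Python. It is not the DiskArrayTools.jl API: the class name CFDecodedView is hypothetical, a plain list stands in for a DiskArray, and only the CF packing attributes scale_factor, add_offset and _FillValue are handled. Decoding happens lazily, element by element, when data is read.

```python
from typing import Optional, Sequence


class CFDecodedView:
    """Hypothetical lazy CF-decoding wrapper (not a real library API).

    Wraps any indexable raw array (here a list standing in for a DiskArray)
    and applies CF packing attributes only when an element is accessed.
    """

    def __init__(self, raw: Sequence[float], scale_factor: float = 1.0,
                 add_offset: float = 0.0, fill_value: Optional[float] = None):
        self.raw = raw                    # the wrapped array is never copied
        self.scale_factor = scale_factor
        self.add_offset = add_offset
        self.fill_value = fill_value

    def __len__(self) -> int:
        return len(self.raw)

    def __getitem__(self, i: int) -> Optional[float]:
        packed = self.raw[i]  # read from the wrapped array only on access
        # _FillValue is compared against the packed (stored) value.
        if self.fill_value is not None and packed == self.fill_value:
            return None  # missing value (Julia would use `missing`)
        # CF unpacking: unpacked = packed * scale_factor + add_offset
        return packed * self.scale_factor + self.add_offset


if __name__ == "__main__":
    raw = [100, 200, -9999, 300]  # packed integers as stored on disk
    v = CFDecodedView(raw, scale_factor=0.1, add_offset=273.15, fill_value=-9999)
    print([v[i] for i in range(len(v))])
```

Because the wrapper only holds a reference to the underlying array, the same decoding layer could in principle sit on top of any backend (netCDF, GRIB, ...) that exposes indexed reads.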
CommonDataModel.jl is a nice idea, but it is not based on DiskArrays.jl, and it has turned out to be a struggle to make it so.
One major stumbling block with NCDatasets.jl, for years now, is that @Alexander-Barth needs setindex! to be able to grow an array along a dimension. This breaks the architecture of DiskArrays.jl and Rasters.jl in a fairly fundamental way: we can't grow chunk sizes or dimension lookups in a setindex! call because these are immutable objects. It also breaks pretty strongly with the Base Julia AbstractArray interface, in a way we can't dispatch on to work around.
My proposal to use a grow! method as the API instead (where you must explicitly use the returned object for a DiskArray or Raster) has been rejected for NCDatasets.jl, so we are pretty much at an impasse.
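A minimal sketch of the design tension described above, written in Python as a language-neutral illustration rather than DiskArrays.jl or NCDatasets.jl code. ChunkedVector and grow are hypothetical names; a Python list stands in for on-disk storage, and the length and chunk count are treated as fixed once an object is built. Indexed assignment can overwrite values within bounds but cannot extend the array, while grow returns a new object with new metadata that the caller has to use.

```python
from typing import Any, Iterable, List


class ChunkedVector:
    """Hypothetical 1-D stand-in for a DiskArray-like wrapper.

    Contents can be overwritten, but the length and chunk size are fixed
    when the object is created, like immutable size/chunk metadata.
    """

    def __init__(self, values: Iterable[Any], chunksize: int):
        self._values: List[Any] = list(values)  # stand-in for on-disk storage
        self._length = len(self._values)         # never changes for this object
        self._chunksize = chunksize

    @property
    def length(self) -> int:
        return self._length

    @property
    def nchunks(self) -> int:
        # Ceiling division: number of chunks needed to cover the data.
        return -(-self._length // self._chunksize)

    def _check(self, i: int) -> None:
        if not 0 <= i < self._length:
            raise IndexError(f"index {i} out of bounds for length {self._length}")

    def __getitem__(self, i: int) -> Any:
        self._check(i)
        return self._values[i]

    def __setitem__(self, i: int, value: Any) -> None:
        # In-place write: allowed only inside the existing bounds, in the
        # spirit of the AbstractArray setindex! contract.
        self._check(i)
        self._values[i] = value


def grow(vec: ChunkedVector, n: int, fill: Any = 0) -> ChunkedVector:
    """Hypothetical grow: return a NEW vector that is n elements longer.

    The input keeps its old length and chunk count; callers must use the
    returned object, analogous to the proposed grow! API.
    """
    return ChunkedVector(vec._values + [fill] * n, vec._chunksize)


if __name__ == "__main__":
    v = ChunkedVector([1, 2, 3, 4], chunksize=2)
    v[1] = 20                      # fine: within bounds
    try:
        v[4] = 5                   # growing via indexed assignment is rejected
    except IndexError as err:
        print("rejected:", err)
    w = grow(v, 1)
    w[4] = 5                       # allowed on the grown object
    print(v.length, v.nchunks, w.length, w.nchunks)  # 4 2 5 3
```

For simplicity grow copies the values here; a real on-disk implementation would extend the file rather than copy. The point of the sketch is that the old object's size and chunk count stay unchanged, and only the returned object reflects the growth.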
@tcarion has been putting a lot of work into getting GRIB/netCDF working with DiskArrays.jl: https://github.com/JuliaGeo/CommonDataModel.jl/pull/9 and https://github.com/Alexander-Barth/NCDatasets.jl/pull/205, and https://github.com/rafaqz/Rasters.jl/pull/416 for Rasters.jl.
But we just hit the setindex! problem, and we have to implement a CF disk array in Rasters.jl anyway because CommonDataModel.jl won't provide one.
In the end, what we need most is both DiskArrays.jl and CF standards for GRIB and netCDF files, and nothing currently provides that except DiskArrayTools.jl and the Rasters.jl PR. But neither of these is really a long-term solution.
To me, adding comprehensive DiskArrays.jl support to CommonDataModel.jl (e.g. also for views and everything else) and taking care of GRIBDatasets.jl at the same time is the obvious solution to the feature mismatch. But it's not clear that will happen.
CommonDataModel.jl needs to commit to using DiskArrays.jl - or not - for us to really proceed further with this part of the ecosystem.
If it does, NCDatasets.jl will have all the functionality NetCDF.jl has. If it doesn't, then we probably need to add a DiskArrayTools.jl dependency here for CFDiskArray, and add the functionality that NCDatasets.jl has but NetCDF.jl lacks. I would then switch the backend in Rasters.jl to NetCDF.jl.
@rafaqz I totally agree with your sentiment. It would be great if we could get some clarity so that we can chart a path forward.
It seems to me that this discussion was also touched on in #197, and the conclusion is that NCDatasets.jl is the go-to package for most users accessing netCDF files, while NetCDF.jl will still be around mainly as a playground for @meggart to test certain access patterns.
NCDatasets.jl is now the go-to package for general use.