Zarr support in NetCDF-Fortran for "cloud-native" model simulations?

Open JiaweiZhuang opened this issue 5 years ago • 17 comments

First thanks for all the great work on NetCDF!

I have a research project that will significantly benefit from NetCDF-Zarr. I recently saw a tweet from @jhamman that "pre-alpha will be available early in 2020". I also noticed some Zarr-related updates like Unidata/netcdf-c#1259. I am excited to test the new Zarr capability with real models and give feedback. Is it possible to get a preliminary version to play with around Feb-March? Or is it still too early to say?

More details about the use case: My workflow involves running Fortran-based models in a cloud-native container environment, for example AWS Batch or a Kubernetes cluster. The main benefit is to scale out ensemble runs quickly via AWS Batch Array Jobs or Kubernetes Parallel Jobs. This is similar to what Pangeo does, but for Fortran models instead of Dask workers. However, I/O is a major pain in a container environment (one needs to deal with Persistent Volumes, for example). It is actually possible to mount a Lustre file system in Kubernetes, but the workflow will be much, much simpler if the model can read and write directly to S3.

JiaweiZhuang avatar Jan 11 '20 16:01 JiaweiZhuang

We are hoping to have a version out in the next month or two, so the Feb-March timeframe is perfectly reasonable!

WardF avatar Jan 15 '20 20:01 WardF

Just to check -- is it possible to get a testing version this month?

JiaweiZhuang avatar Mar 06 '20 03:03 JiaweiZhuang

In Fortran, no. In C, maybe. But we still need an S3 driver. We are currently using local storage formats for testing.

DennisHeimbigner avatar Mar 06 '20 04:03 DennisHeimbigner

I take that back. Once the C version is working, it should also work with any language that uses the C library, provided that the language does not interfere with the use of URLs as path names for nc_open and nc_create.

DennisHeimbigner avatar Mar 06 '20 04:03 DennisHeimbigner

@DennisHeimbigner and @WardF, do you think it would be possible to write Zarr from FORTRAN using the new 4.8.0 NetCDF C library with this approach @ocefpaf pointed me toward: https://riptutorial.com/fortran/example/7149/calling-c-from-fortran

rsignell-usgs avatar Apr 08 '21 14:04 rsignell-usgs

It should be possible assuming that the nf_open path can take a URL string. I think one of our interns tested this over the summer and I believe it worked.

DennisHeimbigner avatar Apr 08 '21 18:04 DennisHeimbigner

Cool! Which intern was it? It would be nice to find out what they discovered.

rsignell-usgs avatar Apr 09 '21 10:04 rsignell-usgs

@DennisHeimbigner pingity ping ping

rsignell-usgs avatar May 04 '21 14:05 rsignell-usgs

I just built netcdf-c-4.8.0 with netcdf-fortran-4.5.3, also using MPI for parallel I/O.

All tests passed.

I had to use: FCFLAGS='-fallow-argument-mismatch -g -Wall' FFLAGS='-fallow-argument-mismatch -g -Wall'

The fortran library just hands the path over to the C library, so Zarr stuff should work transparently to Fortran, just as DAP does.

edwardhartnett avatar May 04 '21 15:05 edwardhartnett

@edhartnett , you had to use "-g"? So not ready for prime time (e.g. "-O3" yet)?

What I'd like to do is write Zarr from our ocean modeling simulations that would look exactly like what xarray produces...

rsignell-usgs avatar May 05 '21 11:05 rsignell-usgs

No, the -g and -Wall are not what I meant. I had to use -fallow-argument-mismatch.

Absolutely this is ready for prime-time. ;-)

edhartnett avatar May 05 '21 12:05 edhartnett

@edhartnett, do you have a sample fortran program that creates a zarr dataset you could share?

rsignell-usgs avatar May 05 '21 12:05 rsignell-usgs

No, sorry. I haven't tried Zarr.

edwardhartnett avatar May 05 '21 13:05 edwardhartnett

@edhartnett, Ah bummer. But it should now be possible for me to do that, right?
Ooh, maybe I could use "ncgen -f" to get a sample code.

rsignell-usgs avatar May 05 '21 13:05 rsignell-usgs

Take any simple Fortran program that creates a simple netcdf4 dataset. Suppose it creates a file called "simple.nc". Replace the call
    nf_create("simple.nc",NF_NETCDF4,ncid)
with
    nf_create("file://simple.zarr#mode=zarr,file",NF_NETCDF4,ncid)
That should create a directory called simple.zarr that is in pure Zarr format. You can replace mode=zarr,file with mode=nczarr,file if you want to create the dataset in NCZarr format.
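
A minimal sketch of that substitution, assuming a netcdf-c build with NCZarr enabled and using the Fortran 90 interface (nf90_create) instead of the F77 nf_create call named above. Only the URL-style path comes from this thread; the program name, the dimension "x", and the variable "temp" are hypothetical, and the sketch has not been run against any particular library version.

```fortran
program write_zarr_sketch
  use netcdf
  implicit none

  ! URL-style path from the comment above: "file" storage, pure Zarr format.
  ! Swap in "mode=nczarr,file" for NCZarr format.
  character(len=*), parameter :: path = "file://simple.zarr#mode=zarr,file"

  ! Hypothetical dataset: one dimension "x" and one real variable "temp".
  integer, parameter :: nx = 4
  integer :: ncid, dimid, varid, i
  real :: temp(nx)

  temp = [(real(i), i = 1, nx)]

  ! Same calls as for an ordinary netCDF-4 file; only the path differs.
  call check(nf90_create(path, nf90_netcdf4, ncid))
  call check(nf90_def_dim(ncid, "x", nx, dimid))
  call check(nf90_def_var(ncid, "temp", nf90_real, dimid, varid))
  call check(nf90_enddef(ncid))
  call check(nf90_put_var(ncid, varid, temp))
  call check(nf90_close(ncid))

contains

  ! Stop with the library's error message if a netCDF call fails.
  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      print *, trim(nf90_strerror(status))
      stop 1
    end if
  end subroutine check

end program write_zarr_sketch
```

If nf-config is installed alongside netcdf-fortran, something like `gfortran write_zarr_sketch.f90 $(nf-config --fflags --flibs)` should build it. Because the Fortran layer passes the path string through to the C library, no other change to the program should be needed.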

DennisHeimbigner avatar May 05 '21 16:05 DennisHeimbigner

@DennisHeimbigner, okay, I'll try that! And mode=nczarr,xarray,file if we want to create xarray-compatible zarr, right?

rsignell-usgs avatar May 05 '21 17:05 rsignell-usgs

Depends. If you use the GitHub master, then yes, mode=xarray,file should produce pure Zarr with the xarray convention. If you use 4.8.0, it does not have xarray support. Please let me know if you have problems.

DennisHeimbigner avatar May 05 '21 17:05 DennisHeimbigner