
Is it possible to support parallel NetCDF I/O?

Open ali-ramadhan opened this issue 4 years ago • 14 comments

I don't know much about the subject but from looking at the PnetCDF description (https://parallel-netcdf.github.io/) it sounds like there are two backend options for parallel I/O: PnetCDF and parallel HDF5?

It sounds like it might be possible to build NetCDF with parallel I/O support.

Out of curiosity, is parallel I/O something that NCDatasets.jl can feasibly support?

X-Ref: https://github.com/CliMA/Oceananigans.jl/pull/590 X-Ref: https://github.com/CliMA/ClimateMachine.jl/issues/2007

ali-ramadhan avatar Feb 07 '21 14:02 ali-ramadhan

Support for parallel NetCDF I/O would indeed be nice. Do you have any complete known-working C code example? I tried the Ubuntu 20.04 package libnetcdf-mpi-dev with nc4_pnc_put.c

But this fails with:

sudo apt-get install libnetcdf-mpi-dev
wget http://cucis.ece.northwestern.edu/projects/PnetCDF/Examples/nc4_pnc_put.c
gcc -o nc4_pnc_put -I/usr/lib/x86_64-linux-gnu/netcdf/mpi/include/ nc4_pnc_put.c -L/usr/lib/x86_64-linux-gnu/netcdf/mpi/ -lnetcdf_mpi -I/usr/lib/x86_64-linux-gnu/openmpi/include/openmpi -I/usr/lib/x86_64-linux-gnu/openmpi/include -pthread -L/usr/lib/x86_64-linux-gnu/openmpi/lib -lmpi
mpiexec -n 2 ./nc4_pnc_put testfile.nc
# Error at line=100: NetCDF: Parallel operation on file opened for non-parallel access Aborting ...

It could be that I did something stupid. I assumed that the two packages have the same API; maybe this is not the case.

I don't know much about this subject either, but if somebody could provide a full C example (as opposed to code fragments), that would help a lot.

Alexander-Barth avatar Feb 08 '21 16:02 Alexander-Barth

Thanks for looking into this. I thought I was going to start playing around with parallel I/O sooner, but I'm still working on basic MPI infrastructure...

I can try getting nc4_pnc_put.c to run once I start looking at parallel NetCDF I/O.

ali-ramadhan avatar Mar 06 '21 13:03 ali-ramadhan

Any luck with this @ali-ramadhan? netcdf could also be a sink for parallel DiskArrays/Dagger.jl processing, so this would be widely useful.

rafaqz avatar Jun 18 '21 13:06 rafaqz

I'd like to call for this feature too.

kongdd avatar Dec 08 '22 02:12 kongdd

:eyes:

johnomotani avatar Mar 08 '24 16:03 johnomotani

I'm just chiming in to let interested people know that I've been working on this task during the last week or so. I managed to produce a working example on my laptop over the weekend. My plan is to consolidate the code changes first and then write and execute a few meaningful tests on real HPC platforms over GPFS and (hopefully) Lustre parallel file systems. If everything goes well I will update you to discuss how to proceed, opening a PR or whatever.

Just a note about parallel netcdf3 support. As of now NetCDF_jll only supports parallel netcdf4. Support for parallel netcdf3 is provided through the parallel-netcdf library, which is not enabled in NetCDF_jll and not even available on Yggdrasil. While this is not a problem for this specific development (trying to access a netcdf3 file using parallel I/O will simply throw a "not supported" error), I think it would be useful for the package to also support parallel netcdf3. I have no previous experience with JLL packages and Yggdrasil, but if someone manages to add parallel-netcdf to Yggdrasil and enable support for parallel netcdf3 in NetCDF_jll, I will be happy to test that too.

pgf avatar Apr 22 '24 15:04 pgf

JLL packages are not so hard to add with the wizard (see https://github.com/JuliaPackaging/BinaryBuilder.jl)

And it is very likely that you are the best-situated person to do this currently, probably the 100-to-1 favorite.

And code that relies on manually installed system binaries will likely not be widely used. The julia ecosystem has moved very strongly towards versioned dependencies managed by Pkg.jl.

So, I encourage you to give it a go :)

But feel free to ping any JLL problems here for help/feedback.

rafaqz avatar Apr 22 '24 16:04 rafaqz

And code that relies on manually installed system binaries will likely not be widely used. The julia ecosystem has moved very strongly towards versioned dependencies managed by Pkg.jl.

Just to chime in here (as a potential parallel NetCDF user!) - parallel NetCDF (or HDF5) is one case where system binaries are likely to be wanted. On an HPC cluster, we probably have to use the vendor-provided MPI to get the best performance (especially for inter-node communication), so the parallel NetCDF (and HDF5, which I mention as it'll be a dependency of parallel NetCDF for netcdf4 files, I assume) libraries will need to be linked to the system MPI, which the Julia-provided binaries will not be. At least, HPC users will want the option to do that, and I guess they are the main users of parallel NetCDF...

For comparison, see the setup for parallel HDF5.jl, which provides a utility function to link to the system binaries: https://juliaio.github.io/HDF5.jl/stable/mpi/#using_parallel_HDF5

johnomotani avatar Apr 22 '24 16:04 johnomotani

Yes you're probably right generally, I was thinking of non-MPI use cases like Dagger.jl.

This is pretty nice syntax in HDF5 if we only have the system binaries:

HDF5.API.set_libraries!("/path/to/your/libhdf5.so", "/path/to/your/libhdf5_hl.so")

(but it would be very nice to have a JLL and "not even available on Yggdrasil" probably means there is no-one else to do it)

rafaqz avatar Apr 22 '24 18:04 rafaqz

There is some initial work in https://github.com/Alexander-Barth/NCDatasets.jl/commit/70ef6830497674461c33114940ae58d439c7f827 in the branch MPI. For now I prioritize the current HDF5/NetCDF4 format and the Yggdrasil JLLs.

Using a custom netCDF library, potentially linking to an optimized MPI (and HDF5) library is possible using Preferences:

https://alexander-barth.github.io/NCDatasets.jl/stable/issues/#Using-a-custom-NetCDF-library
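
As a hypothetical sketch of that approach (the preference key follows the usual JLL `lib*_path` override convention, and the library path is a placeholder you would replace with your system build linked against the vendor MPI):

```julia
# Point NetCDF_jll at a system-built libnetcdf via Preferences.jl.
# The path below is a placeholder; use your cluster's netCDF library,
# compiled against the system MPI and HDF5.
using Preferences, NetCDF_jll

set_preferences!(NetCDF_jll,
    "libnetcdf_path" => "/path/to/system/libnetcdf.so";  # placeholder path
    force = true)

# Restart Julia afterwards so the JLL picks up the new preference
# from LocalPreferences.toml.
```
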

Alexander-Barth avatar Apr 23 '24 15:04 Alexander-Barth

Windows currently fails with the following (full logs); Linux and OS X work OK:

NetCDF: Parallel operation on file opened for non-parallel access (NetCDF error code: -114 = NC_ENOPAR)
   nc_create_par(path::String, cmode::UInt16, mpi_comm::MPI.Comm, mpi_info::MPI.Info)
   NCDataset(comm::MPI.Comm, filename::String, mode::String; info::MPI.Info, format::Symbol, share::Bool, diskless::Bool, persist::Bool, maskingvalue::Missing, attrib::Vector{Any})
  in expression starting at D:\a\NCDatasets.jl\NCDatasets.jl\test\test_mpi_script.jl:18
  in expression starting at D:\a\NCDatasets.jl\NCDatasets.jl\test\test_mpi_netcdf.jl:10

Parallel support seems to be missing from the Windows NetCDF_jll https://github.com/JuliaBinaryWrappers/NetCDF_jll.jl/releases/download/NetCDF-v400.902.211%2B0/NetCDF-logs.v400.902.211.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz

checking whether parallel io is enabled in hdf5... no
checking for library containing H5Dread_chunk... none required
checking for library containing H5Pset_fapl_ros3... none required
checking whether HDF5 allows parallel filters... yes
checking whether szlib was used when building HDF5... yes
checking whether HDF5 library is version 1.10.6 or later... yes
configure: WARNING: Parallel io disabled for netcdf-4 because hdf5 does not support
checking whether parallel I/O is enabled for netcdf-4... no
# Features
--------
Benchmarks:		no
NetCDF-2 API:		yes
HDF4 Support:		no
HDF5 Support:		yes
NetCDF-4 API:		yes
CDF5 Support:		yes
NC-4 Parallel Support:	no
PnetCDF Support:	no

It seems that upstream netcdf-c is not testing MPI on Windows (MSYS2, MinGW): https://github.com/Unidata/netcdf-c/actions/runs/8745640065/job/24001010636

# Features
--------
Benchmarks:		no
NetCDF-2 API:		yes
HDF4 Support:		no
HDF5 Support:		yes
NetCDF-4 API:		yes
CDF5 Support:		yes
NC-4 Parallel Support:	no
PnetCDF Support:	no

Alexander-Barth avatar Apr 24 '24 07:04 Alexander-Barth

It is not clear whether HDF5_jll actually has MPI enabled on Windows:

https://github.com/JuliaBinaryWrappers/HDF5_jll.jl/releases/download/HDF5-v1.14.3%2B3/HDF5-logs.v1.14.3.x86_64-w64-mingw32-libgfortran3-cxx03-mpi+microsoftmpi.tar.gz

Features:
---------
                     Parallel HDF5: no
  Parallel Filtered Dataset Writes: no
                Large Parallel I/O: no

If somebody with an interest in Windows can have a look at this, this would be awesome :-).

Alexander-Barth avatar Apr 24 '24 08:04 Alexander-Barth

There is some initial work in 70ef683 in the branch MPI.

Is it new? I don't remember seeing it when I checked last week. Anyway, it's almost identical to my version, except for a few details. For example, I called the access method paraccess to be more explicit, but that's fine.

A couple of notes:

  • the access dataset method works only with netcdf3 files, while it throws an error with netcdf4 files because nc_var_par_access doesn't recognize NC_GLOBAL as a valid variable ID. AFAIK, when nc_var_par_access is called on a variable in a netcdf3 file, it sets the access mode globally for the file (see here). I think there's no need for a dataset method; the variable method already does the same with netcdf3 files. Indeed, I too added the dataset method at first, but later I changed my mind.

  • I think that the MPI communicator can be an optional argument in most cases, so the user just needs to ask for parallel access. I defined the NCDataset method this way:

    function NCDataset(filename::AbstractString,
                    mode::AbstractString = "r";
                    format::Symbol = :netcdf4,
                    parallel::Bool = false,
                    comm::MPI.Comm = MPI.COMM_WORLD,
                    info::MPI.Info = MPI.INFO_NULL,
                    ...
    

    I think this gives a cleaner call (similar to the python netcdf4 package):

    ds = NCDataset(path,"c",parallel=true)
    

    leaving comm and info for more specific use cases.
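
    A complete parallel write with this keyword-style API could look like the following sketch (hypothetical; the final NCDatasets API may differ, and paraccess is the name I used in my version):

    ```julia
    # Each MPI rank writes its own slab of a shared variable.
    using MPI, NCDatasets

    MPI.Init()
    rank   = MPI.Comm_rank(MPI.COMM_WORLD)
    nranks = MPI.Comm_size(MPI.COMM_WORLD)

    # comm defaults to MPI.COMM_WORLD in the signature above
    ds = NCDataset("out.nc", "c", parallel = true)
    defDim(ds, "x", 10 * nranks)
    v = defVar(ds, "temperature", Float64, ("x",))

    # Optionally switch the variable to collective access; this wraps
    # nc_var_par_access (paraccess is my hypothetical method name):
    # paraccess(v, :collective)

    v[10rank+1:10rank+10] = fill(Float64(rank), 10)  # this rank's slab
    close(ds)
    MPI.Finalize()
    ```
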

pgf avatar Apr 24 '24 17:04 pgf

Yes, this is new. I started to work on this only a couple of days ago. I did not know that you also worked on this.

Thank you for your close look at these changes. Yes, I think paraccess is better (I was indeed looking for a better name, as access is probably too generic).

I also considered making the communicator (or parallel) a keyword argument. But as far as I know, this would mean that MPI becomes a (hard) dependency of NCDatasets, as we cannot dispatch on keyword arguments. I would think that netCDF with MPI makes a very good use case for a weak dependency.

Currently MPI is the only way to have parallel access to netCDF files. For me MPI does not work so nicely (or at all :-)) for interactive sessions. But maybe in the future there will be other ways to do parallel access (threads, Julia workers?), which could all be extensions onto which we could dispatch.

In mpi4py, all MPI functions are methods of the communicator. So having the MPI communicator as the first argument of NCDataset, serving as the main argument for dispatch, does not seem too surprising to me.
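
As a rough sketch of what I mean (all names here are illustrative, not the actual implementation), the MPI-specific method would live in a package extension that Julia loads only when MPI.jl is present:

```julia
# ext/NCDatasetsMPIExt.jl -- hypothetical weak-dependency extension
module NCDatasetsMPIExt

using NCDatasets, MPI

# Dispatching on the positional MPI.Comm argument is what lets MPI stay
# a weak dependency; a keyword argument could not trigger this dispatch.
function NCDatasets.NCDataset(comm::MPI.Comm, filename::AbstractString,
                              mode::AbstractString = "r"; kwargs...)
    # ... call nc_create_par / nc_open_par with comm here ...
end

end
```
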

Alexander-Barth avatar Apr 25 '24 08:04 Alexander-Barth