netcdf-c
netcdf-c copied to clipboard
Add thread-safe execution support to netcdf-c
This PR is experimental and should be used with caution.
It implements thread-safety with a mechanism similar to that used by HDF5. There is a single global mutex object that controls access to the netcdf API. It effectively serializes all calls to the API. For *nix* operating systems, it uses pthreads and more specifically the pthread_mutex functionality. For Windows it uses the built-in CriticalSection API.
The controlling option is "--enable-threadsafe" in Automake and "-DENABLE_THREADSAFE=on" in CMake. In this PR, it is enabled by default, but if it ever moves into production, it will be disabled by default.
Github Actions
Local build and test of this PR is known to work on Ubuntu-21 and Visual Studio. Build and test on Github Actions is currently failing for all systems. Fixing will require interactive access to the relevant platform.
Testing
This PR has had minimal testing; I am hoping that others can help out. There is one test case included: nc_test/run_threads.sh, which uses nc_test/tst_threads.c.
The primary functionality is in libdispatch/dmutex.c. It has some debug flags enabled to match lock with unlock to detect paths that bypass locking.
Warnings
- This implementation depends heavily on using recursive mutex objects.
Documentation
For more detailed documentation see netcdf-c/docs/threadsafe.md
Addendum (5/31/2023)
Add some more improvements.
- Make --enable-threadsafe default to disabled.
- Make threadsafe work with automake on MinGW and Cygwin.
- Add threadsafe testing to Ubuntu, MinGW, and Cygwin github actions.
- Cleanup the library search for PTHREADS4W in CMakeLists.txt.
- Add a test in nc_test -- run_thread_nc_test.sh and thread_nc_test.c -- that modifies the nc_test.c to use multi-threading.
- Add a new installed header: netcdf_threadsafe.h.
- Rename some internal header and .c files.
- Re-enable DAP4.
- Remove "const" attribute from the dispatch table pointers because it fails under multi-threading.
- Unify definitions of e.g. nulldup.
- Fix some potential memory leaks.
This is great. I have been hearing users who want this feature for years.
I wonder why you have to use the lock when doing read-only operations from the metadata. For example, when doing an inq_dim, why is there a need for a lock?
This is great. We did something similar to Exodus several years ago; also based on HDF5.
One thing I did to eliminate the need for so may goto label
statements is to create two macros that did the unlock and returned a value or did the unlock and just returned:
#define EX_FUNC_ENTER() \
do { \
/* Initialize the thread-safe code */ \
... \
/* Grab the mutex for the library */ \
...
} while (0)
/* Unlock and return a value */
#define EX_FUNC_LEAVE(error) \
do { \
ex__mutex_unlock(&EX_g, __func__, __LINE__); \
return error; \
} while (0)
/* Unlock and return from void function */
#define EX_FUNC_VOID() \
do { \
ex__mutex_unlock(&EX_g, __func__, __LINE__); \
return; \
} while (0)
/* Just unlock with no return */
#define EX_FUNC_UNLOCK() \
do { \
ex__mutex_unlock(&EX_g, __func__, __LINE__); \
} while (0)
I could then add this to the library by just searching for returns and replacing with the appropriate macro. Not better or worse; just different.
Again, great capability.
Good idea about the macros.
This effort was brute force and locks almost every netcdf API (with a few exceptions). The reason you have to (sometimes) lock metadata inquiries is that some other thread my simultaneously be modifying that metadata. Note that even if some thread is only creating new metadata it still modifies internal data structures that are accessed by simple inquiries. At some point in the future (as noted in the documentation) it may be possible to use much finer grain locking of specific shared data structures. But that is substantially more complex that using a single global lock.
Hi Dennis, lets talk about this at the next meeting; it will be a bit before we can get it merged, and I want to make sure I have all the salient details in terms of how and when this can and should be used; I'm currently still working on the 4.9.1 maintenance release (and accompanying documentation refinements). I don't want this to go stale (and I know it's still marked experimental), but I really want to make sure we are able to keep it up to date with the other changes leading up to 4.9.1, and also to make sure we have all the proper messaging and documentation in place :). Good work!
This will not be mergable for some time. I still have to figure out why it fails under github actions.
Added various fixes to the threadsafe.dmh branch:
- Fix CMakeLists.txt to ensure that -lpthread is added to the loaded libraries.
- Get github actions X ubuntu to work. OSX and MinGW still fail.
- Update documentation.
- Add more debug support in dmutex.c.