gnucobol icon indicating copy to clipboard operation
gnucobol copied to clipboard

Add a COB_LOAD_GLOBAL config option to modify `dlopen` behavior

Open florianschmidt1994 opened this issue 10 months ago • 5 comments
trafficstars

This PR adds a config option called COB_LOAD_GLOBAL which allows the flags of dlopen used in call.c to be changed between RTLD_GLOBAL and RTLD_LOCAL (see https://linux.die.net/man/3/dlopen)

Motivation

This flag is motivated by an ambition to be able to write a multi-threaded c program, that then loads libcob with dlmopen in one namespace per thread, effectively allowing us to call the same libcob / Cobol module from multiple threads in a "thread-safe" way. The image below shows the a very simplified intended call sequence

CanvasAsImage

Because there is a "bug" in dlopen that stops us from calling dlopen("module", RTLD_GLOBAL), this PR introduces an option to be able to change this flag to RTLD_LOCAL at runtime.

As at glibc 2.24, specifying the RTLD_GLOBAL flag when calling dlmopen() generates an error. Furthermore, specifying RTLD_GLOBAL when calling dlopen() results in a program crash (SIGSEGV) if the call is made from any object loaded in a namespace other than the initial namespace.

Docs

# Environment name:  COB_LOAD_GLOBAL
#   Parameter name:  load_global
#          Purpose:  tell the system loader to provide symbols in CALLed
#                    programs globally, allowing symbols to be found for linked
#                    libraries later
#             Type:  boolean
#             Note:  COBOL CALLs will always find symbols in already CALLed or
#                    pre-modules; this setting is mostly an advise to the system,
#                    not all systems are capable of loading libraries global/local
#          Default:  false
#          Example:  load_global true

Implementation

  • Change lt_dlopen from a macro definition to an actual function, which internally calls dlopen with the flags based on the configuration
  • Add cob_load_global / COB_LOAD_GLOBAL as a configuration
  • Add documentation to runtime.cfg about COB_LOAD_GLOBAL

florianschmidt1994 avatar Jan 08 '25 15:01 florianschmidt1994

Apart from the other PR's review to that branch, the change here makes the current scenario even more confusing. I'd like to fix this in this PR (or up-front upstream).

After the changes:

  • instead of calling lt_dlopen we call a new static function cob_dlopen
  • this has the current code of this PR in for the case of dlopen usage
  • for the WIN32 variant it uses what is currently used there
  • for the libtool variant it uses lt_dlopenadvise() with an advise matching the runtime setting

@florianschmidt1994 can you please inspect the libtool variant (to test this you can simply undefine USE_LIBDL in config.h)? I think I'll work on the cob_dlopen stuff late-evening.

GitMensch avatar Jan 08 '25 15:01 GitMensch

Yup, I'll check it out. Basically something like this, right?

static void* cob_dlopen(const char* filename) {
	#ifdef _WIN32
	if (x == NULL) {
		return GetModuleHandle (NULL);
	}
	return LoadLibrary(x);	
	#elif	defined(USE_LIBDL)
	int flags = cobsetptr->cob_load_global 
		? RTLD_LAZY | RTLD_GLOBAL
		: RTLD_LAZY | RTLD_LOCAL;

	return dlopen(filename, flags);
	# else
	// TODO: initialise lt_dladvise
	lt_dlopenadvise(filename, /*advise here*/);
}

florianschmidt1994 avatar Jan 08 '25 16:01 florianschmidt1994

@GitMensch I've tried out your suggestion in the latest commit and temporarily defined lt_dlopen as cob_dlopen to try it out. I've also added some indentation to the macros, does that conform with your style or should I remove them again?

florianschmidt1994 avatar Jan 08 '25 16:01 florianschmidt1994

When developing, I've noticed some test failures, where some of them might be related to our change. On the latest commit of this repositories "main branch" (gcos4gnucobol-3.x), there are some test failures when the build is configured with --without-dl. These failures persist also with our changes applied.

Commit Hash Loader COB_LOAD_GLOBAL Test Failures
731b81a327fe7108ca28d024a5ca13c9457dbcdc --without-dl not applicable 47 49 50 784 failed
731b81a327fe7108ca28d024a5ca13c9457dbcdc not applicable no test failure

In the first version of this PR (bd80e8ce601b1663aa69793f086284c66fd2b463), there is an additional test failure (781) when using COB_LOAD_GLOBAL=false, i.e. the default, and preloading where symbols are overwritten in an unexpected manner.

Commit Hash Loader COB_LOAD_GLOBAL Test Failures
bd80e8ce601b1663aa69793f086284c66fd2b463 false 781 failed
bd80e8ce601b1663aa69793f086284c66fd2b463 true no test failure
bd80e8ce601b1663aa69793f086284c66fd2b463 --without-dl false 47 49 50 781 failed
bd80e8ce601b1663aa69793f086284c66fd2b463 --without-dl true 47 49 50 failed

In history up to the latest commit (62de6254d0418b45e142b14f27245b481c1594b1), there exists a fix (see 1f4d60760e3fb908f9818d68ac0044fae7ee520a), the same that is already applied in WIN32 environments, to fix this.

Commit Hash Loader COB_LOAD_GLOBAL Test Failures
62de6254d0418b45e142b14f27245b481c1594b1 false no test failure
62de6254d0418b45e142b14f27245b481c1594b1 true no test failure
62de6254d0418b45e142b14f27245b481c1594b1 --without-dl false 47 49 50 failed
62de6254d0418b45e142b14f27245b481c1594b1 --without-dl true 47 49 50 failed

florianschmidt1994 avatar Jan 13 '25 09:01 florianschmidt1994

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 70.58824% with 5 lines in your changes missing coverage. Please review.

Project coverage is 67.58%. Comparing base (16e84a5) to head (3234374). Report is 44 commits behind head on gcos4gnucobol-3.x.

Files with missing lines Patch % Lines
libcob/call.c 70.58% 2 Missing and 3 partials :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@                  Coverage Diff                  @@
##           gcos4gnucobol-3.x     #209      +/-   ##
=====================================================
- Coverage              67.85%   67.58%   -0.28%     
=====================================================
  Files                     33       33              
  Lines                  60458    60827     +369     
  Branches               15821    15896      +75     
=====================================================
+ Hits                   41026    41109      +83     
- Misses                 13567    13847     +280     
- Partials                5865     5871       +6     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov-commenter avatar Feb 11 '25 19:02 codecov-commenter

Friendly ping @florianschmidt1994

GitMensch avatar Jun 13 '25 13:06 GitMensch

@florianschmidt1994 thanks for the updates.

Please use the CI generated tarball, to check if your original setup that needed that change (a java starter running the same process more than once) still works. Deep-link to the CI generated source distribution: https://github.com/OCamlPro/gnucobol/actions/runs/16163870452/artifacts/3493326773

Also: can you please create a minimal runner in C that we can add to the testsuite that mimics your java starter? Currently we only verified that the new setting works with both options, but we don't have a test case that only passes with one of those...

GitMensch avatar Jul 09 '25 08:07 GitMensch

The "coverage" titled test fails for C89 compat reasons:


+caller.c:5:22: warning: extra tokens at end of #include directive
+    5 | #include <pthread.h> // For POSIX threads
+      |                      ^
+caller.c:6:22: warning: extra tokens at end of #include directive
+    6 | #include <dlfcn.h>   // For dynamic library loading
+      |                      ^
+caller.c:13:1: error: C++ style comments are not allowed in ISO C90
+   13 | // Function to call COBOL shared library functions
+      | ^
+caller.c:13:1: note: (this will be reported only once per input file)
+caller.c: In function 'callCobol':
+caller.c:34:30: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
+   34 |     RESOLVE_SYMBOL(cob_init, cob_init_func);
+      |                              ^~~~~~~~~~~~~
+caller.c:25:5: note: in definition of macro 'RESOLVE_SYMBOL'
+   25 |     type symbol = (type) dlsym(libCobHandle, #symbol); \
+      |     ^~~~
+caller.c:35:33: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
+   35 |     RESOLVE_SYMBOL(cob_resolve, cob_resolve_func);
+      |                                 ^~~~~~~~~~~~~~~~
+caller.c:25:5: note: in definition of macro 'RESOLVE_SYMBOL'
+   25 |     type symbol = (type) dlsym(libCobHandle, #symbol); \
+      |     ^~~~
+caller.c:36:30: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
+   36 |     RESOLVE_SYMBOL(cob_tidy, cob_tidy_func);
+      |                              ^~~~~~~~~~~~~
+caller.c:25:5: note: in definition of macro 'RESOLVE_SYMBOL'
+   25 |     type symbol = (type) dlsym(libCobHandle, #symbol); \
+      |     ^~~~
+caller.c:38:5: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
+   38 |     hello_world_func hello_world = cob_resolve("prog");
+      |     ^~~~~~~~~~~~~~~~
+caller.c: In function 'main':
+caller.c:60:5: error: 'for' loop initial declarations are only allowed in C99 or C11 mode
+   60 |     for (long i = 0; i < numThreads; i++) {
+      |     ^~~
+caller.c:60:5: note: use option '-std=c99', '-std=gnu99', '-std=c11' or '-std=gnu11' to compile your code
+caller.c:67:14: error: conflicting types for 'i'; have 'int'
+   67 |     for (int i = 0; i < numThreads; i++) {
+      |              ^
+caller.c:60:15: note: previous definition of 'i' with type 'long int'
+   60 |     for (long i = 0; i < numThreads; i++) {
+      |               ^
+caller.c:67:5: error: 'for' loop initial declarations are only allowed in C99 or C11 mode
+   67 |     for (int i = 0; i < numThreads; i++) {
+      |     ^~~
+cc1: some warnings being treated as errors

To reproduce that locally compile it with gcc -std=c89 -Werror=declaration-after-statement - in general you want variables to start at the begin of a block and use */ block comments */ .

GitMensch avatar Aug 14 '25 13:08 GitMensch

and the MSYS1 entries error because of missing pthread support:

+caller.c:5:10: fatal error: pthread.h: No such file or directory
+    5 | #include <pthread.h> // For POSIX threads
+      |          ^~~~~~~~~~~

It is fine to skip this test on MSYS1. To do so add

# skip test for MSYS1 - we don't have pthread there
# FIXME: make that test more portable
AT_SKIP_IF([test "$OSTYPE" = "msys"])

at the start of the test.

GitMensch avatar Aug 14 '25 13:08 GitMensch

Thanks for the hints @GitMensch. I've also disabled the tests on macOS, as there is no dlmopen there.

Test runs on Windows (e.g. https://github.com/OCamlPro/gnucobol/actions/runs/17068765301/job/48392123249?pr=209) say caller.c not found when trying to execute.

AT_CHECK([$COB_CC $COB_CFLAGS caller.c -ldl $COB_LIBS -o caller], [0], [], [])

Do you know if we might need to call the compiler arguments differently to make it more portable across platforms?

Also there's an issue with help2man in the Windows MSYS2 prepare workflow (https://github.com/OCamlPro/gnucobol/actions/runs/17069516851/job/48394605655?pr=209) that I don't quite understand, yet. Any ideas?

florianschmidt1994 avatar Aug 19 '25 12:08 florianschmidt1994

Thanks for the hints @GitMensch. I've also disabled the tests on macOS, as there is no dlmopen there.

Test runs on Windows (e.g. https://github.com/OCamlPro/gnucobol/actions/runs/17068765301/job/48392123249?pr=209) say caller.c not found when trying to execute.

AT_CHECK([$COB_CC $COB_CFLAGS caller.c -ldl $COB_LIBS -o caller], [0], [], [])

Use AT_CHECK([$COBC -x -ldl -lpthread -caller.c], [0], [], []) there (drop pthread if you think that's really not necessary). The Windows tests likely haven't setup those vars correctly for some MSYS2 translation reason.

GitMensch avatar Aug 19 '25 15:08 GitMensch