Test 772 fails on Windows

Open lefessan opened this issue 2 years ago • 5 comments

From https://github.com/OCamlPro/gnucobol/actions/runs/6675975179?pr=109

772. run_misc.at:4507: testing direct CALL in from C w/wo error; no exit ...
../../tests/run_misc.at:4618: $COMPILE caller.c
../../tests/run_misc.at:4619: $COMPILE_MODULE callee.cob callee2.cob buggy.cob
../../tests/run_misc.at:4620: $COBCRUN_DIRECT ./caller callee 00
../../tests/run_misc.at:4623: $COBCRUN_DIRECT ./caller callee 42
../../tests/run_misc.at:4626: $COBCRUN_DIRECT ./caller callee2
--- -   2023-10-28 09:13:49.613224400 +0000
+++ /d/a/gnucobol/gnucobol/_build/tests/testsuite.dir/at-groups/772/stderr      2023-10-28 09:13:49.548724300 +0000
@@ -1,2 +1 @@
-note: STOP RUN with return code 2
 
--- -   2023-10-28 09:13:49.690763800 +0000
+++ /d/a/gnucobol/gnucobol/_build/tests/testsuite.dir/at-groups/772/stdout      2023-10-28 09:13:49.642475500 +0000
@@ -1 +1 @@
-STOP WITH 2
+
../../tests/run_misc.at:4626: exit code was 127, expected 0
772. run_misc.at:4507: 772. direct CALL in from C w/wo error; no exit (run_misc.at:4507): FAILED (run_misc.at:4626)

lefessan avatar Oct 28 '23 21:10 lefessan

I stumbled on that one in #116. Copy-pasting my analysis.

The failure is intermittent: sometimes the test passes, sometimes it fails. The problem occurs specifically with STOP RUN (replacing it with EXIT PROGRAM gives no error). It also does NOT occur if we define COB_WITHOUT_JMP. Investigating a bit, it seems this is because the COBOL module executing the STOP RUN statement is unloaded (with lt_dlclose) before cob_stop_run has finished its execution. This might be okay when calling exit, but when using longjmp, this probably messes up the stack frame.
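The control flow described above can be illustrated with a minimal sketch in plain C. It is not libcob's implementation: every `sketch_` name is hypothetical, and the "module" is only a flag standing in for a shared object that would be closed with lt_dlclose. It shows where the unload happens relative to the longjmp back into the C caller.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not libcob functions. */
static jmp_buf sketch_return_point;  /* where "STOP RUN" jumps back to */
static int sketch_stop_status;       /* return code passed to STOP RUN */
static int sketch_module_loaded;     /* models a dynamically loaded module */

/* Models STOP RUN: unload modules, then jump back to the C caller. */
static void sketch_stop_run(int status)
{
    sketch_stop_status = status;
    /* In the failing scenario the module is unloaded here, while the
       frame of the module that issued STOP RUN is still on the stack. */
    sketch_module_loaded = 0;
    longjmp(sketch_return_point, 1);
}

/* Models a COBOL module that ends with STOP RUN and return code 2. */
static int sketch_callee2(void)
{
    sketch_stop_run(2);
    return -1; /* not reached */
}

/* Models a call with exception check: returns 1 if the callee ended
   through STOP RUN, 0 if it returned normally. */
static int sketch_call_with_check(int (*entry)(void), int *status)
{
    assert(entry != NULL && status != NULL);
    sketch_module_loaded = 1;
    if (setjmp(sketch_return_point) != 0) {
        *status = sketch_stop_status;
        return 1;
    }
    *status = entry();
    return 0;
}
```

In this simulation nothing is really unmapped, so the jump always succeeds. With a real shared object, the code between the unload and the longjmp may belong to memory that is no longer mapped, which would fit the intermittent failure.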

ddeclerck avatar Nov 02 '23 11:11 ddeclerck

Instead of compiling with COB_WITHOUT_JMP, it would likely also work to just disable the dlclose() by using COB_PHYSICAL_CANCEL=never $COBCRUN_DIRECT ./caller callee2 (inline, only for that run), right? What may also work is COB_PRE_LOAD=callee2 $COBCRUN_DIRECT ./caller callee2

If this works, we could do that in the testsuite and document that this may be necessary in some environments (known: Windows), or change the function cob_call_with_exception_check() (but then likely cob_call(), too) to set a flag "called by API" and always skip the complete module unloading part.

@ddeclerck Could you have a look at this, please?

GitMensch avatar Nov 02 '23 20:11 GitMensch

> @ddeclerck Could you have a look at this, please?

Sure (as soon as I find a moment).

ddeclerck avatar Nov 03 '23 15:11 ddeclerck

Finally had time to have a look at this one. So, using COB_PHYSICAL_CANCEL=never does indeed prevent the bug from occurring. However, COB_PRE_LOAD=callee2 does not help.

Now, what should we do? Just add the workaround in the testsuite and document cob_call_with_exception_check as unsuitable for Windows? Or, as you suggest, implement a flag to prevent unloading when calling from cob_call_with_exception_check?
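For a C caller that cannot rely on the shell to set the variable, the same workaround could be applied in-process. The sketch below only sets the environment variable. It assumes the runtime reads COB_PHYSICAL_CANCEL from the environment when it is initialized, so the helper would have to run before that; this has not been verified here, and the helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() on POSIX systems */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sets COB_PHYSICAL_CANCEL=never for this process.
   Returns 1 on success, 0 on failure. Assumed to be called before the
   COBOL runtime is initialized. */
static int sketch_disable_physical_cancel(void)
{
    const char *value;

#ifdef _WIN32
    if (_putenv_s("COB_PHYSICAL_CANCEL", "never") != 0) {
        return 0;
    }
#else
    if (setenv("COB_PHYSICAL_CANCEL", "never", 1) != 0) {
        return 0;
    }
#endif
    value = getenv("COB_PHYSICAL_CANCEL");
    assert(value != NULL);  /* it was set just above */
    return strcmp(value, "never") == 0;
}
```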

ddeclerck avatar Dec 12 '23 14:12 ddeclerck

I'd prefer the second, and document that, with this function, modules will only be unloaded if cob_tidy() is called manually after the call.
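A minimal sketch of that behavior, using a hypothetical runtime structure rather than libcob's real internals: a "called by API" flag makes the STOP RUN cleanup skip unloading, and an explicit tidy step (standing in for cob_tidy()) does the unloading later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical runtime state; field and function names are illustrative. */
struct sketch_runtime {
    bool called_by_api;   /* set when entered through the C API */
    int  loaded_modules;  /* number of modules currently loaded */
};

/* Entry through the API: load a module and remember how we got here. */
static void sketch_api_call(struct sketch_runtime *rt)
{
    assert(rt != NULL);
    rt->called_by_api = true;
    rt->loaded_modules++;
}

/* Cleanup done as part of STOP RUN: skip unloading for API calls. */
static void sketch_stop_run_cleanup(struct sketch_runtime *rt)
{
    assert(rt != NULL);
    if (rt->called_by_api) {
        return;  /* leave modules loaded until an explicit tidy */
    }
    rt->loaded_modules = 0;
}

/* Explicit tidy requested by the C caller: always unloads. */
static void sketch_tidy(struct sketch_runtime *rt)
{
    assert(rt != NULL);
    rt->called_by_api = false;
    rt->loaded_modules = 0;
}
```

With this split, the jump back to the C caller never runs after an unload; the cost is that modules stay loaded until the caller tidies up.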

GitMensch avatar Dec 12 '23 23:12 GitMensch

@ddeclerck Do we still work around this issue, or was it fixed by upstreaming the related PR?

GitMensch avatar Jul 29 '25 11:07 GitMensch

That was fixed a while ago with PR #129 mentioned above.

ddeclerck avatar Jul 29 '25 12:07 ddeclerck