`byte_conversion`, `bytscl` and `formats` tests fail on Apple Silicon CI
@gnudatalanguage/gdldevs, anyone having access to an Apple Silicon machine, help welcome! Thanks
Example output (https://github.com/gnudatalanguage/gdl/actions/runs/10431443505/job/28891119074)
66/214 Test #66: test_byte_conversion.pro ...........***Failed 0.65 sec
...
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for : -32765
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for : -32766
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for : -32767
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for : -32769
% TEST_BYTE_BASIC_32768: Error on operation : input as Long for : -32770
...
67/214 Test #67: test_bytscl.pro ....................***Failed 0.69 sec
...
% TEST_BYTSCL_TOP: Error on operation : Pb with top= -128, min/max : 0, 0
...
102/214 Test #102: test_formats.pro ...................***Failed 1.13 sec
...
% When using the RAN1 mode, be sure to keep the RAN1 and dSFMT seed arrays in separate variables.
multiple reference file <<formats.GDL>> found ! First used !!
/Users/runner/work/gdl/build/testsuite/formats.GDL
/Users/runner/work/gdl/gdl/testsuite/formats.GDL
Files to be compared : formats.IDL, formats.GDL
...
% TEST_FORMATS: = 1595 errors encountered during TEST_FORMATS tests =
...
The following tests FAILED:
66 - test_byte_conversion.pro (Failed)
67 - test_bytscl.pro (Failed)
102 - test_formats.pro (Failed)
OK, but too much troubles on OSX now, I need some quiet time to investigate :( (short answer : when compilating without the script, only test_formats is broken --but I experienced glitches with the tests, I don't know why-- when compilating with the script, maybe due to wxwidget, everything is broken)
see #1881
as of now, we have the following failures on CI:
- macos-14 (i.e., Apple Silicon): test_hdf5 & test_formats
- macos-12 (i.e., Intel): test_hdf5
The test_hdf failure is:
112/214 Test #112: test_hdf5.pro ......................***Failed 0.47 sec
% Compiled module: TEST_HDF5.
% Compiled module: FILE_SEARCH_FOR_TESTSUITE.
% Compiled module: BANNER_FOR_TESTSUITE.
% Compiled module: GDL_IDL_FL.
% TEST_HDF5_BYTE_ARRAY:
NO errors encountered during TEST_HDF5_BYTE_ARRAY tests
% TEST_HDF5_STRING: NO errors encountered during TEST_HDF5_STRING tests
h5dump error: unable to open attribute "attr-02"
% TEST_HDF5_ATTR: 3 errors encountered during TEST_HDF5_ATTR tests
% TEST_HDF5_DATA: NO errors encountered during TEST_HDF5_DATA tests
% H5G_CREATE: Bad value
% TEST_HDF5_OBJ_INFO: NO errors encountered during TEST_HDF5_OBJ_INFO tests
% TEST_HDF5_COMP: NO errors encountered during TEST_HDF5_COMP tests
% H5G_CREATE: Bad value
% TEST_HDF5: =======================================================
% TEST_HDF5: = =
% TEST_HDF5: = 3.00000 errors encountered during TEST_HDF5 tests =
% TEST_HDF5: = =
% TEST_HDF5: =======================================================
I also noticed this we need to fix it and release v1.1.1 asap
Duplicate of #1883 and could be related with: #1899?
@slayoo #1904 shows NO ERRORS on macos-12 (when did you see an error ?)
So on osx 14 = ARM64 = Apple Silicon = M1/M2/M3 the culprit could be that the HDF5 library has the same problem as GDL, i.e., #1881
I have no currently a M1 machine at hand to handle #1881 but it would serve nothing if the HDF5 library is rotten there...
@slayoo https://github.com/gnudatalanguage/gdl/pull/1904 shows NO ERRORS on macos-12 (when did you see an error ?)
Yesterday's macos-12 build for the master branch has test_hdf5 failed:
https://github.com/gnudatalanguage/gdl/actions/runs/11681438801/job/32526479047
% TEST_HDF5_BYTE_ARRAY:
NO errors encountered during TEST_HDF5_BYTE_ARRAY tests
% TEST_HDF5_STRING: NO errors encountered during TEST_HDF5_STRING tests
h5dump error: unable to open attribute "attr-02"
% TEST_HDF5_ATTR: 3 errors encountered during TEST_HDF5_ATTR tests
% TEST_HDF5_DATA: NO errors encountered during TEST_HDF5_DATA tests
% H5G_CREATE: Bad value
I get the test_hdf5.pro error , too on a local build in Arch. It turns out to be an error in wait() within spawn. I made a small change in basic_pro.cpp:
diff --git a/src/basic_pro.cpp b/src/basic_pro.cpp
index 87c0e005..33a5c28f 100644
--- a/src/basic_pro.cpp
+++ b/src/basic_pro.cpp
@@ -2142,7 +2142,9 @@ static DWORD launch_cmd(BOOL hide, BOOL nowait,
// wait until child terminates
int status;
- pid_t wpid = wait(&status);
+ if (wait(&status) == -1) {
+ Warning(DString("SPAWN: Error waiting for child process: ") + strerror(errno));
+ }
if (exit_statusKeyword)
e->SetKW(exit_statusIx, new DLongGDL(status >> 8));
to track down whats happening. At least in my local build, wait() returns -1 and sets errno to ECHILD, this is what this small change reveals.
Since test_hdf5.pro uses spawn, and status does not get set to the return value of the spawned command due to the error in wait(), the test fails on my maschine.
However, I could not yet figure why I get ECHILD in the first place.
Maybe someone else has a clue, I'll try to dig into this a bit more, too.
Best Jan
BTW: In https://github.com/gnudatalanguage/gdl/pull/1913 build on MacOS 14 fails with this error, too: https://github.com/gnudatalanguage/gdl/actions/runs/11942416687/job/33289421103?pr=1913
As I mentioned in #1910 we do have a trouble in SPAWN for wait()
I am not sure it is the best way to just make a small patch without trying to do a good management of wait() and SPAWN ... But I am too bad in C to understand https://man7.org/linux/man-pages/man2/waitpid.2.html
I also check that this is the only occurrence of wait()
I tried waitpid(pid, &status, 0) while debugging, this doesn't change anything... There must be something else broken. :sob:
@alaingdl Got the reason: The culprit is setting SIGCHLD to SIG_IGN (gdl.cpp, line 519). This was introduced in #1870 (at least according to the commit message). However, in doing so, a wait() call will result in waiting for all children to finish and then return -1 (as documented in https://man7.org/linux/man-pages/man2/wait.2.html). Commenting this line restores correct behaviour for wait() at least on my local maschine.
We might have another problem for the other failing tests on Apple, though.
Best, Jan
@jkohnert bravo for the find! When I made #1870 I was not aware SIG_IGN would influence SPAWN (silly, in retrospect, IDL_IDLBridge being a more refined method of gdl respawning itself). Not clear if it will be safe to locally change SIGCHLD handling during the SPAWN command, though. TBC.
closing as we have 2 different issues there. One solved (spwan). The other not easily solvable (arm64 different math). Opening another issue.