
icb_parpack_cpp test is failing on ppc64el and other archs

Open sylvestre opened this issue 7 years ago • 15 comments

https://buildd.debian.org/status/fetch.php?pkg=arpack&arch=ppc64el&ver=3.6.1-1&stamp=1529756139&raw=0

rank 0 - 1000.15 1000.15
Correct eigenvalues not computed
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
FAIL icb_parpack_cpp (exit status: 1)

sylvestre avatar Jun 23 '18 12:06 sylvestre

@fghoussen Does it ring a bell? We could just disable MPI on this arch...

sylvestre avatar Jun 23 '18 12:06 sylvestre

Not really: soundless bell! In the build dependencies, I see libopenmpi3: could you try with openmpi2?

fghoussen avatar Jun 23 '18 15:06 fghoussen

If switching from openmpi3 to openmpi2 works on ppc64el (?), try it on armel as well.

fghoussen avatar Jun 23 '18 15:06 fghoussen

Other archs are indeed failing: https://buildd.debian.org/status/package.php?p=arpack

sylvestre avatar Jun 24 '18 07:06 sylvestre

If libopenmpi3 is not to blame (?), then C++11 could be to blame....

I've just seen this at the end of the armel log:

libtool: link: mpic++ -std=gnu++11 ...
E: Build killed with signal TERM after 150 minutes of inactivity

The build seems to hang. It looks like the compiler chokes because it doesn't know what to do with the (hard-coded) gnu++11?! Really not sure (which compiler is behind it?). I have no good answer here. I've opened a PR with a few lines: drop them if you don't like them.

fghoussen avatar Jun 25 '18 13:06 fghoussen

Looks more like a test hanging than the compiler disliking gnu++11. The compiler seems to have finished by the time the timeout message is printed.

10110111 avatar Jun 25 '18 13:06 10110111

Yeah, maybe. Can you get more verbose logs? Or compile with make VERBOSE=1?

fghoussen avatar Jun 25 '18 13:06 fghoussen

I guess not. But from the log it seems that the first test in the MPI directory hangs. I suppose only adding some prints to the test would help debugging.

10110111 avatar Jun 25 '18 13:06 10110111
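The suggestion above to add prints to the hanging test could be sketched roughly as follows. This is only an illustration: `progress_marker` and `trace` are hypothetical helper names, not part of arpack-ng, and the rank is passed in as a plain integer (in a real MPI test it would presumably come from `MPI_Comm_rank`). The point is that each marker is flushed immediately, so the last line in the build log shows how far each rank got before the hang.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of arpack-ng): build a progress marker
// tagged with the rank, so interleaved output can be attributed.
inline std::string progress_marker(int rank, const std::string& step) {
  std::ostringstream oss;
  oss << "[rank " << rank << "] " << step;
  return oss.str();
}

// Write the marker and flush right away, so the line reaches the log
// even if the process hangs or is killed just afterwards.
inline void trace(std::ostream& out, int rank, const std::string& step) {
  out << progress_marker(rank, step) << std::endl;  // std::endl flushes
}
```

In a test, calls such as `trace(std::cerr, rank, "before solver call")` and `trace(std::cerr, rank, "after solver call")` placed around each suspicious step would narrow down where a rank stops making progress.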

I'm not entirely sure about this, but the CMake build links against both the CXX and the C libraries for MPI. Could this be a problem?

Another possibility is a deadlock :/

dbeurle avatar Jun 25 '18 13:06 dbeurle

Just realized the hang is after the issue46 test... which is not an icb one?! No idea what's going on.

fghoussen avatar Jun 25 '18 14:06 fghoussen

@sylvestre: I guess the problem on armel/ppc64 may be fixed by 9742da75dc2f7!

fghoussen avatar Jun 25 '18 14:06 fghoussen

@sylvestre: if you're about to publish a new release, check whether this issue is fixed (it should be!)

fghoussen avatar Dec 30 '18 14:12 fghoussen

May be fixed by f36eb6c75dc686c8cebdaca0106422d77b04152d

fghoussen avatar Aug 15 '22 10:08 fghoussen


@sylvestre: once #397 is released, it would be good to check whether this problem is solved.

fghoussen avatar Feb 12 '23 11:02 fghoussen

Yeah, I will upload it once 3.9.1 is tagged :)

sylvestre avatar Feb 12 '23 17:02 sylvestre