arpack-ng
arpack-ng copied to clipboard
COMMON block variables are not initialized in pArpack
I have some errors on HPC cluster (don't see anything like that on macOS or Ubuntu 16 or 17):
At line 68 of file <blabla>/arpack-ng-3.4.0/PARPACK/UTIL/MPI/pdvout.f
Fortran runtime error: Bad unit number in statement
The same issue with 3.5.0.
From the description, it seems that the routine is for printing https://github.com/opencollab/arpack-ng/blob/master/PARPACK/UTIL/MPI/pdvout.f#L68
but I don't see where/why Arpack would print anything while solving symmetric hermitian generalized eigenproblem with shift-and-invert spectral transformation.
EDIT: I googled and it looks like COMMON variables do not have default initialization. Since I call arpack from C++ and link against both Arpack and pArpack, for now my workaround of this problem is
sed -i -e 's/\(\s*msglvl = \)\(.*\)/\10/g' *.f
You have msglvl .gt. 0 somewhere.
what is msglvl ?
In debug.h there are several integers which can be used to get information from the subroutines. For instance, if msaupd is 1 in your main program, then pdsaupd.f will set msglvl to msaupd and print some information. The output is redirected to unit logfil, which should be set to 6 (stdout).
@caliarim sorry, I am lost here as I don't really work with Fortran much. I am using C-wrappers to pArpack functions from a C++ application. There is indeed debug.h in the SRC file, but my pArpack installation has no headers. It's probably missing in cmake's scripts.
Also I do not set msaupd to anything as I was simply not aware of it. I would expect that the default is no debug output? You probably refer to these http://www.caam.rice.edu/software/ARPACK/UG/node133.html and https://github.com/opencollab/arpack-ng/blob/master/DOCUMENTS/debug.doc ?
p.s. It would be good if one could disable all output at configure time with cmake.
p.p.s. the msglvl in those pArpack routines is indeed set in pdsaupd to be msaupd and pdseupd to mseupd.
@caliarim
if msaupd is 1 in your main program, then pdsaupd.f will set msglvl to msaupd and print some information
could you please help clarify this given my previous reply? Especially how is it possible that this output happens if I call parpack wrappers from C++ and I certainly don't set msaupd?
I tried to check default values of those variables within (p)Arpack sources, but did not find anything but a few examples/tests where msaupd is set. I am not a Fortran user, but I guess the default is zero, so technically I should not see any output.
I never used pArpack. But I would try the following: for each variable in debug.h, starting from mgetv0, I would set it explicitely to 0 in your main file.
@fghoussen since you added C/C++ bindings, maybe you know how to set those common block variables of (p)Arpack from C/C++? I tried
extern "C" {
extern struct
{
int logfil;
int ndigit;
int mgetv0;
int msaupd;
int msaup2;
int msaitr;
int mseigt;
int msapps;
int msgets;
int mseupd;
int mnaupd;
int mnaup2;
int mnaitr;
int mneigh;
int mnapps;
int mngets;
int mneupd;
int mcaupd;
int mcaup2;
int mcaitr;
int mceigh;
int mcapps;
int mcgets;
int mceupd;
} debug_;
}
int main (int argc, char *argv[])
{
debug_.logfil = 6;
debug_.msaupd = 1;
debug_.mseupd = 0;
debug_.ndigit = -3;
<blabla>
}
but I get EXC_BAD_ACCESS on setting those. What's even more frightening is
(lldb) p &debug_.logfil
error: Multiple external symbols found for 'debug_'
id = {0x0000038a}, range = [0x000000011a72fc00-0x000000011a72fc60), name="debug_"id = {0x00000449}, range = [0x000000011a78aba0-0x000000011a78ac00), name="debug_"
Since I link against both Arpack and pArpack, it could be that the two are confused. Indeed
$ nm -g <path-to>/arpack-ng-3.5.0-7b5boj3c7gsexjreipjcapj7525qgzfa/lib/libarpack.2.dylib | grep debug_
000000000004cba0 S _debug_
and
$ nm -g <path-to>/arpack-ng-3.5.0-7b5boj3c7gsexjreipjcapj7525qgzfa/lib/libparpack.2.dylib | grep debug_
000000000004bc00 S _debug_
So it looks like if the user library/executable links against both, there is no way to do anything about those COMMON variables.
I close as I worked around the problem by patching.
There is indeed debug.h in the SRC file, but my pArpack installation has no headers. It's probably missing in cmake's scripts.
Yes, I would say debug.h is missing, and, that it should be added by make install (both with cmake and autotools). @sylvestre what do you think ?
@fghoussen since you added C/C++ bindings, maybe you know how to set those common block variables of (p)Arpack from C/C++?
iso_c_binding is a gateway from Fortran to C only (not C++). debug.h is already C: binding is not a problem here. I would say:
debug.his missing in the install directory- your
EXC_BAD_ACCESSseems "logical" to me !... Indeed, as you need to use what's indebug.h, at C side, I guess you need to specifystatickeyword and initialize all static variables (debug_) out of the scope of the main: seems it is not what you do in your code snippet. I would say the errorMultiple external symbolscomes from the fact that the compiler/linker is mislead and believesdebug_is a symbol defined both in and out of the main.
I would say that if you succeed to get 1) and 2) your patch/trick (with sed) will no more be necessary.
Thanks @fghoussen for the detailed reply. Since I am under pressure to get some results, for now I will stick with patching and maybe get back to this issue later.
Sure.
I didn't tested that (you'll probably need to "play" with extern and static anyway...) : my first try would be to add " static debug_ dbg_ + initialize dbg_ " after the extern bloc but before main, and, use dbg_ in main... Good luck ! :D
Oh men !... Just realized debug.h is fortran, not C (I was mislead by the file name). My suggestion will probably not work (at least not so "easily"...). I have on-going stuffs to run here in priority. I'll try to push a PR about that in coming days (if I can get it to work... which is not sure at all - icb is needed as debug.h is frotran)
I had a closer look at that. Seems not trivial at all !... For now, I have no solution. I'll PR one if I get something to work.
For now, the best I have is "nothing" (if passing common with icb), or, "segmentation fault" (if not passing common with icb) !... :D @davydden : not sure to succeed to finish this (icb is already cumbersome, the /debug/ common block makes it even more terrible !)... But I can push that (?) if you want to restart from here. Let me know :D
@fghoussen I think you invested already enough time into trying to solve this issue, thank you π
But I can push that (?) if you want to restart from here. Let me know :D
I have close to no experience with Fortran, so let's just leave it as-is for now.
OK, I'll let that unfinished (took me months to get parpack to work - the key was to pass MPI_Comm)... Who knows, I'll maybe have the one magic idea that makes everything work tomorrow... Or not ! :D
@davydden I guess I got it !... You may want to test it :D @sylvestre I'am about to PR a fix for this issue
as of d7deefda828329037163720dc976f2527672564c the issue is still there.
based on https://www.obliquity.com/computer/fortran/common.html
the name of the BLOCK DATA subprogram unit should be included in an EXTERNAL statement in some or all of the procedures which contain the named COMMON blocks initialised by the BLOCK DATA subprogram.
will try adding EXTERNAL debug_init ...
@fghoussen @sylvestre @caliarim here is an interesting thing I found while debugging this issue. Inside PARPACK/SRC/MPI/pdgetv0.f I see
(gdb) f 1
#1 0x00007fffd7af4289 in pdgetv0 (comm=0, ido=2, bmat='G', itry=1, initv=.TRUE., n=1089, j=1, v=..., ldv=1089, resid=..., rnorm=0.2379588772948126, ipntr=..., workd=..., workl=..., ierr=0, _bmat=1) at <arpack>/PARPACK/SRC/MPI/pdgetv0.f:428
428 & '_getv0: B-norm of initial / restarted starting vector')
(gdb) p mgetv0
$1 = 0
(gdb) p msglvl
$2 = 251672707
So msglvl is rubbish, at least that's what gdb tells me. And then it tries to print at PARPACK/SRC/MPI/pdgetv0.f:428.
There is assignment
msglvl = mgetv0
but this is inside if (ido .eq. 0).
p.s. As I mentioned before, changing msglvl = 0 solves the issue, but since I don't have much experience Fortran I really don't know what's going on and how two integer variables may have different values after assignment.
Note that it's not the if which is at fault, right after msglvl = mgetv0 is executed I have
(gdb) p msglvl
$4 = 251672707
this happens with gfortran 4.8.5.
@davydden: did you try adding a dummy program ? Seems external in program forces to load and initialize the common blocks (via block data). Arpack is a library (probably no program, just subroutines) : so either you add external in ALL subroutines (no way !), or, you add a dummy program (that nobody will never use) and hope this will be enough to initialize the debug commonblock: not sure if this could work...
>> git diff
diff --git a/debug_init.f90 b/debug_init.f90
index f4c4f8e..74ae137 100644
--- a/debug_init.f90
+++ b/debug_init.f90
@@ -10,3 +10,7 @@ block data debug_init
mcaupd, mcaup2, mcaitr, mceigh, mcapps, mcgets, mceupd &
/ 6, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 /
end block data debug_init
+
+! This dummy program is not meant to be used: it's meant to initialize the debug common block.
+program dummy
+ include 'debug.h'
+ external debug_init ! Trigger initialization (block data).
+end program dummy
If this does not work, or, if you believe this "trick" is "bad" (I could understand that ! :D) I guess you need to add external mgetv0 after ALL declarations of integer mgetv0 in ALL subroutines AND you must do that for ALL variables in the debug common block !...
@fghoussen you probably mis-read my post. Itβs not the common block variables that have issues. Have a look at the printing from debugger.
mgetv0 belongs to common /debug/and is supposed to initialize msglvl(as far as I understand). That is why I guess initializing the common block could fix the issue (not sure).
But you missed the bizarre
(gdb) p mgetv0
$1 = 0
Yes and no ! :D When debugging fortran, you can not always rely on what gdb says !...
My understanding is that, since fortran 2003, DATA is meant to replace SAVE (DATA = SAVE + initialization). Not 100% sure but seems likely. The only materials I found on the web is in french: go here http://calcul.math.cnrs.fr/Documents/Ecoles/LEM2I/Mod3/fortran.pdf (slide 19) and / or here http://www.idris.fr/media/formations/fortran/idris_fortran_avance_cours.pdf (search for data and save). Also here you find little clue : http://www.fortran.com/fortran/books/t90_82.html (check out 5th things to know).
I guess that ALL variables involved in a save statement should be replaced by a data statement : this is a tedious work to go (git grep "save " tells that there is a lot of save all over the place)...
Example for one save in one file:
~> vi PARPACK/SRC/MPI/pdnaitr.f
! save first, iseed, inits, iter, msglvl, orth, rnorm0 ! before fortran 2003
data first, iseed, inits, iter, msglvl, orth, rnorm0 / ...., 0, ..... / ! after fortran 2003
Sometimes, I'am not even sure to really know how variables should be initialized (iseed for instance?) !?...
@davydden : I guess you need to agree with @sylvestre on what is needed / critical to do ! Seems ideally, all save should be replaced with data : as this is a long and risky/tricky job, you may agree to do that only for msglvl and put that in a "one commit operation" to keep history clean and allow to step back if bug show up... If this turns out to be OK, you may want to do the same for all other variables step by step... Or not ! You decide guys. :D
@fghoussen wow, thanks for reply. The more I know about Fortran, the less I want to ever touch it ;-) I will try a few things tomorrow and report back
@fghoussen here's the update
- having
external debug_initinpdgetv0.f
include 'debug.h'
include 'stat.h'
external debug_init
does not help, i still get to the printing
- while adding a dummy program in
debug_initI noticed that nothing changes becausedebug_init.f90is only compiled when ICB is enabled. I enabled it (fixed one bug along the way) but no luck here either. Even with a dummy program I still get into printing. I also re-tried the option (1) after enabling ICB.
Bottom line: I think I tried everything suggested but it's still broken.
Your last chance could be :
>> git diff
diff --git a/PARPACK/SRC/BLACS/pcgetv0.f b/PARPACK/SRC/BLACS/pcgetv0.f
index 4f3f133..4b80ed9 100644
--- a/PARPACK/SRC/BLACS/pcgetv0.f
+++ b/PARPACK/SRC/BLACS/pcgetv0.f
@@ -182,7 +182,8 @@ c
& rnorm0
Complex
& cnorm, cnorm2
- save first, iseed, inits, iter, msglvl, orth, rnorm0
+ save first, iseed, inits, iter, orth, rnorm0
+ data msglvl / 0 /
c
c %----------------------%
c | External Subroutines |
If OK, you should do that in all subroutines which declare and use msglvl.
If KO, I have no solution !...
Your last chance could be
Unfortunately that does not change the behaviour π . But thanks for the suggestion, @fghoussen
What is the compiler you use ? Is it intel (sometimes bugged) ? Did you try with gnu ? May be a compiler bug... Already seen that: if you compile with intel and -02 or -O3, try with no optim -O0 : is the problem still here ?
write out msglvl and mgetv0 after line 241:
msglvl = mgetv0
Did you try:
- to compile with
-std=2003 - try with both
saveanddata
What is the compiler you use ? Is it intel (sometimes bugged) ? Did you try with gnu ?
gnu 4.8.5.
Did you try:
will do on Monday as I am AFK today.