ForTrilinos
Implicit linear solver fails with Intel compiler
@sethrj @tjfulle
Nate from LANL discovered this. I can reproduce it on condo with Intel 18; GCC is fine.
Backtrace:
(gdb) bt
Program received signal SIGSEGV, Segmentation fault.
fortpetra::c_f_pointer_fortpetraoperator (clswrap=..., fptr=0x2ae360058b4810) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/tpetra/src/fortpetra.f90:6443
6443 fptr => handle%data
#0 fortpetra::c_f_pointer_fortpetraoperator (clswrap=..., fptr=0x2ae360058b4810) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/tpetra/src/fortpetra.f90:6443
#1 0x00002aaaaef4ccfc in fortpetra::swigd_fortpetraoperator_getdomainmap (fresult=..., fself=...) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/tpetra/src/fortpetra.f90:6497
#2 0x00002aaaaf031714 in ForTpetraOperator::getDomainMap (this=0x2aaaaf2ea8a8 <fortpetra_mp_c_f_pointer_fortpetraoperator_$HANDLE.0.137>) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/tpetra/src/fortpetraFORTRAN_wrap.cxx:746
#3 0x00002aaaaaed2a52 in ForTrilinos::TrilinosSolver::setup_solver (this=0x2aaaaf2ea8a0 <fortpetra_mp_c_f_pointer_fortpetraoperator_$FSELF_PTR.0.137>, paramList=...) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/simple/src/solver_handle.cpp:62
#4 0x00002aaaaaec697b in _wrap_TrilinosSolver_setup_solver (farg1=0x2aaaaf2ea8a0 <fortpetra_mp_c_f_pointer_fortpetraoperator_$FSELF_PTR.0.137>, farg2=0x2aaaaf2ea8a8 <fortpetra_mp_c_f_pointer_fortpetraoperator_$HANDLE.0.137>) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/simple/src/fortrilinosFORTRAN_wrap.cxx:737
#5 0x00002aaaaaec5dc2 in fortrilinos::swigf_trilinossolver_setup_solver (self=0x2ae360058b4810, paramlist=0x2ae360058b4810) at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/simple/src/fortrilinos.f90:331
#6 0x0000000000412a24 in main () at /home/xap/code/trilinos-fortrilinos/packages/ForTrilinos/src/simple/test/test_simple_solver_handle.f90:317
#7 0x00000000004108de in main ()
@sethrj Do you have a simple IoC (inversion-of-control) example to test with Intel, outside of ForTrilinos?
Yes: look at Examples/fortran/director in the "callback" branch. That should be what you need.
OK... in the director example, I try the following with gcc/6.4.0:
swig -fortran -c++ director.i
g++ -c director.cxx director_wrap.cxx
ar rvs director.a director.o director_wrap.o
gfortran -c director.f90
gfortran runme.f90 director.o director.a -lstdc++
This compiles fine, and when run produces the following output:
[sn-fey2] director - ./a.out
test_subclass
Transformed: 'whee'
Transformed: [whee]
test_transform
Transformed: "whiskey", and "tango", and "foxtrot", and "sierra", and "juliet"
Joined with commas: "whiskey", "tango", "foxtrot", "sierra", "juliet"
test_actual
Transformed: 'whiskey', and 'tango', and 'foxtrot', and 'sierra', and 'juliet'
Joined with commas: 'whiskey', 'tango', 'foxtrot', 'sierra', 'juliet'
Joined with default: 'whiskey', 'tango', 'foxtrot', 'sierra', 'juliet'
Joined with commas: [whiskey], [tango], [foxtrot], [sierra], [juliet]
Joined with default: [whiskey]><[tango]><[foxtrot]><[sierra]><[juliet]
Transformed: "whiskey", and "tango", and "foxtrot", and "sierra", and "juliet"
Transformed: !whiskey!, and !tango!, and !foxtrot!, and !sierra!, and !juliet!
Joined with commas: !whiskey!, !tango!, !foxtrot!, !sierra!, !juliet!
I then blow away the .o, .mod, and .a files and try the following with intel/18.0.2:
icpc -c director.cxx director_wrap.cxx
ar rvs director.a director.o director_wrap.o
ifort -c director.f90
ifort runme.f90 director.o director.a -lstdc++
I get the following error:
runme.f90(75): error #8212: Omitted field is not initialized. Field initialization missing: [SWIGDATA]
allocate(join, source=SingleJoiner())
^
compilation aborted for runme.f90 (code 1)
So... putting in something like the following lets me get past the compile error:
type(SingleJoiner) :: sj
type(BracketJoiner) :: bj
! NOTE: because we're not calling any C functions here, we don't actually
! have to call init_FortranJoiner
write(*,*) "test_subclass"
allocate(join, source=sj)
However, when I run the resulting executable, I get a segfault:
[sn-fey2] director - ./a.out
test_subclass
Transformed: 'whee'
Transformed: [whee]
test_transform
Transformed: "whiskey", and "tango", and "foxtrot", and "sierra", and "juliet"
Joined with commas: "whiskey", "tango", "foxtrot", "sierra", "juliet"
test_actual
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
a.out 000000000041CF4D Unknown Unknown Unknown
libpthread-2.17.s 00002AB3F1D645E0 Unknown Unknown Unknown
a.out 0000000000409A46 Unknown Unknown Unknown
a.out 00000000004100E8 Unknown Unknown Unknown
a.out 00000000004104BC Unknown Unknown Unknown
a.out 000000000040B7AB Unknown Unknown Unknown
a.out 0000000000407F7A Unknown Unknown Unknown
a.out 0000000000404F25 Unknown Unknown Unknown
a.out 00000000004046B2 Unknown Unknown Unknown
a.out 0000000000403AEE Unknown Unknown Unknown
libc-2.17.so 00002AB3F1F92C05 __libc_start_main Unknown Unknown
a.out 00000000004039E9 Unknown Unknown Unknown
Then, for completeness, I go back and build it all again with GCC to make sure I didn't biff something as I was editing runme.f90, and it runs just fine.
Sorry in advance for the long post. The segfault is happening in c_f_pointer_Joiner. I can run both the GCC and Intel versions in TotalView to see what's going on. Here's a GCC screenshot:
and here's the Intel screenshot:
Note the difference in the representation of clswrap: the Intel version seems to be creating a struct out of ptr, whereas the GCC version doesn't. I think a result of this is that fself_ptr is nonsense in the Intel version, which then causes a segfault at line 695. I could use some help interpreting the significance of this, maybe from @sethrj?
I've minimized the previous comment, as I think it's been overtaken by newer information. In a nutshell, I think there's an Intel compiler bug, though I could use another pair of eyes to confirm. Look at what goes into the swigd_Joiner_transform call in FortranJoiner::transform (in director_wrap.cxx, see below), where the arguments are (&self, &arg1), and compare it to what actually arrives in swigd_Joiner_transform (in director.f90, where the arguments are farg1 and farg2). You see the following.
Note that both receiving arguments point at the second calling argument. The two pointers reference the same memory, and in the case of farg1, the value of farg1%mem has taken the value &arg1->size.
I just tried this in Intel 2019.beta and the problem is still there.
Ugh. As a general rule of thumb in my experience, "seems like a compiler bug" usually means "I'm depending on undefined behavior being consistent"...
...but given that the gfortran compiler actually had an acknowledged bug there that we found and fixed, you could be right.
But looking again, are you sure that at the breakpoint you're using, the variables have been initialized? It looks like they both might be filled with bogus values to me.
I'll be back in the office on Tuesday; perhaps we could discuss then?