
NVIDIA 'nvfortran' cannot link libmpi_usempif08.la

Status: Open • jsquyres opened this issue 4 years ago • 28 comments

As reported on https://www.mail-archive.com/[email protected]/msg21283.html, Paul Kapinos is unable to build Open MPI v4.0.x or v4.1.x with the NVIDIA nvfortran compiler (from https://developer.nvidia.com/hpc-compilers). Linking libmpi_usempif08.la fails:

 FCLD     libmpi_usempif08.la
/usr/bin/ld: .libs/comm_spawn_multiple_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/startall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/testall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/testany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/testsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/type_create_struct_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/type_get_contents_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/waitall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/waitany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/waitsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pcomm_spawn_multiple_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pstartall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptestall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptestany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptestsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptype_create_struct_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/ptype_get_contents_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pwaitall_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pwaitany_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: profile/.libs/pwaitsome_f08.o: relocation R_X86_64_32S against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: .libs/abort_f08.o: relocation R_X86_64_PC32 against symbol `ompi_abort_f' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Bad value
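Before changing the build, it can help to confirm that the objects really were compiled without -fPIC. The following is a minimal sketch, assuming GNU binutils' readelf is available; has_non_pic_relocs is a hypothetical helper name, not part of Open MPI. It only flags absolute 32-bit relocations (R_X86_64_32 / R_X86_64_32S, as in most lines of the log). It does not flag R_X86_64_PC32, which also appears in legitimate PIC code, so the abort_f08.o line needs a separate look.

```shell
# Hypothetical helper: reads `readelf -r` output on stdin and exits 0 if it
# contains absolute 32-bit relocations (R_X86_64_32 or R_X86_64_32S), which
# the linker refuses to put into an x86-64 shared object.
has_non_pic_relocs() {
    grep -Eq 'R_X86_64_32S?[[:space:]]'
}

# Usage, from the directory where libmpi_usempif08.la is built:
for obj in .libs/*_f08.o profile/.libs/*_f08.o; do
    [ -e "$obj" ] || continue   # skip glob patterns that matched nothing
    if readelf -r "$obj" | has_non_pic_relocs; then
        echo "non-PIC relocations: $obj"
    fi
done
```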

jsquyres, May 04 '21

👍

janjust, Jun 22 '21

Can someone from NVIDIA please take a look?

gpaulsen, Sep 24 '21

@janjust Is anyone at NVIDIA looking at this? It looks like the same issue was just reported on https://www.mail-archive.com/[email protected]/msg34594.html.

jsquyres, Sep 29 '21

~~FYI, autoconf issue is on our deliverables for 21.11 due end of November.~~

Not a compiler bug but rather an autoconf issue; we'll make a note of it in the README until we fix it.

janjust, Oct 11 '21

@janjust I think you're saying that NVIDIA is going to deliver a new version of your compiler (v21.11) at the end of November that will fix the issue. Is that correct?

jsquyres, Oct 11 '21

@jsquyres correct

janjust, Oct 11 '21

@janjust Great. Can you make PRs for the v4.0.x / v4.1.x READMEs that mention this? It seems like a good fit for the README section that discusses other compiler issues.

jsquyres, Oct 11 '21

@jsquyres Question: since this has only been reported for v4.0.x/v4.1.x, does the README update go into master and get cherry-picked back, or only into those two branches?

janjust, Oct 12 '21

Does it also exist in master/v5.0 (regardless of what is reported)?

jsquyres, Oct 12 '21

We do need a fix for this on at least the v4.1.x branch.

hppritcha, Oct 12 '21

@jsquyres Same on v5.0 and, by extension, master. I'll open a PR against master and cherry-pick it to all branches.

janjust, Oct 12 '21

nvfortran supports the -fPIC flag. Is there some reason that Open MPI is not passing -fPIC to nvfortran?

Maybe I'm not fully understanding the issue here, but a workaround I have commonly used for this is to just explicitly pass -fPIC to nvfortran, e.g. FC="nvfortran -fPIC".

It seems possible that the GNU Autotools are not enabling -fPIC for Fortran in the generated libtool wrapper when nvfortran is used, but I'm not familiar enough with this part of the build to know exactly what to fix.
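A minimal sketch of the workaround described above, assuming an otherwise default Open MPI build from a release tarball; the install prefix is a placeholder and not from the original report. Putting -fPIC inside FC, rather than relying on Libtool to add it, should make it ride along on every command that invokes nvfortran, including the Libtool link of libmpi_usempif08.la.

```shell
# Workaround sketch: embed -fPIC in FC so every nvfortran invocation
# (compile and Libtool link) produces position-independent code.
# The --prefix value is a placeholder; keep your usual configure options.
./configure FC="nvfortran -fPIC" --prefix="$HOME/opt/openmpi-nvhpc"
make all
make install
```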

cparrott73, Oct 14 '21

Ah, if it's just a missing -fPIC, then I'd assume that it's Libtool that is not automatically recognizing nvfortran and automatically adding -fPIC to FCFLAGS. This is likely because Libtool hasn't had a release in a long time, and therefore may not have the bits to recognize nvfortran.

My suggestion is to do two things:

  1. Submit a patch upstream to the Libtool project. This is likely the best long-term solution.
  2. If the patch is suitable for Open MPI, apply it there, too.
    • But I suspect it won't be suitable -- it's tricky to add patches after libtool has run.
    • It may be easier to add some logic to OMPI's configure (e.g., in OMPI_SETUP_FC) to:
      1. identify if the compiler is nvfortran
      2. check to see if we're building Fortran shared libraries
      3. If both of the above are true, then ensure -fPIC is in $FCFLAGS

If you have configure add -fPIC to FCFLAGS, please use some kind of AC_MSG_* to emit to stdout that you have done this. E.g.:

checking if compiler is nvfortran... yes
checking if building Fortran shared libraries... yes
checking to see if -fPIC is in FCFLAGS... no (added)

Or something like that. It's just helpful to emit this kind of stuff so that it's in the configure logs in case we have to post-mortem diagnose things.
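The configure-side approach suggested above might look roughly like this. It is a minimal sketch in plain POSIX sh, not Open MPI's actual code, and it rests on several assumptions: nvfortran_fpic_check is a hypothetical name; nvfortran is recognized only from the basename of the first word of $FC (a real check might inspect the compiler's version output instead); the shared-library test assumes Libtool's enable_shared shell variable; and printf stands in for AC_MSG_CHECKING/AC_MSG_RESULT.

```shell
# Hypothetical sketch of the suggested configure logic (not Open MPI code).
# Inputs: FC, FCFLAGS, enable_shared. May append -fPIC to FCFLAGS.
nvfortran_fpic_check() {
    # 1. Identify nvfortran from the first word of $FC, ignoring any path.
    fc_base=${FC%% *}
    fc_base=${fc_base##*/}
    case $fc_base in
        nvfortran*) is_nvfortran=yes ;;
        *)          is_nvfortran=no ;;
    esac
    printf 'checking if compiler is nvfortran... %s\n' "$is_nvfortran"
    [ "$is_nvfortran" = yes ] || return 0

    # 2. -fPIC only matters when building shared libraries.
    shared=${enable_shared:-no}
    printf 'checking if building Fortran shared libraries... %s\n' "$shared"
    [ "$shared" = yes ] || return 0

    # 3. Ensure -fPIC is in FCFLAGS, reporting whether it had to be added.
    printf 'checking to see if -fPIC is in FCFLAGS... '
    case " $FCFLAGS " in
        *" -fPIC "*) printf 'yes\n' ;;
        *)           FCFLAGS="${FCFLAGS:+$FCFLAGS }-fPIC"
                     printf 'no (added)\n' ;;
    esac
}
```

With FC=nvfortran, enable_shared=yes and FCFLAGS=-O2, it prints the three "checking" lines shown above and leaves FCFLAGS as "-O2 -fPIC". Note that it only inspects FCFLAGS, so a -fPIC already embedded in FC would be added a second time, which is harmless. In configure.ac itself, the messages would go through AC_MSG_CHECKING/AC_MSG_RESULT so they also land in config.log.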

jsquyres, Oct 15 '21

Thanks, Jeff. I'll try opening an issue with libtool and see if that bears any fruit.

cparrott73, Oct 15 '21

@jsquyres - upon further investigation, it appears that GNU libtool is currently unmaintained, and there have been no new releases since 2015. I'm not sure there is anyone available there to engage on this. Perhaps at the least we could send you some patches to make sure -fPIC gets passed to nvfortran, but that's probably about the best we can do here at this point.

cparrott73, Oct 19 '21

@cparrott73 Ugh. That's a bad position for open source projects (many C projects use Libtool).

Sure, PRs here would be great. See my "It may be easier to..." comment in https://github.com/open-mpi/ompi/issues/8919#issuecomment-944329586, above, for a suggestion on what to do.

jsquyres, Oct 19 '21

I guess that raises a question for OMPI v6: do we need to consider changing the build system? We have heard concerns about libtool before and this reinforces them, but we also know autoconf and friends are likewise unmaintained (save for the recent one-off release someone paid to have done, which caused us a bunch of cleanup). Should it be on the agenda for discussion at a developer meeting?

rhc54, Oct 19 '21

Shh!! Don't say such things publicly!! 🙊

Yes, I was also thinking we should probably have some discussions about this. There is a giant amount of inertia behind the GNU Autotools in all the Open MPI projects, though... it would take a lot of work to move away from them.

jsquyres, Oct 19 '21

Agreed - and we'd want to ping our downstream packagers about it before committing to anything as they would also be impacted. Not advocating a change, but wondering if our hands are going to be forced at some point.

rhc54, Oct 19 '21

@jsquyres - yeah, it's not ideal. I have some changes to GNU libtool to support the NVIDIA HPC SDK compilers in the works; I need to test them. I will probably open a bug on their Savannah page and attach a patch, just in case some brave soul decides to step up and take over the project at some point. I'm also working on applying these changes everywhere libtool is used within Open MPI, but as you noted, it's spread across quite a few different spots in the project. It will take me a bit to find all of them.

Seems like there is growing momentum behind CMake and some other similar tools within the OSS community, although CMake certainly comes with its own set of issues. I concur that switching a massive project like Open MPI over to something like CMake will not be a trivial undertaking, to say the least.

cparrott73, Oct 19 '21

The Fortran compiler is only used in one place, so the path I suggested may be simpler than trying to edit libtool patches to be applied after the fact.

FWIW: I do not think that we will be switching away from the Autotools any time soon. Since issues with unsupported compilers only come up once in a great while, it would be really hard to justify all the work necessary to fully migrate away from the Autotools.

jsquyres, Oct 19 '21

Ah, that's a good point. I know there are other projects out there that use libtool, so I was aiming for a more general solution. But if you're good with just handling it in the small section of code that covers the Fortran bindings, then it's all good with me, too.

Completely understand about not shifting away from Autotools - it would be a major undertaking.

cparrott73, Oct 20 '21

@lrbison IIRC we also reported an issue for the nvfortran compiler. Is this related?

wenduwan, Feb 22 '24

@wenduwan No, I don't believe so. That issue was application code failing to compile with an ICE (internal compiler error) only when using Open MPI v5.0.x. The error was:

Lowering Error: bad ast optype in expression [ast=9198,asttype=19,datatype=0]
NVFORTRAN-F-0000-Internal compiler error. Errors in Lowering       1  (Common/evecs.f90: 1588)
NVFORTRAN/x86-64 Linux 23.9-0: compilation aborted

And reportedly it is related to https://github.com/open-mpi/ompi/issues/11582

lrbison, Feb 24 '24

@lrbison can you update to a more recent version of the HPC SDK and try again? I am thinking this bug may have been resolved in more recent releases of our compilers, but I would have to check on it. Please try a newer version, e.g. 24.1, and report back. Thanks.

cparrott73, Feb 26 '24

Just ran into this issue when trying to use nvfortran.

romxero, May 10 '24

What is the current status of this? I am facing the same problems with the latest version of nvhpc:

/bin/sh ../libtool  --tag=CC   --mode=link mpicc  -fPIC -O3 -march=core-avx2 -fno-strict-aliasing -version-info 19:1:0 -L~/libs/lib -o libnetcdf.la -rpath ~/libs/lib libnetcdf_la-nc_initialize.lo ../libdispatch/libnetcdf2.la ../libdispatch/libdispatch.la ../libsrc/libnetcdf3.la ../libsrcp/libnetcdfp.la ../libhdf5/libnchdf5.la    ../libsrc4/libnetcdf4.la ../libnczarr/libnczarr.la  -lpnetcdf -lm -lhdf5_hl -lhdf5 -lz
libtool:   error: cannot find the library '/libmpi_usempif08.la' or unhandled argument '/libmpi_usempif08.la'

lcebaman, Jul 29 '24

@lcebaman I issued #12722 to address the issue that was initially reported.

That being said, since your app is trying to link with /libmpi_usempif08.la, it is not obvious to me whether your issue is related to this one.

ggouaillardet, Jul 30 '24