
[PGI 18.10/19.4/19.10] Fortran Compiler Compatibility?

Status: Open · cenit opened this issue 5 years ago · 11 comments

First of all, thanks a lot for making MS-MPI open source. I think this will drive adoption much further and faster than ever, together with the renewed developer friendliness Microsoft is embracing in so many projects: from the huge work on MSVC to the vcpkg project, using the Microsoft toolchain is finally possible for scientific HPC as well!

I'd really like to use the new 10.0 version in a project I am supervising, ALaDyn, a particle-in-cell code written mainly in Fortran that uses MPI to scale to thousands of cores. At the moment the Windows build is used only for debugging, never in production, but things may change very soon and we would like to be ready for that moment.

Setting aside Cygwin, which I don't consider a native Windows build, we support building the code on Windows with MS-MPI and the PGI 18.10 Fortran compiler (the Community Edition is enough), together with Visual Studio 2017 and vcpkg. We have not yet tested the Intel compiler or Intel MPI on Windows (we may have to do so sooner than planned if we stay stuck on a problem with the current toolchain).

The problem I am facing is that the MPI part cannot be built against this "new" release. The PGI compiler ships with an older version of MS-MPI (HPC Pack 2012, v4.1), which works perfectly and is the only MPI distribution we support on Windows. If I use the freshly released 10.0 version (or even the slightly older 9.1), the compiler throws internal errors and refuses to continue.

Do you know why this happens? So far we haven't found any workaround. Should I ask PGI? Since the compiler (18.10 is their latest version, fully compatible with VS2017) works with an older MS-MPI, do you consider this a regression on your side? Which Fortran compilers does MS-MPI actually support?

Thanks a lot! I am also eager to help this project in any way that would be possible for me!

cenit · Jan 09 '19 10:01

@cenit, glad to hear that you like the open-source release of MS-MPI!

Thanks for reporting this issue. Do you mind posting the errors that you see with the PGI compiler?

jithinjosepkl · Jan 09 '19 17:01

If the PGI compiler doesn't work out, building with Flang might avoid these compatibility issues.

MathiasMagnus · Jan 14 '19 16:01

@jithinjosepkl sorry for the delay. The problem is easiest to see in ALaDyn's AppVeyor CI: https://ci.appveyor.com/project/cenit/aladyn-kul79. We test the native Windows build in three configurations:

  • VS2017 + PGI 18.10 + MS-MPI 2012 (officially supported by PGI)
  • VS2017 + PGI 18.10 + MS-MPI 2012 R2 (officially supported by PGI)
  • VS2017 + PGI 18.10 + latest MS-MPI (from vcpkg)

As you can see, with MSMPI_PGI (the MS-MPI from HPC Pack 2012, installed by the PGI compiler) everything is fine, and with MSMPI_HPC2012 (the MS-MPI from HPC Pack 2012 R2, installed manually from the Microsoft website) it works as well. But with MSMPI_VCPKG (version 9 or the latest 10, no difference at all) the compiler crashes. Inspecting the NMake output more closely gives no real hints, just an "exploded" return code as the error message. "parallel.F90" is the first file that uses the mpi module.

Please let me know how I can investigate further. In any case, feel free to use anything from that repo to analyse the problem, if you find it interesting (the CI recipe should give you all the details about the different build configurations).

cenit · Jan 16 '19 07:01

In the latest commit I also added some debug output to help you, but unfortunately it is not very helpful:

[00:10:02] [ 55%] Building Fortran object CMakeFiles/ALaDyn.dir/src/parallel.F90.obj
[00:10:04] cmake : NMAKE : fatal error U1077: 'C:\PROGRA~1\PGI\win64\18.10\bin\pgf95.exe' : return code '0x2'
[00:10:04] At line:1 char:84
[00:10:04] + ... RSION -eq "MSMPI_VCPKG")   { cmake --build . --target install ; if ($ ...
[00:10:04] +                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[00:10:04]     + CategoryInfo          : NotSpecified: (NMAKE : fatal e...turn code '0x2':String) [], RemoteException
[00:10:04]     + FullyQualifiedErrorId : NativeCommandError
[00:10:04]  
[00:10:04] Stop.NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\nmake.exe"' : return code '0x2'
[00:10:04] Stop.
[00:10:04] NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\nmake.exe"' : return code '0x2'
[00:10:04] Stop.
[00:10:04] 

Re-running the last command in verbose mode:

[00:10:04] if [%COMPILER%]==[pgi]    if [%INSTALLED_MPI_VERSION%]==[MSMPI_VCPKG]     pgf95.exe -DUSE_MPI_MODULE -IC:\Tools\vcpkg\installed\%VCPKG_DEFAULT_TRIPLET%\include -# -Bdynamic -fast -O3 -r8 -c %WORKSPACE%\aladyn\src\parallel.F90 -o CMakeFiles\ALaDyn.dir\src\parallel.F90.obj
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for parallel
[00:10:05] PGF90-I-0035-Predefined intrinsic nchar loses intrinsic property (C:\projects\aladyn\src\parallel.F90: 245)
[00:10:05] PGF90-I-0035-Predefined intrinsic nchar loses intrinsic property (C:\projects\aladyn\src\parallel.F90: 265)
[00:10:05] PGF90-I-0035-Predefined intrinsic nchar loses intrinsic property (C:\projects\aladyn\src\parallel.F90: 287)
[00:10:05] PGF90-I-0035-Predefined intrinsic nchar loses intrinsic property (C:\projects\aladyn\src\parallel.F90: 311)
[00:10:05] PGF90-I-0035-Predefined intrinsic nchar loses intrinsic property (C:\projects\aladyn\src\parallel.F90: 335)
[00:10:05] PGF90-I-0035-Predefined intrinsic nchar loses intrinsic property (C:\projects\aladyn\src\parallel.F90: 357)
[00:10:05] PGF90-I-0035-Predefined intrinsic nchar loses intrinsic property (C:\projects\aladyn\src\parallel.F90: 383)
[00:10:05] PGF90-I-0035-Predefined intrinsic nchar loses intrinsic property (C:\projects\aladyn\src\parallel.F90: 409)
[00:10:05] PGF90-I-0035-Predefined intrinsic nchar loses intrinsic property (C:\projects\aladyn\src\parallel.F90: 437)
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for check_decomposition
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for start_parallel
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for mpi_valloc
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for mpi_write_dp
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for mpi_write_row_dp
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for mpi_write_col_dp
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for mpi_read_col_dp
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for mpi_read_dp
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for mpi_write_part
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for mpi_write_part_col
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for mpi_write_field
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for mpi_write_field_col
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for mpi_xinv_data_alloc
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for mpi_ftw_alloc
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for mpi_ftw_dalloc
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for end_parallel
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for exchange_idata
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for exchange_3d_sp_data
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for exchange_1d_grdata
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for exchange_2d_grdata
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for exchange_grdata
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for realvec_distribute
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for intvec_distribute
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for intvec_row_distribute
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for sr_idata
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for sr_pdata
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for sr_vidata
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for exchange_pdata
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for exchange_rdata
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for vint_bcast
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for int_bcast
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for all_gather_dpreal
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for allreduce_dpreal
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for allreduce_big_int
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for allreduce_sint
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for allreduce_vint
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for bcast_grdata
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for bcast_realv_sum
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for bcast_int_sum
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for real_bcast
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for local_to_global_grdata
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for exchange_bdx_data
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for swap_yx_3data
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for swap_xy_3data
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for swap_xz_3data
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for swap_yx_3data_inv
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for swap_xy_3data_inv
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for swap_xz_3data_inv
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for processor_grid_diag
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for pftw2d_sc
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for pftw3d_sc
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for pftw2d
[00:10:05]   0 inform,   0 warnings,   0 severes, 0 fatal for pftw3d
[00:10:05] PGF90/x86-64 Windows 18.10-1: compilation completed with informational messages
[00:10:05] Export PGI_CURR_CUDA_HOME=C:\Program Files\PGI/win64/2018/cuda/9.1
[00:10:05] 
[00:10:05] "C:\Program Files\PGI/win64/18.10/bin\pgf901.exe" C:\projects\aladyn\src\parallel.F90 -opt 3 -nohpf -nostatic -quad -x 15 2 -x 49 0x400004 -x 51 0x20 -x 57 0x4c -x 58 0x10000 -x 124 0x1000 -x 120 0x80000000 -x 59 4 -x 124 0x400 -x 19 0x400000 -x 119 0x8800000 -tp haswell -x 57 0x7b0000 -x 58 0x78031040 -x 119 0x610400 -x 70 0x6c00 -x 47 0x400000 -x 47 0x08 -x 120 0x80000000 -x 59 4 -x 15 2 -x 49 0x100 -x 48 5376 -stdinc "C:\Program Files\PGI/win64/18.10/include/wrap;C:\Program Files\PGI/win64/18.10/include/;C:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/include/sys;C:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/include;C:/Program Files (x86)/Windows Kits/10/Include/10.0.17763.0/shared;C:/Program Files (x86)/Windows Kits/10/Include/10.0.17763.0/ucrt;C:/Program Files (x86)/Windows Kits/10/Include/10.0.17763.0/um" -cmdline "+pgf95 C:\projects\aladyn\src\parallel.F90 -DUSE_MPI_MODULE -IC:\Tools\vcpkg\installed\x64-windows\include -# -Bdynamic -fast -Mvect=sse -Mcache_align -Mflushz -Mpre -O3 -Mvect=sse -Mcache_align -Mpre -r8 -c -o CMakeFiles\ALaDyn.dir\src\parallel.F90.obj" -def _M_AMD64 -def _MT -def _WIN32 -def __WIN32 -def __WIN32__ -def _WIN64 -def __WIN64 -def __WIN64__ -def __x86_64__ -def __X86_64__ -def __unaligned= -def _INTEGRAL_MAX_BITS=64 -def __extension__= -def __amd64__ -def _AMD64_ -def _M_X64 -def __SSE__ -def __MMX__ -def __SSE2__ -def __SSE3__ -def __SSSE3__ -def __PGI_TOOLS17 -idir C:\Tools\vcpkg\installed\x64-windows\include -def USE_MPI_MODULE -def _DLL -preprocess -freeform -vect 48 -x 54 1 -x 70 0x40000000 -y 163 0xc0000000 -x 189 0x10 -x 53 2 -quad -x 119 0x10000000 -x 53 2 -quad -x 119 0x10000000 -x 124 0x8 -x 124 0x80000 -modexport C:\Users\appveyor\AppData\Local\Temp\1\pgf954c9mpb3PC3OQxX.cmod -modindex C:\Users\appveyor\AppData\Local\Temp\1\pgf955dLmpbV4HL28hj.cmdx -output C:\Users\appveyor\AppData\Local\Temp\1\pgf952aTmpbhrxnwKNm.ilm
[00:10:05] 
[00:10:05] "C:\Program Files\PGI/win64/18.10/bin\pgf902.exe" C:\Users\appveyor\AppData\Local\Temp\1\pgf952aTmpbhrxnwKNm.ilm -fn C:\projects\aladyn\src\parallel.F90 -opt 3 -x 51 0x20 -x 120 0x80000000 -x 59 4 -x 19 0x400000 -x 28 0x40000 -x 119 0x4a10400 -x 122 0x40 -x 123 0x1000 -x 127 0x15 -x 129 0x10 -quad -y 80 0x1000 -x 80 0x10800000 -tp haswell -vect 56 -y 34 16 -x 34 0x8 -x 32 36700160 -y 19 8 -y 35 0 -x 42 0x30 -x 39 0x40 -x 199 10 -x 39 0x80 -x 70 0x8000 -x 122 1 -x 125 0x20000 -x 120 0x1000 -x 124 0x400 -x 119 0x400000 -x 120 0x80 -y 15 2 -x 57 0x3b0000 -x 58 0x48000000 -x 15 2 -x 49 0x100 -astype 0 -x 121 1 -x 46 4 -x 68 0x20 -x 70 0x40000000 -x 124 1 -y 163 0xc0000000 -x 189 0x10 -y 189 0x4000000 -x 9 1 -x 42 0x14200000 -x 72 0x1 -x 136 0x11 -quad -x 119 0x10000000 -x 129 0x40000000 -x 129 2 -x 164 0x1000 -x 9 1 -x 72 0x1 -x 136 0x11 -quad -x 119 0x10000000 -x 129 0x40000000 -x 164 0x1000 -cmdline "+pgf95 C:\projects\aladyn\src\parallel.F90 -DUSE_MPI_MODULE -IC:\Tools\vcpkg\installed\x64-windows\include -# -Bdynamic -fast -Mvect=sse -Mcache_align -Mflushz -Mpre -O3 -Mvect=sse -Mcache_align -Mpre -r8 -c -o CMakeFiles\ALaDyn.dir\src\parallel.F90.obj" -asm C:\Users\appveyor\AppData\Local\Temp\1\pgf956enmpbN_qy26Zh.s
[00:10:05] pgf95-Fatal-f902 completed with exit code -1073741819
[00:10:05] 
[00:10:05] Unlinking C:\Users\appveyor\AppData\Local\Temp\1\pgf952aTmpbhrxnwKNm.ilm
[00:10:05] Unlinking C:\Users\appveyor\AppData\Local\Temp\1\pgf953bvmpb-zXIcBw9.stb
[00:10:05] Unlinking C:\Users\appveyor\AppData\Local\Temp\1\pgf954c9mpb3PC3OQxX.cmod
[00:10:05] Unlinking C:\Users\appveyor\AppData\Local\Temp\1\pgf955dLmpbV4HL28hj.cmdx
[00:10:05] Unlinking C:\Users\appveyor\AppData\Local\Temp\1\pgf956enmpbN_qy26Zh.s
[00:10:05] Unlinking C:\Users\appveyor\AppData\Local\Temp\1\pgf957f1mpbFZ4umBld.ll
[00:10:05] Command exited with code 2

pgf95-Fatal-f902 completed with exit code -1073741819
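The exit code reported for pgf902 is a signed 32-bit value. Reinterpreted as unsigned it is 0xC0000005, the Windows NTSTATUS code STATUS_ACCESS_VIOLATION, which suggests the compiler backend itself crashed rather than rejecting the source. A minimal Python sketch for decoding such codes (the helper name and the small lookup table are illustrative, not part of any PGI or MS-MPI tool):

```python
# Illustrative subset of Windows NTSTATUS values; not an exhaustive table.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def decode_exit_code(code):
    """Reinterpret a signed 32-bit process exit code as an unsigned NTSTATUS.

    Returns the hex form and a symbolic name ("unknown" if not in the table).
    """
    unsigned = code & 0xFFFFFFFF  # two's-complement wrap to 32 bits
    return f"0x{unsigned:08X}", NTSTATUS_NAMES.get(unsigned, "unknown")
```

For example, decode_exit_code(-1073741819) gives ("0xC0000005", "STATUS_ACCESS_VIOLATION"), while the driver's plain exit code 2 (the '0x2' NMake reports) is not an NTSTATUS and decodes as "unknown".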

I can't work out anything more about what is going on. I doubt the problem is on our side, since the code works fine with Open MPI, Intel MPI, MPICH and older MS-MPI releases. Only the PGI + recent MS-MPI combination fails and, as you can see, it fails during compilation rather than at link time. Maybe the mpi.mod module, generated from mpi.f90, contains some directives that changed recently and crash the compiler?

cenit · Jan 17 '19 14:01

Thanks @cenit and @MathiasMagnus for the additional details. We will look into this.

jithinjosepkl · Jan 18 '19 17:01

Thank you so much. Please let me know if I can help in any way.

cenit · Jan 20 '19 19:01

Could it be related to this issue?

cenit · Jan 30 '19 12:01

Nope, that definition has been there since the beginning of the project.

I suspect it is a SAL directive, but I haven't had a chance to verify it with PGI yet. If possible, can you please try this fix? Also, do you still see the crash with MSMPI_NO_SAL defined?
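One way to narrow down the SAL hypothesis is to list the annotation tokens that appear in the newer mpi.h but not in the older HPC Pack 2012 header. The Python sketch below does that; the regex only covers a few common SAL prefixes (_In_, _Out_, _Inout_, _Outptr_, _Ret_) and is an assumption, not a complete SAL grammar, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Assumed pattern for common SAL parameter annotations, e.g. _In_, _In_opt_,
# _Out_, _Inout_, _Out_writes_(n), _Outptr_. Not a full SAL grammar.
SAL_TOKEN = re.compile(r"\b_(?:In|Out|Inout|Outptr|Ret)_\w*")


def sal_tokens(header_text):
    """Count SAL-style annotation tokens found in a C header's text."""
    return Counter(SAL_TOKEN.findall(header_text))


def new_sal_tokens(old_text, new_text):
    """Return, sorted, the annotation tokens used only in the newer header."""
    return sorted(set(sal_tokens(new_text)) - set(sal_tokens(old_text)))
```

To use it, read both headers (e.g. the vcpkg copy under C:\Tools\vcpkg\installed\x64-windows\include and the one bundled with PGI; the latter path is a placeholder) and pass their contents to new_sal_tokens.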

jithinjosepkl · Jan 30 '19 17:01

I forked the project and applied your patch: https://github.com/cenit/Microsoft-MPI. I have some problems building Microsoft-MPI: https://ci.appveyor.com/project/cenit/microsoft-mpi. As soon as I am able to build it, I will add some steps to the CI recipe to try the resulting MPI library with the PGI compiler.

cenit · Jan 30 '19 21:01

Can you please confirm that VCToolsVersion/WindowsTargetPlatformVersion are correct in Directory.Build.props? Also, can you try an x64 build?

By the way, an easier way might be to just patch mpi.h in the installation and try building ALaDyn with PGI.
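After patching the installed header, the failing step can be retried without CMake and NMake by rerunning only the compile of parallel.F90 with the flags from the verbose CI log. A minimal Python sketch, assuming pgf95.exe is on PATH; the helper names and the extra_defines hook (e.g. for trying a build with MSMPI_NO_SAL) are hypothetical, and whether that define needs to reach this particular compile is not established in the thread.

```python
import subprocess


def build_pgf95_command(vcpkg_include, source, obj, extra_defines=()):
    """Rebuild the pgf95 command line from the verbose CI run as an argv list.

    extra_defines is a hypothetical hook for experiments such as MSMPI_NO_SAL.
    """
    cmd = ["pgf95.exe", "-DUSE_MPI_MODULE"]
    cmd += [f"-D{name}" for name in extra_defines]
    cmd += [
        f"-I{vcpkg_include}",
        "-#",  # print each driver step (pgf901, pgf902, ...)
        "-Bdynamic", "-fast", "-O3", "-r8",
        "-c", str(source),
        "-o", str(obj),
    ]
    return cmd


def run_and_report(cmd):
    """Run the compiler, echo its output, and return its exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
    print(result.stderr)
    return result.returncode


if __name__ == "__main__":
    # Only prints the command; call run_and_report(cmd) on a machine with PGI.
    cmd = build_pgf95_command(
        r"C:\Tools\vcpkg\installed\x64-windows\include",
        r"C:\projects\aladyn\src\parallel.F90",
        r"CMakeFiles\ALaDyn.dir\src\parallel.F90.obj",
        extra_defines=["MSMPI_NO_SAL"],
    )
    print(subprocess.list2cmdline(cmd))
```

Passing the arguments as a list avoids shell quoting issues with the Windows paths; a nonzero code from run_and_report can then be checked the same way the CI log reports it.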

jithinjosepkl · Jan 30 '19 22:01

Sorry for reviving the thread only now. In the meantime I tried building ALaDyn with just the mpi.h patch applied to the installed SDK: https://github.com/cenit/Microsoft-MPI/commit/0358fb00d603ae59dc61a623bf2bae59b54c2a23.patch

It didn't work.

I can also confirm that the identical issue persists with PGI 19.10 (the latest version).

cenit · Jan 15 '20 23:01