
Add Intel ifx and ifort build to CI

TomMelt opened this issue 1 year ago

Probably make a Docker image with the Intel compilers already installed.

TomMelt avatar Jun 24 '24 10:06 TomMelt

You may be able to use an existing action, e.g. https://github.com/fortran-lang/setup-fortran, which seems to support ifx and ifort (see also: discussion).

ElliottKasoar avatar Aug 03 '24 22:08 ElliottKasoar

@jatkinson1000 This issue should not be closed yet, despite the CI now having a working Intel build as of PR https://github.com/Cambridge-ICCS/FTorch/pull/438. It turns out we could avoid building OpenMPI from source with the Intel compilers in the CI

https://github.com/Cambridge-ICCS/FTorch/blob/7a277f9237cc79c5bdbeb1376771503b3baf8264/.github/workflows/test_suite_ubuntu.yml#L234

and instead just use the Intel MPI library (which is installed as part of setup-fortran with Intel Classic). The MPI integration tests should run without a problem if the mpi_gather routine is removed from the only list at

https://github.com/Cambridge-ICCS/FTorch/blob/7a277f9237cc79c5bdbeb1376771503b3baf8264/examples/7_MPI/mpi_infer_fortran.f90#L15

I'm really not sure why the use mpi, only: ... does not bring the expected routines into the program scope, but I have asked a question on the Intel community forum in case you're interested (or perhaps you are already familiar with why this is):

See Intel Community Question

I'm obviously biased here, but I don't think the effort spent on building OpenMPI with the Intel compilers is wasted: it may still be prudent to keep it, or to move it into a Docker dev container used by the CI at a later point (see also PR https://github.com/Cambridge-ICCS/FTorch/pull/271), since at least on DKRZ's Levante we use OpenMPI built with Intel compilers, and I would think other groups have done something similar. Of course, I do understand if the work I did gets scrapped in favor of just running the CI with Intel MPI :))

jfdev001 avatar Oct 09 '25 09:10 jfdev001

👀

Interesting. In terms of code I'd prefer not to remove the declaration from the only list if possible. Do keep us updated on what comes of the Intel report.

As you see, the current CI is not 'broken', so we can wait to see what Intel says.

jatkinson1000 avatar Oct 09 '25 09:10 jatkinson1000

Findings

What follows are comments that apply only to the Intel MPI implementation, which implements the MPI-3.1 specification (see the Intel MPI Library (version 2021.14) Get Started Guide for Linux OS).

The module file mpi.mod (note: the default install location with root privileges is /opt/intel/oneapi/mpi/latest/include/mpi) defines explicit interfaces for only a subset of the MPI routines. MPI routines with a choice buffer, e.g., MPI_Gather and MPI_Send (see the MPI-3.1 standard: Fortran Support Through the mpi Module), have no explicit interfaces because the earliest MPI versions relied on implicit interfaces and backwards compatibility must be maintained. Explicit interfaces for these routines are only available through the mpi_f08 module, which relies on TYPE(*) and DIMENSION(..) (Fortran 2008 plus TS 29113; see the MPI-4.1 standard: MPI for Different Fortran Standard Versions) to describe choice buffers.
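For illustration, here is a minimal, hypothetical sketch (not taken from FTorch or from any vendor's mpi_f08 source, with the datatype and communicator arguments omitted) of the kind of explicit interface that only becomes expressible once TYPE(*) and DIMENSION(..) are available. The older mpi module has no way to declare an "any type, any rank" dummy, so it leaves such routines without an explicit interface instead:

```fortran
module gather_iface_demo
  implicit none
  ! Simplified, gather-like interface with choice buffers. The assumed-type
  ! (type(*)) and assumed-rank (dimension(..)) dummies accept an actual
  ! argument of any type and rank, which is what an MPI choice buffer needs.
  ! mpi_f08 declares its routines along these lines; the pre-F2008 mpi module
  ! cannot, so MPI_Gather and friends stay implicit there.
  interface
    subroutine gather_like(sendbuf, sendcount, recvbuf, recvcount, root, ierror)
      type(*), dimension(..), intent(in) :: sendbuf   ! choice buffer (send side)
      type(*), dimension(..)             :: recvbuf   ! choice buffer (receive side)
      integer, intent(in)                :: sendcount, recvcount, root
      integer, intent(out)               :: ierror
    end subroutine gather_like
  end interface
end module gather_iface_demo
```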

Practically, this means that while you can still call routines like mpi_gather, you cannot name them in the only: [routines...] list, and so no compile-time argument checking is available for such routines.
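As a concrete illustration of that failure mode, here is a minimal, hypothetical program (not FTorch's examples/7_MPI test). With Intel MPI's mpi.mod, naming MPI_Gather in the only list is reportedly rejected at compile time with a "not found in module mpi" style error, while dropping it from the list and calling the routine through the old implicit-interface path still compiles and runs:

```fortran
program gather_only_demo
  ! With Intel MPI's mpi.mod, the MPI_Gather entry below is reported as not
  ! found in module mpi, because no explicit interface exists for it there
  ! (OpenMPI's mpi module reportedly accepts it). Removing MPI_Gather from the
  ! only list lets the program compile; the call further down then resolves as
  ! an external routine with an implicit interface, so it still links and
  ! runs, just without compile-time argument checking.
  use mpi, only : MPI_Init, MPI_Finalize, MPI_Comm_rank, MPI_Comm_size, &
                  MPI_COMM_WORLD, MPI_INTEGER, MPI_Gather
  implicit none
  integer :: rank, nranks, ierr, sendval
  integer, allocatable :: recvvals(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)
  allocate(recvvals(nranks))
  sendval = rank

  call MPI_Gather(sendval, 1, MPI_INTEGER, recvvals, 1, MPI_INTEGER, 0, &
                  MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, 'gathered ranks:', recvvals
  call MPI_Finalize(ierr)
end program gather_only_demo
```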

Conclusions and Proposed Action

Including a CI job for Intel MPI is trivial but requires one of the following mutually exclusive changes:

  1. Any MPI integration tests should use the module mpi_f08, not mpi. The use of mpi_f08 is strongly recommended by the MPI-3.1 specification anyway (see the sketch after the attached routine list below).
  2. Any MPI integration tests that keep the module mpi must remove routines with a choice buffer (e.g., MPI_Send, MPI_Gather) from the only: [routines...] list. See the attached file (generated by a script I describe here) for a full list of routines that cannot appear in the only list with Intel MPI.

mpi_routines_with_choice_buffer.txt
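For reference, here is a minimal, hypothetical sketch of option 1 (again not taken from FTorch). Under mpi_f08 the choice-buffer routines do have explicit interfaces, so MPI_Gather can appear in the only list and gains compile-time argument checking; the ierror argument also becomes optional:

```fortran
program gather_f08_demo
  ! mpi_f08 provides explicit (generic) interfaces for all routines, including
  ! those with choice buffers, so this only list should be accepted by any MPI
  ! implementation that ships the mpi_f08 module.
  use mpi_f08, only : MPI_Init, MPI_Finalize, MPI_Comm_rank, MPI_Comm_size, &
                      MPI_Gather, MPI_COMM_WORLD, MPI_INTEGER
  implicit none
  integer :: rank, nranks, sendval
  integer, allocatable :: recvvals(:)

  call MPI_Init()                         ! ierror is optional in mpi_f08
  call MPI_Comm_rank(MPI_COMM_WORLD, rank)
  call MPI_Comm_size(MPI_COMM_WORLD, nranks)
  allocate(recvvals(nranks))
  sendval = rank

  ! Note that MPI_COMM_WORLD is type(MPI_Comm) and MPI_INTEGER is
  ! type(MPI_Datatype) here, rather than plain integers as in the mpi module.
  call MPI_Gather(sendval, 1, MPI_INTEGER, recvvals, 1, MPI_INTEGER, 0, &
                  MPI_COMM_WORLD)

  if (rank == 0) print *, 'gathered ranks:', recvvals
  call MPI_Finalize()
end program gather_f08_demo
```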

~Removing the OpenMPI-built-with-Intel-compilers job may not be the wisest choice, since that setup seems to be common on HPC systems. If we keep that CI job, end users can be confident about the stability of integrating FTorch into their MPI programs.~

~If you would like, I can open a new issue for the Intel MPI CI, since I believe the current issue is resolved (i.e., we have a working Intel CI job):~ ~- [ ] Open new issue for Intel MPI CI?~

I cannot work on that issue at the moment since I'm more focused on Spack.

Reference Questions

  • https://fortran-lang.discourse.group/t/all-mpi-routines-available-even-when-not-explicitly-included-in-the-only-args-list/10443
  • https://community.intel.com/t5/Intel-MPI-Library/Error-Symbol-mpi-gather-referenced-at-1-not-found-in-module-mpi/m-p/1721240/highlight/true#M12229

jfdev001 avatar Oct 16 '25 07:10 jfdev001

I have not had time to do as deep a dive as you on all of these references (thank you!), but my initial feeling is that we should move to use mpi_f08.

FTorch requires a Fortran 2008 standard compiler, so I don't feel it would be unreasonable to assume mpi_f08 as well.

The only instance I see this breaking would be if a user is trying to build and link against an old MPI distribution that does not provide the mpi_f08 module, or a niche one that chooses not to. Do you have a feeling for how likely this is? I presume OpenMPI, Intel MPI, MPICH etc. all support this and have for a while?

I'm leaning towards a solution where we update to use mpi_f08 and add a note to the docs saying "if you encounter error message 'X', switch to use mpi instead". In reality our users will be adding the use mpi... statements to their code themselves (or using what is already in place in a larger codebase), so I don't think this change is likely to cause too many issues.

jatkinson1000 avatar Oct 16 '25 07:10 jatkinson1000

Responses

> I'm leaning towards a solution where we update to use mpi_f08 and add a note to the docs saying "if you encounter error message 'X', switch to use mpi instead". In reality our users will be adding the use mpi... statements to their code themselves (or using what is already in place in a larger codebase), so I don't think this change is likely to cause too many issues.

I think this is a really important point. FTorch doesn't do anything special with MPI as far as I can tell; it just uses MPI in its integration tests.

> The only instance I see this breaking would be if a user is trying to build and link against an old MPI distribution that does not provide the mpi_f08 module, or a niche one that chooses not to. Do you have a feeling for how likely this is? I presume OpenMPI, Intel MPI, MPICH etc. all support this and have for a while?

This would only break something for the end user if they attempted to build and run FTorch's test suite. If they don't care about running the tests, they would not encounter any issues.

Note that mpi_f08 is supported by OpenMPI v2 and MPICH v3.1, though both of those versions are so old they should not really be used anyway. Intel doesn't let you download very old oneAPI releases, so I can't verify at what point Intel MPI began supporting mpi_f08; at the very least, the Intel 2021.x MPI libraries definitely support it. I therefore think it is very unlikely anyone would run into an issue with use mpi_f08 should they wish to build the FTorch test suite.

Proposed Action

Given this discussion, I think it is actually reasonable to remove the OpenMPI build with Intel compilers and replace it with the Intel MPI that ships with oneAPI. This would definitely save on CI runtime. The structure of the CI file will remain the same; that is, the GNU and Intel jobs will still have to be separate. Here is a list of items that should be done to reflect such a change:

  • [x] Update .github/workflows/test_suite_ubuntu.yml to replace the OpenMPI+Intel build with the Intel MPI that ships with oneAPI.
  • [x] Update the docs to mention errors that might be encountered when attempting to build the tests against old OpenMPI, MPICH, and/or Intel MPI versions.
  • [x] Update the changelog to reflect that the Intel CI job now uses Intel MPI rather than OpenMPI built with Intel compilers.

I could do this next week (2025-10-20) unless someone wants to do it earlier than that, just lmk :))

jfdev001 avatar Oct 16 '25 09:10 jfdev001