ROMIO fails to build if level_zero is installed

Open • acgoldma opened this issue 2 years ago • 5 comments

Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

https://github.com/open-mpi/ompi/pull/10224

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

I was attempting to build the OMPI RPM.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

https://github.com/open-mpi/ompi/pull/10224 openpmix @ 6692c28 prrte @ 7ae2c08

Please describe the system on which you are running

  • Operating system/version: RHEL 8.4
  • Computer hardware: Intel E5-2699 v3
  • Network type: N/A

Details of the problem

It looks like romio341 detects Level Zero on my system, and the build then fails:

  CC       src/gavl/mpl_gavl.lo
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c: In function 'MPL_gpu_ipc_handle_create':
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c:123:11: warning: implicit declaration of function 'zeDriverGetMemIpcHandle'; did you mean 'zeMemGetIpcHandle'? [-Wimplicit-function-declaration]
     ret = zeDriverGetMemIpcHandle(global_ze_driver_handle, ptr, ipc_handle);
           ^~~~~~~~~~~~~~~~~~~~~~~
           zeMemGetIpcHandle
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c: In function 'MPL_gpu_ipc_handle_map':
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c:139:9: warning: implicit declaration of function 'zeDriverOpenMemIpcHandle'; did you mean 'zeMemOpenIpcHandle'? [-Wimplicit-function-declaration]
         zeDriverOpenMemIpcHandle(global_ze_driver_handle,
         ^~~~~~~~~~~~~~~~~~~~~~~~
         zeMemOpenIpcHandle
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c:140:69: error: 'MPL_gpu_ipc_mem_handle_t' {aka 'struct _ze_ipc_mem_handle_t'} has no member named 'global_dev_id'
                                  global_ze_devices_handle[ipc_handle.global_dev_id],
                                                                     ^
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c:141:44: error: 'MPL_gpu_ipc_mem_handle_t' {aka 'struct _ze_ipc_mem_handle_t'} has no member named 'handle'
                                  ipc_handle.handle, ZE_IPC_MEMORY_FLAG_NONE, ptr);
                                            ^
../../../../../3rd-party/romio341/mpl/src/gpu/mpl_gpu_ze.c:141:53: error: 'ZE_IPC_MEMORY_FLAG_NONE' undeclared (first use in this function); did you mean 'ZE_IPC_MEMORY_FLAG_TBD'?
                                  ipc_handle.handle, ZE_IPC_MEMORY_FLAG_NONE, ptr);
                                                     ^~~~~~~~~~~~~~~~~~~~~~~
                                                     ZE_IPC_MEMORY_FLAG_TBD

Looks like it is not using the correct API?

acgoldma, Apr 07 '22 16:04

@acgoldma Sorry for the 2-year wait. Does this problem still happen with latest romio and ompi 5.0.3?

wenduwan, Apr 18 '24 16:04

I have not seen this issue recently.

acgoldma, Apr 22 '24 16:04

Hello, I work with Adam. I think we need to reopen this issue.

I see the same build error when trying to build from the v5.0.3 source RPM.

sjb017, May 22 '24 15:05

https://github.com/open-mpi/ompi/blob/42c744e00eba2da1f904d2b94f33d2769e744867/3rd-party/romio341/mpl/configure.ac#L999

There appears to be no easy way to disable this code from configure.

Also, this ROMIO version uses a pre-release spec (v0.95) for Level Zero, so it will need to be updated. Many things changed in the release spec (v1.0+).
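To check which naming an installed Level Zero header uses, something like the following sketch may help. It only greps for the two function names that appear in the build log of the original report (`zeDriverGetMemIpcHandle`, which `mpl_gpu_ze.c` calls, and `zeMemGetIpcHandle`, which the compiler suggests instead). The helper name `ze_ipc_api_style`, the `ZE_HEADER` variable, and the default header path are assumptions for illustration, not part of Open MPI or ROMIO.

```shell
#!/bin/sh
# Hypothetical helper: report which IPC function naming a Level Zero
# header declares. Not part of Open MPI, ROMIO, or Level Zero itself.
ze_ipc_api_style() {
    header="$1"
    if [ ! -r "$header" ]; then
        echo "missing"
    elif grep -q 'zeMemGetIpcHandle' "$header"; then
        # Name the compiler suggests in the build log.
        echo "release"
    elif grep -q 'zeDriverGetMemIpcHandle' "$header"; then
        # Name that romio341's mpl_gpu_ze.c calls.
        echo "pre-release"
    else
        echo "unknown"
    fi
}

# Assumed header location; set ZE_HEADER if yours differs.
ze_ipc_api_style "${ZE_HEADER:-/usr/include/level_zero/ze_api.h}"
```

If this prints `release`, the installed header declares the newer name, which would be consistent with the errors shown in the log.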

acgoldma, May 22 '24 15:05

A quick fix is to run sed -i 's/have_ze=yes/have_ze=no/' /build/ompi/3rd-party/romio341/mpl/configure.ac to disable level-zero in romio completely.
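For anyone who wants to see what that substitution does before touching the real file, here is a small sketch that applies the same sed expression to a scratch file. The scratch file contents are a stand-in, not the real `configure.ac`, and `-i.bak` is used instead of plain `-i` so that a backup copy is kept.

```shell
#!/bin/sh
# Try the substitution on a scratch file, not the real configure.ac.
scratch=$(mktemp)

# Stand-in content for illustration only; the real file is much larger.
printf 'have_ze=yes\nunrelated_option=yes\n' > "$scratch"

# Same expression as the quick fix; -i.bak keeps the original as "$scratch.bak".
sed -i.bak 's/have_ze=yes/have_ze=no/' "$scratch"

# Show the edited file and the difference from the backup.
cat "$scratch"
diff "$scratch.bak" "$scratch" || true   # diff exits 1 when files differ
```

Keeping the `.bak` copy makes it easy to revert the change once ROMIO is updated.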

juselius, Jun 25 '24 10:06