[urgent] HipBuildImpl() seems to fail, either when creating the temp dir or because no GPU is present in the build env
Actual Result:
[2024-05-10T20:01:46.194Z] /src/hip/hip_build_utils.cpp:186:42: error: expected ')'
[2024-05-11T09:45:27.064Z] 186 | MIOPEN_THROW("Failed cmd: '" MIOPEN_HIP_COMPILER "', args: '" + args + '\'');
[2024-05-11T09:45:27.064Z] | ^
[2024-05-11T09:45:27.064Z] /build_hip/include/miopen/config.h:98:29: note: expanded from macro 'MIOPEN_HIP_COMPILER'
[2024-05-11T09:45:27.064Z] 98 | #define MIOPEN_HIP_COMPILER getHIPCompilerPath()
[2024-05-11T09:45:27.064Z] | ^
[2024-05-11T09:45:27.064Z] /src/hip/hip_build_utils.cpp:186:13: note: to match this '('
[2024-05-11T09:45:27.064Z] 186 | MIOPEN_THROW("Failed cmd: '" MIOPEN_HIP_COMPILER "', args: '" + args + '\'');
[2024-05-11T09:45:27.064Z] | ^
[2024-05-11T09:45:27.064Z] /src/include/miopen/errors.hpp:69:28: note: expanded from macro 'MIOPEN_THROW'
[2024-05-11T09:45:27.064Z] 69 | miopen::MIOpenThrow(__FILE__, __LINE__, __VA_ARGS__); \
It looks like the check
if(!fs::exists(bin_file))
MIOPEN_THROW("Failed cmd: '" MIOPEN_HIP_COMPILER "', args: '" + args + '\'');
has failed.
@atamazov @apwojcik could you help take a look? I know it is not easily reproducible, and I am requesting an exact repro environment now. Could you check statically what the potential issue might be? Is it a permission problem when creating the temp dir?
@apwojcik @JehandadKhan @atamazov another theory: the line https://github.com/ROCm/MIOpen/blob/b99493b9d8517312c84434961e6bb4621907ec59/src/hip/hip_build_utils.cpp#L186 has mismatched quote marks. It is not triggered in normal situations, but on a node where no GPU is present it triggers the MIOPEN_THROW and thus produces this error. In other words, this throw is unfortunately not properly tested.
How about making it:
MIOPEN_THROW("Failed cmd: '" + MIOPEN_HIP_COMPILER + "', args: '" + args + '\'');
@junliume
MIOPEN_THROW("Failed cmd: '" + MIOPEN_HIP_COMPILER + "', args: '" + args + '\'');
Almost like that, pls see https://github.com/ROCm/MIOpen/pull/2959#pullrequestreview-2051445141
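For readers following the quoting problem, here is a minimal sketch of why the original line stops compiling once the macro is no longer a string literal, and why a plain `+` is only almost right. Everything named `DEMO_*` or `demo*` is hypothetical. The return type of `getHIPCompilerPath()` is not shown in this thread, so `std::string` is an assumption, and wrapping the macro in `std::string(...)` is just one way to make the expression compile for both shapes, not necessarily what #2959 does.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for getHIPCompilerPath(); returning std::string is an assumption.
inline std::string demoCompilerPath() { return "/opt/rocm/llvm/bin/clang++"; }

// Hypothetical stand-ins for the two shapes a macro like MIOPEN_HIP_COMPILER can take.
#define DEMO_COMPILER_LITERAL "/opt/rocm/llvm/bin/clang++" // expands to a string literal
#define DEMO_COMPILER_CALL demoCompilerPath()              // expands to a function call

inline std::string failedCmdLiteral(const std::string& args)
{
    // With a literal, "a" DEMO_COMPILER_LITERAL "b" compiles (adjacent literals merge),
    // but "a" + DEMO_COMPILER_LITERAL does not (it would add two pointers).
    return "Failed cmd: '" + std::string(DEMO_COMPILER_LITERAL) + "', args: '" + args + '\'';
}

inline std::string failedCmdCall(const std::string& args)
{
    // With a call, "a" DEMO_COMPILER_CALL "b" does not compile: a function call
    // cannot be merged with adjacent literals, hence "expected ')'".
    return "Failed cmd: '" + std::string(DEMO_COMPILER_CALL) + "', args: '" + args + '\'';
}

// The same expression text works for both macro shapes and yields the same message.
inline void checkBothShapesAgree()
{
    assert(failedCmdLiteral("-O3") == failedCmdCall("-O3"));
}
```

The point of the explicit `std::string(...)` is that the left operand of the first `+` becomes a `std::string` regardless of what the macro expands to, so the rest of the chain resolves to `std::string` concatenation.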
Thanks @atamazov, especially since it's after hours :)
I am still puzzled why this is happening only now. Staging has likely been building MIOpen on nodes without a GPU for a while, but we only started to hit such issues recently. So maybe nogpu backend should be fixed somehow? :)
@junliume In the attached logfile I see:
[2024-05-10T20:00:55.207Z] + cmake '-DCMAKE_PREFIX_PATH=/opt/rocm-6.2.0-488/llvm;/opt/rocm-6.2.0-488' '-DCMAKE_SHARED_LINKER_FLAGS_INIT=-Wl,--enable-new-dtags,--build-id=sha1,--rpath,$ORIGIN' '-DCMAKE_EXE_LINKER_FLAGS_INIT=-Wl,--enable-new-dtags,--build-id=sha1,--rpath,$ORIGIN/../lib' -DCMAKE_VERBOSE_MAKEFILE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DCMAKE_INSTALL_PREFIX=/opt/rocm-6.2.0-488 -DCMAKE_PACKAGING_INSTALL_PREFIX=/opt/rocm-6.2.0-488 -DBUILD_FILE_REORG_BACKWARD_COMPATIBILITY=OFF -DROCM_SYMLINK_LIBS=OFF -DCPACK_PACKAGING_INSTALL_PREFIX=/opt/rocm-6.2.0-488 -DROCM_DISABLE_LDCONFIG=ON -DROCM_PATH=/opt/rocm-6.2.0-488 -DCPACK_DEBIAN_DEBUGINFO_PACKAGE=FALSE -DCPACK_RPM_DEBUGINFO_PACKAGE=FALSE -DCPACK_RPM_INSTALL_WITH_EXEC=FALSE -DCMAKE_BUILD_TYPE=Release -DMIOPEN_BACKEND=HIP -DMIOPEN_OFFLINE_COMPILER_PATHS_V2=1 -DCMAKE_CXX_COMPILER=/opt/rocm-6.2.0-488/llvm/bin/clang++ -DCMAKE_C_COMPILER=/opt/rocm-6.2.0-488/llvm/bin/clang '-DCMAKE_PREFIX_PATH=/opt/rocm-6.2.0-488;/opt/rocm-6.2.0-488/hip;/long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen/deps' -DHIP_OC_COMPILER=/opt/rocm-6.2.0-488/bin/clang-ocl /long_pathname_so_that_rpms_can_package_the_debug_info/data/driver/MLOpen.
I have no idea where all these options come from and why. Among them is -DMIOPEN_OFFLINE_COMPILER_PATHS_V2=1, which is an indirect source of the error.
So maybe nogpu backend should be fixed somehow? :)
Let's enable MIOPEN_OFFLINE_COMPILER_PATHS_V2 by default and see ;)
@atamazov Yes! With -DMIOPEN_OFFLINE_COMPILER_PATHS_V2=1 I can finally reproduce this issue (I should have checked the cmake options more carefully).
The option was added in https://github.com/ROCm/MIOpen/pull/2694; however, it was not considered as a default until we discovered this now.
BTW~ with #2959 it seems that we can build successfully even with this option enabled.