dbcsr icon indicating copy to clipboard operation
dbcsr copied to clipboard

Consider to drop -Werror for tests/configs pulling external APIs/frameworks

Open hfp opened this issue 10 months ago • 12 comments

As we pull latest bits/dependencies into tests, our CI is prone to break automatically. For example, some recent ROCm deprecation seems to cause warnings (turned into errors). This may be wanted but if not, -Werror may be disabled for certain configs.

In file included from /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:13: /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp: In function ‘int c_dbcsr_acc_init()’: /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:30:40: error: ‘hipError_t hipDevicePrimaryCtxRetain(ihipCtx_t**, hipDevice_t)’ is deprecated: This API is marked as deprecated and may not be supported in future releases. For more details please refer https://github.com/ROCm/HIP/blob/develop/docs/reference/deprecated_api_list.md [-Werror=deprecated-declarations] 30 | ACC_DRV_CALL(DevicePrimaryCtxRetain, (&ctx, acc_device)); /__w/dbcsr/dbcsr/src/acc/cuda_hip/../hip/acc_hip.h:33:35: note: in definition of macro ‘HIP_API_CALL’ 33 | hipError_t result = ACC(func) args;
| ^~~~ /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:30:3: note: in expansion of macro ‘ACC_DRV_CALL’ 30 | ACC_DRV_CALL(DevicePrimaryCtxRetain, (&ctx, acc_device)); | ^~~~~~~~~~~~ In file included from /opt/rocm-6.1.0/include/hip/hip_runtime.h:70, from /__w/dbcsr/dbcsr/src/acc/cuda_hip/../hip/acc_hip.h:13, from /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:13: /opt/rocm-6.1.0/include/hip/hip_runtime_api.h:5128:12: note: declared here 5128 | hipError_t hipDevicePrimaryCtxRetain(hipCtx_t* pctx, hipDevice_t dev); | ^~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:13: /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp: In function ‘int c_dbcsr_acc_finalize()’: /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:44:41: error: ‘hipError_t hipDevicePrimaryCtxRelease(hipDevice_t)’ is deprecated: This API is marked as deprecated and may not be supported in future releases. For more details please refer https://github.com/ROCm/HIP/blob/develop/docs/reference/deprecated_api_list.md [-Werror=deprecated-declarations] 44 | ACC_DRV_CALL(DevicePrimaryCtxRelease, (acc_device)); /__w/dbcsr/dbcsr/src/acc/cuda_hip/../hip/acc_hip.h:33:35: note: in definition of macro ‘HIP_API_CALL’ 33 | hipError_t result = ACC(func) args;
| ^~~~ /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:44:3: note: in expansion of macro ‘ACC_DRV_CALL’ 44 | ACC_DRV_CALL(DevicePrimaryCtxRelease, (acc_device)); | ^~~~~~~~~~~~ In file included from /opt/rocm-6.1.0/include/hip/hip_runtime.h:70, from /__w/dbcsr/dbcsr/src/acc/cuda_hip/../hip/acc_hip.h:13, from /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:13: /opt/rocm-6.1.0/include/hip/hip_runtime_api.h:5112:12: note: declared here 5112 | hipError_t hipDevicePrimaryCtxRelease(hipDevice_t dev); | ^~~~~~~~~~~~~~~~~~~~~~~~~~ cc1plus: all warnings being treated as errors

hfp avatar Apr 22 '24 14:04 hfp

So, the problem here is that we use FROM rocm/dev-ubuntu-22.04:latest which moved from ROCM 6.0.1 to 6.1.0 and now some HIP functions are deprecated... I wonder if we should set a tag for ROCM (up to 6.0.2 for example). I don't think we should remove -Werror... ROCM 6.1 introduced a long list of deprecated functions, however I'm not sure how we should update them... Maybe @gsitaram can help?

@hfp I would say, for the moment, please ignore this problem.

alazzaro avatar Apr 22 '24 19:04 alazzaro

@hfp , you can also add -Wno-error=deprecated-declarations for the moment...

alazzaro avatar Apr 22 '24 19:04 alazzaro

I'll take a look and get back.

gsitaram avatar Jun 24 '24 15:06 gsitaram

Hi @alazzaro, @hfp, I see the following message in the ROCm docs:

On the AMD platform, context management APIs are deprecated as there are better alternate interfaces, such as using hipSetDevice and stream APIs to achieve the required functionality.

On the NVIDIA platform, CUDA supports the driver API that defines "Context" and "Devices" as separate entities. Each context contains a single device, which can theoretically have multiple contexts. HIP initially added limited support for these APIs to facilitate easy porting from existing driver codes.

These APIs are only for equivalent driver APIs on the NVIDIA platform.

Are these functions necessary in DBCSR? Or could we replace them with equivalent hipDevice* functionality?

gsitaram avatar Jun 24 '24 15:06 gsitaram

Thanks @gsitaram I have no idea how to answer to your question... I spent some time to understand how to replace them, but I cannot get it from the AMD documentation...

alazzaro avatar Jun 24 '24 15:06 alazzaro

@alazzaro , could you point me to all the functions that DBCSR is using but is deprecated now? Is there an error report from CI that I could refer to?

gsitaram avatar Jun 24 '24 16:06 gsitaram

@alazzaro , could you point me to all the functions that DBCSR is using but is deprecated now? Is there an error report from CI that I could refer to?

For that, please see @hfp issue of today (https://github.com/cp2k/dbcsr/issues/810)

alazzaro avatar Jun 24 '24 16:06 alazzaro

Oh, oh, I didn't pay much attention... https://github.com/cp2k/dbcsr/issues/810 is related to CUDA... Then, for rocm, the deprecated functions are:

https://github.com/cp2k/dbcsr/actions/runs/9636368793/job/26574254231

Search for "deprecated and may"...

alazzaro avatar Jun 24 '24 16:06 alazzaro

in principle CMake should be adding header files from external packages with -isystem, on the other hand it will then just start failing hard when things are finally being removed

dev-zero avatar Jun 24 '24 19:06 dev-zero

moving our docker image to use a tagged upstream and then use something like the pre-commit bot to scan the images periodically might be a better method?

dev-zero avatar Jun 24 '24 19:06 dev-zero

in principle CMake should be adding header files from external packages with -isystem, on the other hand it will then just start failing hard when things are finally being removed

moving our docker image to use a tagged upstream and then use something like the pre-commit bot to scan the images periodically might be a better method?

Are your comments meant for this issue?

hfp avatar Jun 25 '24 06:06 hfp

Are your comments meant for this issue?

yes, but it seems I omitted some part: by passing the include dirs for external API/frameworks with -isystem, at least gcc should not be emitting warnings coming from those headers.

dev-zero avatar Jun 25 '24 06:06 dev-zero