dbcsr
dbcsr copied to clipboard
Consider to drop -Werror for tests/configs pulling external APIs/frameworks
As we pull latest bits/dependencies into tests, our CI is prone to break automatically. For example, some recent ROCm deprecation seems to cause warnings (turned into errors). This may be wanted but if not, -Werror
may be disabled for certain configs.
In file included from /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:13: /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp: In function ‘int c_dbcsr_acc_init()’: /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:30:40: error: ‘hipError_t hipDevicePrimaryCtxRetain(ihipCtx_t**, hipDevice_t)’ is deprecated: This API is marked as deprecated and may not be supported in future releases. For more details please refer https://github.com/ROCm/HIP/blob/develop/docs/reference/deprecated_api_list.md [-Werror=deprecated-declarations] 30 | ACC_DRV_CALL(DevicePrimaryCtxRetain, (&ctx, acc_device)); /__w/dbcsr/dbcsr/src/acc/cuda_hip/../hip/acc_hip.h:33:35: note: in definition of macro ‘HIP_API_CALL’ 33 | hipError_t result = ACC(func) args;
| ^~~~ /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:30:3: note: in expansion of macro ‘ACC_DRV_CALL’ 30 | ACC_DRV_CALL(DevicePrimaryCtxRetain, (&ctx, acc_device)); | ^~~~~~~~~~~~ In file included from /opt/rocm-6.1.0/include/hip/hip_runtime.h:70, from /__w/dbcsr/dbcsr/src/acc/cuda_hip/../hip/acc_hip.h:13, from /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:13: /opt/rocm-6.1.0/include/hip/hip_runtime_api.h:5128:12: note: declared here 5128 | hipError_t hipDevicePrimaryCtxRetain(hipCtx_t* pctx, hipDevice_t dev); | ^~~~~~~~~~~~~~~~~~~~~~~~~ In file included from /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:13: /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp: In function ‘int c_dbcsr_acc_finalize()’: /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:44:41: error: ‘hipError_t hipDevicePrimaryCtxRelease(hipDevice_t)’ is deprecated: This API is marked as deprecated and may not be supported in future releases. For more details please refer https://github.com/ROCm/HIP/blob/develop/docs/reference/deprecated_api_list.md [-Werror=deprecated-declarations] 44 | ACC_DRV_CALL(DevicePrimaryCtxRelease, (acc_device)); /__w/dbcsr/dbcsr/src/acc/cuda_hip/../hip/acc_hip.h:33:35: note: in definition of macro ‘HIP_API_CALL’ 33 | hipError_t result = ACC(func) args;
| ^~~~ /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:44:3: note: in expansion of macro ‘ACC_DRV_CALL’ 44 | ACC_DRV_CALL(DevicePrimaryCtxRelease, (acc_device)); | ^~~~~~~~~~~~ In file included from /opt/rocm-6.1.0/include/hip/hip_runtime.h:70, from /__w/dbcsr/dbcsr/src/acc/cuda_hip/../hip/acc_hip.h:13, from /__w/dbcsr/dbcsr/src/acc/cuda_hip/acc_init.cpp:13: /opt/rocm-6.1.0/include/hip/hip_runtime_api.h:5112:12: note: declared here 5112 | hipError_t hipDevicePrimaryCtxRelease(hipDevice_t dev); | ^~~~~~~~~~~~~~~~~~~~~~~~~~ cc1plus: all warnings being treated as errors
So, the problem here is that we use FROM rocm/dev-ubuntu-22.04:latest
which moved from ROCM 6.0.1 to 6.1.0 and now some HIP functions are deprecated...
I wonder if we should set a tag for ROCM (up to 6.0.2 for example). I don't think we should remove -Werror...
ROCM 6.1 introduced a long list of deprecated functions, however I'm not sure how we should update them... Maybe @gsitaram can help?
@hfp I would say, for the moment, please ignore this problem.
@hfp , you can also add -Wno-error=deprecated-declarations
for the moment...
I'll take a look and get back.
Hi @alazzaro, @hfp, I see the following message in the ROCm docs:
On the AMD platform, context management APIs are deprecated as there are better alternate interfaces, such as using hipSetDevice and stream APIs to achieve the required functionality.
On the NVIDIA platform, CUDA supports the driver API that defines "Context" and "Devices" as separate entities. Each context contains a single device, which can theoretically have multiple contexts. HIP initially added limited support for these APIs to facilitate easy porting from existing driver codes.
These APIs are only for equivalent driver APIs on the NVIDIA platform.
Are these functions necessary in DBCSR? Or could we replace them with equivalent hipDevice* functionality?
Thanks @gsitaram I have no idea how to answer to your question... I spent some time to understand how to replace them, but I cannot get it from the AMD documentation...
@alazzaro , could you point me to all the functions that DBCSR is using but is deprecated now? Is there an error report from CI that I could refer to?
@alazzaro , could you point me to all the functions that DBCSR is using but is deprecated now? Is there an error report from CI that I could refer to?
For that, please see @hfp issue of today (https://github.com/cp2k/dbcsr/issues/810)
Oh, oh, I didn't pay much attention... https://github.com/cp2k/dbcsr/issues/810 is related to CUDA... Then, for rocm, the deprecated functions are:
https://github.com/cp2k/dbcsr/actions/runs/9636368793/job/26574254231
Search for "deprecated and may"...
in principle CMake should be adding header files from external packages with -isystem
, on the other hand it will then just start failing hard when things are finally being removed
moving our docker image to use a tagged upstream and then use something like the pre-commit bot to scan the images periodically might be a better method?
in principle CMake should be adding header files from external packages with
-isystem
, on the other hand it will then just start failing hard when things are finally being removed
moving our docker image to use a tagged upstream and then use something like the pre-commit bot to scan the images periodically might be a better method?
Are your comments meant for this issue?
Are your comments meant for this issue?
yes, but it seems I omitted some part: by passing the include dirs for external API/frameworks with -isystem
, at least gcc should not be emitting warnings coming from those headers.