boost-multi alignof(std::max_align_t)
Describe the bug From Jakub Kurzak, AMD
There was a bug in boost-multi, which has been fixed: https://gitlab.com/correaa/boost-multi/-/commit/30d75aad4700552ea6dca50c278e73dd73f173f2 The issue actually has been reported on QMCPACK's GitHub page: https://github.com/QMCPACK/qmcpack/issues/3696#issuecomment-1010573751
alignof(std::max_align_t) is calculated as the largest scalar type, which is usually long double. For amdgpu and nvptx, long double is actually just double, which is 8 bytes. nvcc calculates alignof(std::max_align_t) as 16, which is incorrect. cuda-clang correctly calcuates alignof(std::max_align_t) as 8. There was a recent fix in our compiler, which changed alignof(std::max_align_t) from 16 to 8.
QMCPACK needs to be updated to the version of boost-multi with this fix
@correaa please can you make a new release of boost-multi and PR this in?
Yes, I can make a release with this fix. And this explains a back and forth I had before, with hardcoding it or use max_align_t.
What I don't know is how to permanently fix this. Should I take this as a workaround of a NVCC bug?
Any opinion?
Depends on whether you can detect nvcc appropriately. If so, then workarounds are possible. We have them for other compilers.
@prckent , I guess I can detect nvcc with __NVCC__, however I am now afraid to create an ABI incompatibility when linking gcc compiled code and nvcc compiled code. So I am going to leave it as it is for now until I get better advise.
I made a tagged version that includes this workaround: https://gitlab.com/correaa/boost-multi/-/tags/v0.79
In any case, glad to know that Multi is being tried in AMD environments.
This is very similar to OpenMP offload experience, nvptx pass and host past doesn't agree. My solution was that the alignment should be configured by user not defined by the compiler. @correaa Could you allow configuring the alignment by a macro. #if !defined(MULTI_ALIGNMENT) #define MULTI_ALIGNMENT alignof(std::max_align_t) #endif So I can pass -DMULTI_ALIGNMENT=32. QMCPACK align memory at 32 bytes or 64bytes.
And then update CMake to detect and set this by default when we know nvcc is being used somewhere in the compilation chain?
QMCPACK doesn't invoke the CMake of multi but it does have QMC_SIMD_ALIGNMENT. So just need a way to propagate it to multi.
That is what I was thinking. Another option would be to require the setting of this option if we detect NVCC, plus give some guidance.
@prckent , I guess I can detect nvcc with
__NVCC__, however I am now afraid to create an ABI incompatibility when linking gcc compiled code and nvcc compiled code. So I am going to leave it as it is for now until I get better advise.I made a tagged version that includes this workaround: https://gitlab.com/correaa/boost-multi/-/tags/v0.79
In any case, glad to know that Multi is being tried in AMD environments.
@correaa could you make PR with multi updated to 0.79?
@correaa could you make PR with multi updated to 0.79?
yes, I can. I am uncomfortable that I cannot pass all the test in my system. See here: https://github.com/QMCPACK/qmcpack/issues/4038. Can somebody confirm this?
@correaa your latest PR https://github.com/QMCPACK/qmcpack/pull/4210 has been merged.