qmcpack boost-multi alignof(std::max_align

Describe the bug From Jakub Kurzak, AMD

There was a bug in boost-multi, which has been fixed: https://gitlab.com/correaa/boost-multi/-/commit/30d75aad4700552ea6dca50c278e73dd73f173f2 The issue actually has been reported on QMCPACK's GitHub page: https://github.com/QMCPACK/qmcpack/issues/3696#issuecomment-1010573751

alignof(std::max_align_t) is calculated as the largest scalar type, which is usually long double. For amdgpu and nvptx, long double is actually just double, which is 8 bytes. nvcc calculates alignof(std::max_align_t) as 16, which is incorrect. cuda-clang correctly calcuates alignof(std::max_align_t) as 8. There was a recent fix in our compiler, which changed alignof(std::max_align_t) from 16 to 8.

QMCPACK needs to be updated to the version of boost-multi with this fix

Jul 18 '22 21:07 ye-luo

@correaa please can you make a new release of boost-multi and PR this in?

Jul 20 '22 15:07 prckent

Yes, I can make a release with this fix. And this explains a back and forth I had before, with hardcoding it or use max_align_t.

What I don't know is how to permanently fix this. Should I take this as a workaround of a NVCC bug?

Any opinion?

Jul 20 '22 16:07 correaa

Depends on whether you can detect nvcc appropriately. If so, then workarounds are possible. We have them for other compilers.

Jul 20 '22 18:07 prckent

@prckent , I guess I can detect nvcc with __NVCC__, however I am now afraid to create an ABI incompatibility when linking gcc compiled code and nvcc compiled code. So I am going to leave it as it is for now until I get better advise.

I made a tagged version that includes this workaround: https://gitlab.com/correaa/boost-multi/-/tags/v0.79

In any case, glad to know that Multi is being tried in AMD environments.

Jul 24 '22 18:07 correaa

This is very similar to OpenMP offload experience, nvptx pass and host past doesn't agree. My solution was that the alignment should be configured by user not defined by the compiler. @correaa Could you allow configuring the alignment by a macro. #if !defined(MULTI_ALIGNMENT) #define MULTI_ALIGNMENT alignof(std::max_align_t) #endif So I can pass -DMULTI_ALIGNMENT=32. QMCPACK align memory at 32 bytes or 64bytes.

Jul 25 '22 18:07 ye-luo

And then update CMake to detect and set this by default when we know nvcc is being used somewhere in the compilation chain?

Jul 25 '22 18:07 prckent

QMCPACK doesn't invoke the CMake of multi but it does have QMC_SIMD_ALIGNMENT. So just need a way to propagate it to multi.

Jul 25 '22 18:07 ye-luo

That is what I was thinking. Another option would be to require the setting of this option if we detect NVCC, plus give some guidance.

Jul 25 '22 18:07 prckent

@prckent , I guess I can detect nvcc with __NVCC__, however I am now afraid to create an ABI incompatibility when linking gcc compiled code and nvcc compiled code. So I am going to leave it as it is for now until I get better advise.

I made a tagged version that includes this workaround: https://gitlab.com/correaa/boost-multi/-/tags/v0.79

In any case, glad to know that Multi is being tried in AMD environments.

@correaa could you make PR with multi updated to 0.79?

Aug 17 '22 18:08 ye-luo

@correaa could you make PR with multi updated to 0.79?

yes, I can. I am uncomfortable that I cannot pass all the test in my system. See here: https://github.com/QMCPACK/qmcpack/issues/4038. Can somebody confirm this?

Aug 25 '22 07:08 correaa

@correaa your latest PR https://github.com/QMCPACK/qmcpack/pull/4210 has been merged.

Sep 06 '22 15:09 ye-luo

boost-multi alignof(std::max_align_t)