Linking to the library statically on Windows breaks multi-arch support

Open 3e33 opened this issue 2 years ago • 20 comments

I'm not entirely sure what's going on, but MSVC on Windows requires /OPT:NOREF when linking statically to OpenBLAS; otherwise the result doesn't work on CPU architectures other than the one the library was built on. The problem is that this option doesn't help with clang-cl: even with /opt:noref, something required is still removed during linking. This doesn't happen at all on Linux, but it happens with two different linkers on Windows (I don't know about GCC), which makes me think it's a bug in OpenBLAS.

3e33 avatar Mar 31 '23 22:03 3e33

Can you please be more specific about what "it does not work" means, and what you are trying to achieve? By "architectures it was not built on", do you mean a dual x64/arm64 build, or just different models of x64 CPUs (which is what DYNAMIC_ARCH is for)?

martin-frbg avatar Apr 01 '23 07:04 martin-frbg

Other models of CPUs don't work. If I compile OpenBLAS as a dynamic library and link my application to it, everything is fine. If I compile a static library and link my app to it statically on Linux with Clang++, everything is fine and the final app (also a dynamic library) can be loaded. If I link on Windows with Clang++ or MSVC's cl, the final .dll loads on my machine but not on other CPUs (it works on my AMD machine, but not on Intel). If I link with MSVC and add /OPT:NOREF, everything works as expected again. But the same doesn't work for clang with /opt:noref (which might be a problem with lld / clang).

From what I can understand, while linking there's a large chunk of OpenBLAS that gets optimised out, which is what /OPT:NOREF fixes.

/OPT:REF eliminates functions and data that are never referenced; /OPT:NOREF keeps functions and data that are never referenced.

When /OPT:REF is enabled, LINK removes unreferenced packaged functions and data, known as COMDATs. This optimization is known as transitive COMDAT elimination. The /OPT:REF option also disables incremental linking. (Source: https://learn.microsoft.com/en-us/cpp/build/reference/opt-optimizations?view=msvc-170)

For some reason, on Windows, compilers / linkers think that needed parts of OpenBLAS aren't referenced and remove them during a static build.

3e33 avatar Apr 01 '23 07:04 3e33

Maybe it is related to lld's link-time optimizations. Can you try building with -fno-lto added to the CFLAGS? But searching finds many similar questions (even in the llvm issue tracker) with no clear answer, so it could well be a design limitation in llvm on Windows.

martin-frbg avatar Apr 01 '23 09:04 martin-frbg

-fno-lto hasn't made any difference; according to clang++ -help it is already the default.

3e33 avatar Apr 01 '23 09:04 3e33

That's unfortunate, and as I'm not a Windows guy I can only hope someone else has a better suggestion. Maybe https://developercommunity.visualstudio.com/t/optnoref-still-eliminates-unused-functiondata/1178871 and what it says about clang is related?

martin-frbg avatar Apr 01 '23 10:04 martin-frbg

Sure sounds related, but that means OpenBLAS can only be linked dynamically if you're using clang on Windows. Seems like a bug.

3e33 avatar Apr 01 '23 11:04 3e33

In the file driver/others/dynamic.c there is a dependency that the MSVC linker does not see. It goes like this:

gotoblas=&gotoblas_TARGET
detect_cpu
gotoblas=&gotoblas_detected // MSVC does not see this dozen-function list as dependencies
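To make the pattern sketched above concrete, here is a minimal, self-contained C illustration of runtime dispatch through a pointer to a per-core table. This is not OpenBLAS code: kernel_table, table_generic, table_fast, select_kernels and dispatch_dot are hypothetical names, and the "CPU detection" is reduced to an integer flag. It only shows why the table that ends up being used is decided at run time rather than being visible as a fixed call at link time; whether this is really what trips up the linker in this issue is not confirmed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-core table of kernel entry points. */
typedef struct {
    const char *name;
    double (*dot)(const double *x, const double *y, size_t n);
} kernel_table;

/* Plain reference kernel. */
static double dot_generic(const double *x, const double *y, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Stand-in for a CPU-specific kernel: same result, loop unrolled by two. */
static double dot_unrolled(const double *x, const double *y, size_t n) {
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += x[i] * y[i];
        s1 += x[i + 1] * y[i + 1];
    }
    if (i < n) {
        s0 += x[i] * y[i];  /* odd-length tail */
    }
    return s0 + s1;
}

static const kernel_table table_generic = { "GENERIC", dot_generic };
static const kernel_table table_fast    = { "FAST",    dot_unrolled };

/* Starts at a build-time default, like the first line of the pseudocode. */
static const kernel_table *active = &table_generic;

/* Hypothetical detection step: the flag replaces a real CPUID probe. */
void select_kernels(int cpu_has_fast_path) {
    active = cpu_has_fast_path ? &table_fast : &table_generic;
    assert(active != NULL && active->dot != NULL);
}

const char *active_kernel_name(void) {
    return active->name;
}

/* Every call goes through the pointer chosen at run time. */
double dispatch_dot(const double *x, const double *y, size_t n) {
    return active->dot(x, y, n);
}
```

In this toy version both tables are referenced directly from select_kernels, so a linker can see them. The concern raised in the thread is about cases where the linker does not recognise such references and discards the per-core code.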

You can work around the limitation by force-including all gotoblas_CORETYPE functions, which will pull in more or less every function in the library. By and large, that gives no gain over a dynamic library.
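One possible reading of "force-including" is to create an explicit reference to every per-core table from code that is always reachable, so the linker cannot classify the tables as unreferenced. The C sketch below shows that idea with hypothetical names (core_table, core_table_a/b/c, all_core_tables, find_core_table); it is not taken from OpenBLAS and has not been tested against the problem in this issue. The MSVC linker also has an /INCLUDE:symbol option that forces a symbol reference from the command line, which would be the linker-side equivalent; whether either approach helps with clang-cl here is an open question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-core table; a real one would hold kernel pointers. */
typedef struct {
    const char *name;
} core_table;

/* In a real library each of these would live in its own object file. */
const core_table core_table_a = { "CORE_A" };
const core_table core_table_b = { "CORE_B" };
const core_table core_table_c = { "CORE_C" };

/* One array that names every table. As long as this array is itself
 * reachable (for example from the init routine), each table has a
 * visible reference. */
const core_table *const all_core_tables[] = {
    &core_table_a,
    &core_table_b,
    &core_table_c,
};

const size_t all_core_tables_count =
    sizeof all_core_tables / sizeof all_core_tables[0];

/* Lookup that an init routine could call after detecting the CPU.
 * Returns NULL when no table has the requested name. */
const core_table *find_core_table(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < all_core_tables_count; i++) {
        if (strcmp(all_core_tables[i]->name, name) == 0) {
            return all_core_tables[i];
        }
    }
    return NULL;
}
```

The trade-off is the one already mentioned: once every table is kept, nearly all of the library is kept too, so the static build loses most of its size advantage.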

brada4 avatar Apr 02 '23 07:04 brada4

> You can work around the limitation by force-including all gotoblas_CORETYPE functions, which will pull in more or less every function in the library. By and large, that gives no gain over a dynamic library.

What does this entail exactly? I'm unfamiliar with the "force include" terminology.

borrrden avatar Jan 31 '24 00:01 borrrden

Unfortunately that suggestion may have been pure handwaving. Sadly, the VS community thread linked above did not offer any workarounds beyond "don't do that then" either.

martin-frbg avatar Jan 31 '24 11:01 martin-frbg

Could you check whether using /OPT:NOICF in addition to /OPT:NOREF has any effect? (I guess the chance is slim, but I have no better suggestion at the moment.)

martin-frbg avatar Jan 31 '24 19:01 martin-frbg

Also, if you are working in a project, you need to change two project options: https://learn.microsoft.com/en-us/cpp/build/reference/opt-optimizations?view=msvc-170

brada4 avatar Jan 31 '24 19:01 brada4

Do these options apply to clang-cl as well? I haven't actually confirmed any issues yet; I'm just trying to get ahead of this in case I run into it as soon as we are ready to use this on Windows. If all else fails I'll switch to building shared; I just thought it would be a shame to alter things just for Windows' sake. In theory, once we are ready to use Windows, I should run into this pretty quickly, given that our build agents are Intel-based and my dev machine is AMD-based, so I'll try any suggestions that I see here.

My build process is essentially the same as in the wiki.

borrrden avatar Jan 31 '24 20:01 borrrden

I ended up using Clang on Linux and MSVC on Windows. The reason I wanted to avoid building the shared library is because of its size.

3e33 avatar Jan 31 '24 21:01 3e33

I assume so, and it could be that it is specifically a problem with clang-cl (via the LLVM linker). On the other hand, one wouldn't want to use plain old MSVC, as that cannot handle all the optimized assembly kernels.

martin-frbg avatar Jan 31 '24 21:01 martin-frbg

The size comes from the extra code paths for more architectures. You cannot both eliminate them and expect them to be there.

brada4 avatar Jan 31 '24 22:01 brada4

Thanks for the insight. Unlike the entire world, it seems, I do the majority of my dev work using Windows so when I get to this I will surely report back what I find, and any potential fixes if I'm clever enough to find them 😋 .

borrrden avatar Jan 31 '24 22:01 borrrden

@brada4 I could be wrong, but I haven't noticed any issues, and the size is reduced by more than half when compiling statically just for the archs I need.

3e33 avatar Jan 31 '24 22:01 3e33

Yeah, and then you complain it's not linking together... You can build with TARGET=GENERIC for a small single-architecture library (preferably with clang rather than cl.exe).

brada4 avatar Jan 31 '24 23:01 brada4

It works with MSVC on Windows and with Clang on Linux. It's only a clang-cl issue.

3e33 avatar Jan 31 '24 23:01 3e33

Yes, and on Linux you get a 50 MB library that you cannot trim down, but that's not a problem with the Windows linker; it is certainly OpenBLAS.

brada4 avatar Jan 31 '24 23:01 brada4