OpenBLAS link error to openmp functions

Open Billzhong2022 opened this issue 1 year ago • 24 comments

Hi Team,

I am trying to build OpenBLAS with OpenMP enabled on a Windows on ARM device, and I get the link errors below. Do you know what causes them and how to solve them?

Thanks!

[Build commands]

cmake .. -G Ninja -DCMAKE_C_COMPILER=clang-cl -DCMAKE_Fortran_COMPILER=flang-new -DBUILD_SHARED_LIBS=TRUE -DUSE_THREAD=1 -DUSE_OPENMP=1 -DOpenMP_Fortran_FLAGS=-fopenmp -DCMAKE_BUILD_TYPE=Release

cmake --build . --config Release

[Error logs]

lld-link: error: undefined symbol: __kmpc_for_static_init_8
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_blas.omp_outlined)
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_blas.omp_outlined.1)

lld-link: error: undefined symbol: __kmpc_for_static_fini
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_blas.omp_outlined)
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_blas.omp_outlined.1)

lld-link: error: undefined symbol: __declspec(dllimport) omp_get_thread_num
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_threads)
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_threads)

lld-link: error: undefined symbol: __kmpc_master
referenced by CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/ssytrd_sb2st.F.obj:(ssytrd_sb2st_..omp_par)
referenced by CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/dsytrd_sb2st.F.obj:(dsytrd_sb2st_..omp_par)
referenced by CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/chetrd_hb2st.F.obj:(chetrd_hb2st_..omp_par)
referenced 1 more times

lld-link: error: undefined symbol: __kmpc_end_master
referenced by CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/ssytrd_sb2st.F.obj:(ssytrd_sb2st_..omp_par)
referenced by CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/dsytrd_sb2st.F.obj:(dsytrd_sb2st_..omp_par)
referenced by CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/chetrd_hb2st.F.obj:(chetrd_hb2st_..omp_par)
referenced 1 more times

lld-link: error: undefined symbol: omp_get_num_threads
referenced by CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/iparam2stage.F.obj:(iparam2stage_..omp_par)

ninja: build stopped: subcommand failed.

Billzhong2022 avatar Dec 14 '23 06:12 Billzhong2022

Unfortunately this looks like a problem in flang-new for Windows/Arm64 (maybe you can try a later patch release of LLVM 17, depending on which one you are using now).

martin-frbg avatar Dec 14 '23 09:12 martin-frbg

Thanks for your reply!

Can I configure the build to link libomp.lib to fix this issue? If so, how? BTW, I use the LLVM Windows on ARM (WoA) build, version 17.0.6, and CMake 3.28.

Or, how should I build OpenBLAS so that it runs with high performance?

Billzhong2022 avatar Dec 14 '23 10:12 Billzhong2022

Normally I would expect it to link libomp without any special configuration, just from having -fopenmp on the command line (which should be added automatically if you specified -DUSE_OPENMP=ON). I guess you could experiment with putting libomp on the target_link_libraries line (around line 308 of the toplevel CMakeLists.txt). Is it too slow for your needs when you don't use OpenMP? (There are some fixes to speed up Windows thread management in the current develop branch; also, whether the relevant BLAS functions have optimized implementations in OpenBLAS will depend on your hardware.)

martin-frbg avatar Dec 14 '23 12:12 martin-frbg

BTW - unfortunately I do not have any Windows on Arm setup available to test, maybe @everton1984 can comment on the current status (I notice #3973 is still open) or @mmuetzel ?

martin-frbg avatar Dec 14 '23 12:12 martin-frbg

I don't have access to Windows on ARM hardware either. Nor do I have any experience with clang-cl. What I can tell is that OpenBLAS is built and distributed for Windows on ARM with OpenMP using clang (the MinGW version of it) by MSYS2: https://github.com/msys2/MINGW-packages/blob/4c0259a04a205ae8175ece19fe7260be958cdf8c/mingw-w64-openblas/PKGBUILD#L98

I'm not aware of reports about issues with that version of OpenBLAS for Windows on ARM.

mmuetzel avatar Dec 14 '23 15:12 mmuetzel

You have iomp5 symbols in your build output. You need to start with a clean source tree and make different builds in different (sub-)directories.

brada4 avatar Dec 14 '23 20:12 brada4

@brada4 what do you mean? The kmpc ones are definitely in LLVM's OpenMP runtime.

martin-frbg avatar Dec 14 '23 20:12 martin-frbg

Heh, the same comes out if you mix up MKL linker commands. A clean rebuild is one of the possibilities.

brada4 avatar Dec 15 '23 04:12 brada4

I've downloaded the OpenMP libraries libomp.dll and libomp.a from https://packages.msys2.org/package/mingw-w64-clang-aarch64-openmp?repo=clangarm64. Where should I put them so that OpenBLAS can link against OpenMP?

Here is more information.

OS: Windows 11 ARM64
OpenBLAS version: v0.3.24
Build tools: Visual Studio 2022 + CMake 3.28 + LLVM 17.0.6

Billzhong2022 avatar Dec 15 '23 06:12 Billzhong2022

Hi team,

What do the generation logs below mean?

Key logs:
Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
Looking for pthread_create in pthreads - not found

Detailed logs:
-- fortran lapack
-- Building deprecated routines
-- Building Single Precision
-- Building Double Precision
-- Building Single Precision Complex
-- Building Double Precision Complex
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
-- Generating openblas_config.h in include/openblas
-- Generating f77blas.h in include/openblas
-- Generating cblas.h in include/openblas
-- Copying LAPACKE header files to include/openblas
-- Configuring done (14.4s)
-- Generating done (0.6s)

Billzhong2022 avatar Dec 15 '23 06:12 Billzhong2022

No idea. If you mix MinGW and LLVM files, such link errors are expected. Just make your development tools install as clean as possible.

brada4 avatar Dec 15 '23 08:12 brada4

Hi team,

I've checked, and LLVM 17.0.6 does not include libomp.lib or libomp.dll. Can you enable OpenMP for OpenBLAS on a Windows x64 device?

Billzhong2022 avatar Dec 15 '23 09:12 Billzhong2022

Visual Studio includes both the old Microsoft OpenMP runtime (msomp) and the LLVM OpenMP runtime. If you are using clang-cl.exe, you need to select -openmp=llvm or ms.

brada4 avatar Dec 15 '23 10:12 brada4

What is the exact OpenMP flag to use with clang-cl.exe?

Thanks!

Billzhong2022 avatar Dec 18 '23 07:12 Billzhong2022

Use plain clang.exe; clang-cl is just a partial cl.exe replica. Or see: https://devblogs.microsoft.com/cppblog/improved-openmp-support-for-cpp-in-visual-studio/

brada4 avatar Dec 18 '23 08:12 brada4

Hi Team,

On a Windows on ARM device, I use the commands below to generate and build OpenBLAS. I can find "Found OpenMP_C: -Xclang -fopenmp (found version "5.1")" in the generation logs, but when I build OpenBLAS I still get OpenMP link errors.

I use LLVM 17.0.6 and CMake 3.28.

Do you know the reason? How do I link against the correct OpenMP functions?

Thanks a lot!

[Build commands]

cmake .. -G Ninja -DCMAKE_C_COMPILER=clang-cl -DNOFORTRAN=1 -DBUILD_SHARED_LIBS=TRUE -DUSE_OPENMP=1 -DCMAKE_BUILD_TYPE=Release -DBUILD_WITHOUT_LAPACK=0

cmake --build . --config Release

[Generation logs]

-- Running getarch
-- GETARCH results:
CORE=ARMV8
LIBCORE=armv8
NUM_CORES=12
MAKEFLAGS += -j 12

-- Compiling a 64-bit binary.
-- Found OpenMP_C: -Xclang -fopenmp (found version "5.1")
-- Found OpenMP: TRUE (found version "5.1")
-- Building Single Precision
-- Building Double Precision
-- Building Complex Precision
-- Building Double Complex Precision

[Link error logs]

[6754/6754] Linking C shared library lib\openblas.dll
FAILED: lib/openblas.dll lib/Release/openblas.lib

lld-link: error: undefined symbol: __declspec(dllimport) omp_get_max_threads
referenced by interface/CMakeFiles/interface.dir/CMakeFiles/saxpy.c.obj:(saxpy_)
referenced by interface/CMakeFiles/interface.dir/CMakeFiles/saxpy.c.obj:(saxpy_)
referenced by interface/CMakeFiles/interface.dir/CMakeFiles/sswap.c.obj:(sswap_)
referenced 503 more times

lld-link: error: undefined symbol: __declspec(dllimport) omp_in_parallel
referenced by interface/CMakeFiles/interface.dir/CMakeFiles/saxpy.c.obj:(saxpy_)
referenced by interface/CMakeFiles/interface.dir/CMakeFiles/saxpy.c.obj:(saxpy_)
referenced by interface/CMakeFiles/interface.dir/CMakeFiles/sswap.c.obj:(sswap_)
referenced 485 more times

lld-link: error: undefined symbol: __kmpc_global_thread_num
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_blas)

lld-link: error: undefined symbol: __kmpc_push_num_threads
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_blas)

lld-link: error: undefined symbol: __kmpc_fork_call
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_blas)

lld-link: error: undefined symbol: __kmpc_for_static_init_8
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_blas.omp_outlined)
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_blas.omp_outlined.1)

lld-link: error: undefined symbol: __kmpc_for_static_fini
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_blas.omp_outlined)
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_blas.omp_outlined.1)

lld-link: error: undefined symbol: __declspec(dllimport) omp_get_thread_num
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_threads)
referenced by driver/others/CMakeFiles/driver_others.dir/blas_server_omp.c.obj:(exec_threads)

ninja: build stopped: subcommand failed.

Billzhong2022 avatar Dec 20 '23 03:12 Billzhong2022

It links to the older OpenMP runtime provided by Microsoft, despite the one CMake detected. Use clang.exe as the C compiler (CC); that should be easy.

brada4 avatar Dec 20 '23 07:12 brada4

With clang.exe I no longer get OpenMP link errors. But the built openblas.lib is only about 1 KB, which is very small, and I can't find any exported function in the built openblas.dll.

Is anything wrong? How do I configure OpenMP for OpenBLAS?

[Build commands]

cmake .. -G Ninja -DCMAKE_C_COMPILER=clang -DNOFORTRAN=1 -DBUILD_SHARED_LIBS=TRUE -DUSE_OPENMP=1 -DCMAKE_BUILD_TYPE=Release -DBUILD_WITHOUT_LAPACK=0

cmake --build . --config Release

[File openblas.dll without any export symbol]

dumpbin /exports .\openblas.dll
Microsoft (R) COFF/PE Dumper Version 14.38.33133.0
Copyright (C) Microsoft Corporation. All rights reserved.

Dump of file .\openblas.dll

File Type: DLL

Summary

    1000 .00cfg
    8000 .data
    E000 .pdata
   36000 .rdata
    2000 .reloc
    1000 .rsrc
  8DD000 .text
   10000 .tls

Billzhong2022 avatar Dec 20 '23 08:12 Billzhong2022

The .text section is about 9 MB (0x8DD000 bytes), which is reasonable for a single CPU type. Frankly, I have no idea how the Microsoft OpenMP runtime gets in the way; it is not in Visual Studio by default.

brada4 avatar Dec 20 '23 11:12 brada4

On a Windows on ARM device, I can build OpenBLAS with the commands below, but OpenBLAS performance does not seem to improve.

Do you know the reason?

[Build commands]

cmake .. -G Ninja -DCMAKE_C_COMPILER=clang-cl -DNOFORTRAN=1 -DBUILD_SHARED_LIBS=TRUE -DCMAKE_BUILD_TYPE=Release -DPARALLEL=1 -DBUILD_WITHOUT_LAPACK=0 -DUSE_OPENMP=1 -DOpenMP_C_FLAGS="-fopenmp=libomp" -DOpenMP_C_LIB_NAMES="libomp" -DOpenMP_libomp_LIBRARY="libomp.lib"

cmake --build . --config Release -j32 

Billzhong2022 avatar Dec 25 '23 06:12 Billzhong2022

Please present some measurements, for example by integrating the various OpenBLAS builds into Octave or R and running the same benchmark scripts over and over. You wanted OpenMP, which means you can call OpenBLAS from OpenMP parallel sections and manage the parallelism yourself across multiple, now single-threaded, OpenBLAS calls.

brada4 avatar Dec 25 '23 07:12 brada4

Hi,

How do I use multiple threads with OpenMP in OpenBLAS? Which configuration should I use?

Do you have a demo configuration or app?

Thanks!

Billzhong2022 avatar Dec 25 '23 07:12 Billzhong2022

Call OpenBLAS from the top level, not from within additional OpenMP pragmas? That should be obvious if you program with OpenMP.

brada4 avatar Dec 25 '23 07:12 brada4

You can try experimenting with the sources in the cpp_thread_test directory. If your code is calling BLAS functions from an OpenMP parallel region, OpenBLAS will currently use only one thread in each of the parallel calls.

martin-frbg avatar Dec 29 '23 13:12 martin-frbg