scikit-build-core

Linking discrepancy compared to manual cmake and makefile invocation

Open SylviaZiyuZhang opened this issue 2 weeks ago • 1 comments

Hello!

I am having trouble using scikit-build-core for a C++ project that depends on Intel MKL libraries, which require specific linking behavior to work. Manually running cmake and make (with the Unix Makefiles generator) builds the library and the Python module correctly, but installing via pip and scikit-build-core does not.

Details can be found at this repository. Via the manual build route specified in the repo README, both ctest and the Python tests for the Python module run as expected. Via pip install -e ., the build fails with MKL link errors:

/usr/bin/ld: /usr/lib/x86_64-linux-gnu/libmkl_intel_thread.a(slasr3_par.o): in function `mkl_lapack_slasr3':
      slasr3_omp_gen.f:(.text+0x1d7): undefined reference to `mkl_lapack_xslasr3'
      /usr/bin/ld: slasr3_omp_gen.f:(.text+0x281): undefined reference to `mkl_lapack_xslasr3'
      /usr/bin/ld: slasr3_omp_gen.f:(.text+0x4b77): undefined reference to `mkl_lapack_omp_parallel_enter'
      /usr/bin/ld: slasr3_omp_gen.f:(.text+0x4f43): undefined reference to `mkl_lapack_omp_parallel_exit'
      /usr/bin/ld: slasr3_omp_gen.f:(.text+0x5016): undefined reference to `mkl_lapack_omp_parallel_enter'
      /usr/bin/ld: slasr3_omp_gen.f:(.text+0x5353): undefined reference to `mkl_lapack_omp_parallel_exit
...

When using Ninja as the generator, the error occurs much earlier.

OS: Ubuntu 22.04.5 LTS
CPU: Intel(R) Xeon(R) Platinum 8176 CPU @ 2.10GHz (vendor_id: GenuineIntel, cpu family: 6, model: 85)

Thank you for your time!

SylviaZiyuZhang, Nov 04 '25 20:11

When you said you tested with plain CMake and it worked fine, can you be more explicit about the steps? Did you do an install and test from those artifacts, or did you test from the build-dir artifacts (which would have the RPATH set)?

A number of things could be failing here, and it is hard to tell without a simple reproducer or, ideally, a showcase in GitHub Actions, but here are a few red flags:

  • the diskann dependency does not consume find_package(MKL), which can lead to transient link failures; these can be more prominent when everything is linked statically at the end and every symbol needs to be visible
  • that library explicitly overrides its compiler which is incompatible with any non-C compiler (and even C compilers sometimes)
  • all of the linkages and builds that library does are incompatible for external usage as it does not define what libraries a dependent needs to inherit (everything is local only to the current build) so it cannot actually be properly consumed via add_subdirectory
  • I gave up on reading that CMake file because there are just too many issues that could be raised

Some things you can do:

  • try to make that dependency installable and consume the installed artifacts. That would separate the issues coming from each build
  • try building and linking as shared libraries and see where it fails then. You should be able to also more easily check what libraries it uses in the end and what symbols it actually needs
  • really consider how much effort you want to spend on it. My guess is that in the end you would have to completely rewrite that project's build-system to align with modern CMake designs, and are you willing to take that on?

LecrisUT, Nov 04 '25 21:11

Thanks a lot for your thoughtful response!

To be more explicit about the steps, I have set up GitHub Actions in the repository for the succeeding path. The GitHub Action for the failing pip install case is having trouble accessing the g++ compiler, and I haven't been able to troubleshoot that.

I believe many of the undesired patterns in the DiskANN CMake are due to dependencies on libraries like omp, mkl, blas, etc. They are large and rely on specific linking to work (link line advisor), to the point that it may be desirable or necessary to link everything statically at the end in a non-idiomatic way. It is also common for these semi-legacy performance libraries to require a specific C compiler. For example, the link line necessary for clang may differ from that for g++, even though the dependency libraries do not change.

I'm not sure what you were referring to by transient link failures. These issues are consistently reproducible.

For the more concrete items you suggested:

  • The build succeeds as a shared library. Installation/importing is however a nightmare.
  • I'm happy to spend as much effort on this as needed as it is in general desirable to be able to use libraries relying on these finicky scientific computing libraries with a modern Python ecosystem. I have not been able to find any consistent guidelines in this area. Many of these libraries are being rewritten but it won't happen instantaneously.
  • If need be, I can try to make the dependency installable or rewrite the build system. I'm just not sure yet where the scikit-build path is running into issues, or what the goal of a rewrite would be, since plain cmake and makefile work fine.

Thanks again!

SylviaZiyuZhang, Nov 06 '25 16:11

> I'm not sure what you were referring to by transient link failures.

I mean issues where libA is compiled with a dependency on libfoo, but building libB, which depends on libA, fails at link time because libA uses a symbol from libfoo and libB does not know that it also needs to link to libfoo (that is, a transitive dependency that is not propagated). When linking against shared libraries you only need to care about public API symbols, but with static libraries it gets messy.
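
To make the libA/libB/libfoo scenario concrete, here is a toy model of it in Python. It is only a sketch: the library and symbol names are hypothetical, and it ignores details a real linker such as ld cares about (for example, the order of static archives on the link line).

```python
# Toy model of static linking: each library defines some symbols and leaves
# others undefined. All names (libA, libB, libfoo, a_func, ...) are
# hypothetical and only mirror the scenario described above.
LIBS = {
    "libfoo": {"defines": {"foo_impl"}, "needs": set()},
    "libA": {"defines": {"a_func"}, "needs": {"foo_impl"}},
    "libB": {"defines": {"b_func"}, "needs": {"a_func"}},
}


def unresolved_symbols(link_line):
    """Return the symbols still undefined after combining all listed libraries."""
    defined = set()
    needed = set()
    for name in link_line:
        defined |= LIBS[name]["defines"]
        needed |= LIBS[name]["needs"]
    return needed - defined


# libB lists only its direct dependency, so libA's use of libfoo stays unresolved.
print(unresolved_symbols(["libB", "libA"]))
# Listing the transitive dependency as well resolves everything.
print(unresolved_symbols(["libB", "libA", "libfoo"]))
```

The first call reports `foo_impl` as unresolved, which is the analogue of an "undefined reference" error; the second call returns an empty set because the transitive dependency is on the link line.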

> They are large and rely on specific linking to work (link line advisor)

I am aware of it, but that only applies when you compile manually. If you consume MKL through find_package(MKL), that is already handled on the CMake side. See the MKLConfig.cmake file for the options available. There were of course bugs in MKLConfig.cmake when I last checked (about 2 years ago), but it is easier to work around those bugs (it involves fixing missing hints and variables) than to write the link commands manually.

> It is also common to require a specific C compiler

Note that you cannot package it for PyPI if you do that. If you build only for your own environments then it's fine, but in that case you don't need to link statically either.

> To be more explicit about the steps, I have set up GitHub Actions

Great, that shows quite a lot of the issues:

  • ninja: error: Makefile:5: expected '=', got ':' means it is using the Ninja generator there, whereas elsewhere you made it use Makefiles. Try to use only the default Ninja; otherwise pass the generator via the CMAKE_GENERATOR environment variable. I'm not sure why it used the Ninja generator there, since we should be detecting it
  • a more specific breakdown of the .so files that were built there and that you copied, together with the output of readelf -d, would be useful
  • what targets were built in diskann, and what types are they? Note that there are many ways to control whether a library is built static or shared, one of them being BUILD_SHARED_LIBS
  • we have not got to your original ld issue yet
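
For the readelf -d breakdown, a small helper along these lines could condense the output for each .so file into the entries that matter for linking and loading. This is a minimal sketch: the function names and the sample text are made up for illustration, and the regular expression assumes the usual GNU readelf layout of the dynamic section.

```python
import re
import subprocess
from collections import defaultdict

# Matches dynamic-section lines such as:
#   0x0000000000000001 (NEEDED)   Shared library: [libstdc++.so.6]
_ENTRY = re.compile(r"\((NEEDED|RPATH|RUNPATH|SONAME)\)\s+[^\[]*\[([^\]]*)\]")


def summarize_dynamic_section(readelf_output):
    """Group NEEDED/RPATH/RUNPATH/SONAME values found in `readelf -d` text."""
    summary = defaultdict(list)
    for tag, value in _ENTRY.findall(readelf_output):
        summary[tag].append(value)
    return dict(summary)


def readelf_dynamic(path):
    """Run `readelf -d` on a file and return its raw text output."""
    result = subprocess.run(
        ["readelf", "-d", path], check=True, capture_output=True, text=True
    )
    return result.stdout


# Illustrative sample only; it is not taken from the build discussed here.
SAMPLE = """
Dynamic section at offset 0x2d90 contains 4 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgomp.so.1]
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN]
"""

print(summarize_dynamic_section(SAMPLE))
```

On a real build one would call `summarize_dynamic_section(readelf_dynamic(path))` for each built module. Missing NEEDED entries or an empty RPATH/RUNPATH would be the first things to compare between the manual build and the pip build.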

> I'm happy to spend as much effort on this as needed as it is in general desirable to be able to use libraries relying on these finicky scientific computing libraries with a modern Python ecosystem

I just wanted to make sure you know what you're getting yourself into. Doing proper library work and development is good for society and the well-being of science, but it is unappreciated by academia (that was my experience in my institute at least). If you have the liberty and devotion to do it, though, it is very commendable, and I will try to assist as much as possible, sharing whatever knowledge I picked up when I was on that journey too.

LecrisUT, Nov 06 '25 17:11