arpack-ng icon indicating copy to clipboard operation
arpack-ng copied to clipboard

Build libarpack without linking to BLAS and LAPACK

Open DavidAce opened this issue 4 years ago • 20 comments

Purpose

Remove the need for BLAS and LAPACK when building libarpack. Only tests and examples need these dependencies.

This should fix issue https://github.com/opencollab/arpack-ng/issues/281

Reason

This would allow easier drop-in replacement of BLAS/LAPACK libraries without having to rebuild libarpack. Decoupling these dependencies would make it simpler to add arpack-ng to repositories for package manager such as conan while retaining the flexibility to swap the underlying BLAS library.

Detailed changes proposed

  • Add option(TESTS ... ) in CMakeLists.txt to enable/disable tests (default should be OFF in my opinion, but it has to be seen how this plays with the CI)
  • Only use find_package(BLAS) and find_package(LAPACK) if either TESTS=ON or EXAMPLES=ON
  • Use the variable BLA_STATIC which lets find_package honor the variable BUILD_SHARED_LIBS. This way static builds find static BLAS/LAPACK libraries (otherwise arpack risks injecting shared libs into static builds downstream)

DavidAce avatar Mar 05 '21 15:03 DavidAce

@fghoussen what do you think about this ? thanks :)

sylvestre avatar Mar 06 '21 13:03 sylvestre

I committed changes to arpack-ng-config.cmake.in that I think are related to the main theme of this pull request. This is to add blas and lapack as interface dependencies to the target ARPACK::ARPACK. I'd be happy to make a separate pull request if you think this is not related enough.

DavidAce avatar Mar 07 '21 13:03 DavidAce

@sylvestre : bad move IMO, what's the point ?! Anyway, CI must be set back up first.

fghoussen avatar Mar 07 '21 18:03 fghoussen

@DavidAce Nice work there.

Just a small question: what is the reason to use INTERFACE_SYSTEM_INCLUDE_DIRECTORIES in the .cmake.in (d0193eb)? The directory isn't added to flags.make until I change it to INTERFACE_INCLUDE_DIRECTORIES.

RyanBernX avatar Mar 12 '21 10:03 RyanBernX

what is the reason to use INTERFACE_SYSTEM_INCLUDE_DIRECTORIES in the .cmake.in (d0193eb)? The directory isn't added to flags.make until I change it to INTERFACE_INCLUDE_DIRECTORIES.

The point of INTERFACE_SYSTEM_INCLUDE_DIRECTORIES is to instruct CMake to use the compile flag -isystem <path-to-arpack/include> instead of -I <path-to-arpack/include>, when linking the installed target ARPACK::ARPACK. Using -isystem on imported targets will suppress compiler warnings from headers external to the consuming project (which the user can't do much about anyway).

But, can you give some more context? On which executable are you expecting flags.cmake to have this include?E.g. on one of the arpack examples, or your own project linking to ARPACK::ARPACK? Does it compile? Is it a Fortran project? CMake version? There is some discussion here about troubles using Fortran modules + -isystem. Is this applicable?

DavidAce avatar Mar 12 '21 10:03 DavidAce

But, can you give some more context? On which executable are you expecting flags.cmake to have this include?E.g. on one of the arpack examples, or your own project linking to ARPACK::ARPACK? Does it compile? Is it a Fortran project? CMake version? There is some discussion here about troubles using Fortran modules + -isystem. Is this applicable?

It's my own project (with cmake-3.16). Here is a minimal example:

cmake_minimum_required(VERSION 3.3)

project(test_arpack)

find_package(arpack-ng)

add_executable(test_arpack a.c)
target_link_libraries(test_arpack ARPACK::ARPACK)

Inspect CMakeFiles/test_arpack.dir/flags.make. You can see it doesn't have any include dirs:

# CMAKE generated file: DO NOT EDIT!
# Generated by "Unix Makefiles" Generator, CMake Version 3.16

# compile C with /usr/bin/cc
C_FLAGS =  

C_DEFINES = 

C_INCLUDES = 

However, change to INTERFACE_INCLUDE_DIRECTORIES yields:

# CMAKE generated file: DO NOT EDIT!
# Generated by "Unix Makefiles" Generator, CMake Version 3.16

# compile C with /usr/bin/cc
C_FLAGS =  

C_DEFINES = 

C_INCLUDES = -isystem /opt/ARPACK/include/arpack 

Strange though, it seems -isystem is also applied with INTERFACE_INCLUDE_DIRECTORIES. Am I missing something?

RyanBernX avatar Mar 12 '21 12:03 RyanBernX

I was able to reproduce it. It is strange indeed but it turns out the CMake docs which I linked to earlier mentions this specifically. They recommend to use the signature target_include_directories(SYSTEM) instead of setting the property directly. Changing this in .cmake.in fixed the problem on my end.

Also, I just noticed another issue: ARPACK::ARPACK should include <path-to-install>/include instead of <path-to-install>/include/arpack, at least from what I gather to be best practices. This would avoid header collisions, in particular with common header names such as debug.h. This should probably be in another pull request though.

DavidAce avatar Mar 12 '21 13:03 DavidAce

The new commit also works for me.

Also, I just noticed another issue: ARPACK::ARPACK should include <path-to-install>/include instead of <path-to-install>/include/arpack, at least from what I gather to be best practices. This would avoid header collisions, in particular with common header names such as debug.h. This should probably be in another pull request though.

Agree. Better to write #include <arpack/arpack.hpp> rather than #include <arpack.hpp>. However, changing this might break some compatibility as people used to write #include <arpack.hpp> in their codes.

RyanBernX avatar Mar 12 '21 14:03 RyanBernX

Agree. Better to write #include <arpack/arpack.hpp> rather than #include <arpack.hpp>. However, changing this might break some capability as people used to write #include <arpack.hpp> in their codes.

Agree too. IMO, can't afford backward compatibility (no pay here...) : people will have to move on and that's no big deal

fghoussen avatar Mar 13 '21 09:03 fghoussen

needs a rebase

dimpase avatar Oct 14 '21 09:10 dimpase

this needs a rebase and a push, to trigger, now working, CI.

dimpase avatar Nov 21 '21 11:11 dimpase

I do not want to merge this: bad move as already explained https://github.com/opencollab/arpack-ng/issues/281#issuecomment-673024199

fghoussen avatar Nov 21 '21 12:11 fghoussen

Though I only ended up here to report an issue that this branch fixes along the way, I have checked the code changes and thought I'd also leave a few cents.

Initially, I thought it was a pragmatic solution to an exotic problem. However, given that I have come to consider "pragmatic" an insult at work (the kind of seemingly tame solutions that cause terrible headaches later), I thought some more and would now also suggest to reject (most of) the pull request.

The problem (edit: with this pull request) is that on building a shared object, these changes create an invalid (!) libarpack.so as output, which depends on lapack/blas symbols, but does not declare a runtime dependency on the libraries. This invalid build output is then fixed up in the CMake config file. If you try to find_package(arpack-ng), the corresponding installed code will look for lapack/blas and declare it as a transitive dependency of the "arpack" CMake. In practice, this should work for simple ELF cases: The executable has a global list of all dependent shared objects and steps through this list whenever resolving a symbol, so it will resolve the blas/lapack symbols.

However, there is some potential for disaster:

  • Even when properly packaged, installed etc., you cannot simply link against the resulting libarpack.so, but must also link against blas/lapack as well, or use the CMake find_package for the fixup. Case in point: The pull request does not fix up the pkgconfig file. If you try autotools against the build output you should get errors.
  • There are some exotic cases where this global table may not have all dependencies, for example, plugin-heavy architectures that load lots of functionality through dlopen(). The dlopen()-opened library will only search its declared dependencies and those of the opening program. If none of these declares dependencies properly, things suddenly do not work.
  • I would not bet money on this fixup mechanism working well on Windows, though I would have to refresh my knowledge on DLL internals for definite claims.

Worst of all, these behavior changes trigger when you do something seemingly harmless like disabling tests. That is something I would strongly oppose; this functionality should be reachable through a (preferably advanced) special flag. But then if you are an advanced user, you probably know enough about CMake and good dependency handling (TM) that you do not need it. You could further restrict all the changes to static libraries only, but then it slowly becomes extremely special. Hence, KISS seems a valid objection.

To close with a positive note: @DavidAce : the fix for the config script, commit d0193ebf33647696fbadb4db2bd99ee642bc6c59 is something that I would endorse, and which also obsoletes pull request #311 , so irrespective of the outcome of this pull request, just having this fix (in another pull request?) would be worth a while.

wavepacket avatar Nov 21 '21 21:11 wavepacket

IMHO Cmake made a huge mess out of dynamic linking in general. E.g. they seem to be unaware of ldconfig - I have spent a lot of time recently explaining to a seemingly knowledgeable user of Cmake what it is for -- they all go like "just set LD_LIBRARY_PATH so that it includes /usr/local/lib".

dimpase avatar Nov 21 '21 22:11 dimpase

@wavepacket: before reading your answer, my position was:

  • Arpack needs BLAS/LAPACK (the core of it, not only the tests of it) : arpack's API are the *[ae]upd and git grep external SRC/*[ae]upd.f shows a bunch of hits! Arpack needs BLAS/LAPACK: it can not be used without it (this is basically the title of your PR... with no description of any associated real case crash/problem ?! No stack, no error message: just a few lines saying "I want that").

  • I provided CMake target mechanism (ff93cbc) to allow people using arpack from external CMakeList.txt to get CMake to reuse BLAS/LAPACK already found elsewhere: you can use that now to make things hopefully more logical and robust.

  • Arpack-ng is meant to provide a usable version of arpack: to reach this objective, arpack-ng must be and stay easy to maintain (no pay here but lots of problems). Arpack-ng is meant to allow people to easily use arpack (the reference code: results and performance) as it's a critical piece of code needed in many places to do many stuffs. Again, arpack-ng is not meant for development (no team and no pay behind) nor for fancy stuffs or ultimate coverage/testing. Arpack-ng is meant to be used by as anyone who needs it as easily as possible and on any old or modern arch (from cray to your laptop).

@wavepacket: you say you spend time and cents on this. Well, it happened to all of us (me included) on a regular basis, right? That how life goes some (bad) day. I am sorry you had to use the word insult (didn't get who triggered this), but you basically came here, without any detailed explanation, to force a questionable PR like an obvious evidence saying "I want to drive this car, but, with no wheel! Merge please!"... So maybe I had to answer back something you may have taken the bad way.

The problem is that ... create an invalid (!) libarpack.so ... you cannot simply link against the resulting libarpack.so, but must also link against blas/lapack as well,

So now you finaly explained the problem, I understand this indeed may hide some real problem! The problem would more sounds like "I want to drive this car, but, without 4 spare wheels, 1 is enough". Correct? If so, it indeed may be considered as a real problem. I guess, today, people do no more drive steamed-powered cars so they don't notice this problem anymore (even if this problem is real - I agree on that): everybody runs powerfull and fast cars today, right? Is that a problem? Yes and no. Yes for the philosopher who has time to think about who perfect the world should be, no for the always-late-and-in-the-rush guy that everybody always rely on. Once you have 4 spare wheels in your range-rover, you still have a lot of free space and power.

World is no perfect: you link twice to BLAS/LAPACK. But you always get something you can use. Is this the point?

Now, if I get this correctly, I am sorry to say that this sounds like a CMake problem (as I already explained): @wavepacket you may likely be right... But you should report that to CMake. Arpack-ng is not meant to cope this CMake problems. Put it another way: if you (or someone else) would provide one patch for CMake, all codes in all softwares in the whole world would be fixed instantly, right? But, if you patch arpack-ng, you still need to patch LAPACK (which uses BLAS), SCALAPACK (which uses LAPACK and BLAS), HDF5 (which uses zlib) and all softwares which rely on another one everyday everywhere?... Right?! Can you really say, that an million of patches is better than one in the right place: which sounds to be like CMake (and or autotools ?). This is a CMake problem.

I understand finally now a bit better the problem you point out, even if it likely sounds right, I believe the fix must not be done here, and the reasons why are:

  • Arpack-ng has not to cope with problems is not responsible for.

  • Avoiding potential regressions (in the sense "was working, but, it does no more"). If all other users (scipy, scilab, petsc, matlab, ...) get regressions (link KO) on many other different plateforms (from cray to android), I will have to cope with this: I can not afford that.

At this point of the discussion (unless I get things wrong ?): I will not merge this. But, I encourage you to report this to CMake (or ldconfig ?) as it sounds like a real problem (to fix elsewhere, not here).

In my opinion, arpack is an old walking dead software, considered as a reference by everybody, but that should be rewritten (with templates ?): everybody needs it but nobody wants to take time to rewrite it and revalidate it. This is why it must stay usable and maintainable. We maintain it to keep it usable: no more than that (not to make it perfect)... And it's already a huge thing.

fghoussen avatar Nov 24 '21 21:11 fghoussen

@fghoussen: This post probably does not matter, because we both agree that the pull request should essentially not be merged. And some parts of your response I can easily refute, for example the "cents" was just a reference to "my 2 cents", or take as yet another lesson to be more direct, like with the pragmatic thing (I tend to avoid "pragmatic solutions" like used in this pull request due to the issues I have seen them cause). But most of it I fail to understand. For example, I see no CMake problem anywhere, it is the code change in this pull request that explicitly enforces the undesirable behavior.

I merely wanted to add some more comprehensive explanation what the effect of this pull request is. Do you by chance confuse me with the creator of the pull request?

wavepacket avatar Nov 25 '21 21:11 wavepacket

Ah, I see what may have been the problem. The opening of the third paragraph in my original post should read "The problem with this pull request is". I'll change that.

wavepacket avatar Nov 25 '21 21:11 wavepacket

@wavepacket : I indeed confused you with the author of this patch!... Which seems to be @DavidAce

These are references to BLAS/LAPACK functions that are not needed at compile-time of SRC/*[ae]upd.f. They are not needed at link-time of libarpack.[a|so] either.

@DavidAce : as you say here https://github.com/opencollab/arpack-ng/issues/281#issuecomment-791914158, can you remove all external in the fortran files which are not used (in step-by-step commit-by-commit way)? And then, remove the unnecessary target_link_libraries and check that CI stays OK? @Fabian188: would this fix #281?

fghoussen avatar Nov 27 '21 12:11 fghoussen

Am 27.11.2021 um 13:37 schrieb Franck HOUSSEN @.***>:

@wavepacket : I indeed confused you with the author of this patch!... Which seems to be @DavidAce

These are references to BLAS/LAPACK functions that are not needed at compile-time of SRC/*[ae]upd.f. They are not needed at link-time of libarpack.[a|so] either.

@DavidAce : as you say here #281 (comment), can you remove all external in the fortran files which are not used (in step-by-step commit-by-commit way)? And then, remove the unnecessary target_link_libraries and check that CI stays OK? @Fabian188: would this fix #281?

Yes, from the description of the patch, it would fix the issue.

Thanks!

Fabian

Fabian188 avatar Nov 28 '21 12:11 Fabian188

Good! Sounds we have a deal if you restart all this from scratch, and, if you go this way instead:

  • Remove all external in the fortran files which are not used (in step-by-step commit-by-commit way - easier to follow-up)
  • And then only, remove the unnecessary target_link_libraries in CMakeLists.txt

This would remove unnecessary dependencies without triggering any regression.

fghoussen avatar Dec 01 '21 20:12 fghoussen