lapack icon indicating copy to clipboard operation
lapack copied to clipboard

Should lapack/lapack-sys depend on lapack-src?

Open cassiersg opened this issue 3 years ago • 9 comments

I have the following dependencies in Cargo.toml

lapack = "0.17.0"
lapack-src = { version="0.8.0", features=["openblas"]}

When I use functions from lapack in my crate, I get linker errors (undefined symbols). This can be solved by adding extern crate lapack_src as _; to my crate, but this behavior is a footgun, as debugging it requires understanding rust linkage model (and its interaction with cargo). As far as I understand it, the issue would be solved if an upstream crate would depend on lapack-src itself.

Could this be done here ? Or would it be better to do it in lapack-sys ?

cassiersg avatar Apr 23 '21 15:04 cassiersg

Hello, this is by design. The choice of the source is the end user’s responsibility:

https://github.com/blas-lapack-rs/blas-lapack-rs.github.io/wiki

IvanUkhov avatar Apr 24 '21 07:04 IvanUkhov

@IvanUkhov I agree that the choice of a particular implementation should be end-user's responsibility. However, any library that uses LAPACK symbols should declare dependency on LAPACK, otherwise linking might happen in the wrong order.

E.g., if I implement a library crate foo whose only dependency is the lapack crate (because foo is a library, I don't force the user to choose a LAPACK implementation). Then, an end-user depends on foo, and chooses to use OpenBLAS, so it specifies

foo = "0.1.0"
lapack-src = { version="0.8.0", features=["openblas"]}

in his Cargo.toml. Then, at the final link stage, rustc might link foo before lapack-src. Since the linker does not look ahead cross static libraries (which rlib are), this results in an undefined symol error.

As far as I understand https://github.com/blas-lapack-rs/blas-lapack-rs.github.io/wiki , the lapack-src crate is designed to solve this issue: if the lapack crate depends on lapack-src (but without enabling any "provider" feature), and user keeps the above Cargo.toml, then by feature unification, the lapack-src dependency of lapack will now provide LAPACK symbols (or more precisely, lapack-src will depend on openblas-src, which provides LAPACK symbols), hence linking will be in correct order.

Please not that lapack depending on lapack-src does not force an implementation onto the users: it works very well if the user chooses a LAPACK implementation that can be selected by lapack-src. But even if it is not the case (e.g. using libflame, which I'm doing now), it is still helpful as one can use the cargo patch feature to modify lapack-src.

I believe you don't see more reports like this one thanks to the fact that all the functions in lapack are marked #[inline(always)], which makes it such that linking works even if the lapack crate is linked while the LAPACK symbols are not already defined (as inline(always) functions are not linked in the .rlib). Any crate that directly depends on the lapack crate (and has not only #[inline(always)] functions) has then to depend itself on lapack-src (without any feature enable, if they are a library crate).

Further, in any case, lapack-src can be built without any "provider" feature enabled, in which case it is an empty crate. Therefore, I do not see any downside to adding the lapack-src dependency to lapack. Can you tell me if there is one I didn't see ?

I believe the current situation is quite confusing (at least to me), so I believe it would be best to make it simpler (such as: "implement a rust library -> depend on lapack", "implement a binary (or alike) => if you depend on any crate that uses LAPACK, you have to depend on lapack-srcwith one "provider" feature enabled"). If that is not possible, the story becomes this for library authors: "depend on bothlapackandlapack-src`, with not provider feature enabled), which seems to me unexpected and a bit strange. (In that case, I would be happy to contribute docs to make this more clear.)

cassiersg avatar Apr 24 '21 08:04 cassiersg

I should not have closed it without asking for more details. Sorry! I will keep it open to draw attention and get feedback.

Let me check my understanding. The bottom line is that the linkage order is likely to be incorrect whenever a library depends on blas and/or lapack but not on any src crate. In such cases, the inlining will be done in the library but the linkage will be delegated to the end user of the library and is not guaranteed to happen prior to the library. Is this a fair summary?

Here one can see all current dependents:

https://crates.io/crates/blas/reverse_dependencies

https://crates.io/crates/lapack/reverse_dependencies

And here is a concrete example (/cc @termoshtt):

https://github.com/rust-ndarray/ndarray-linalg/blob/master/lax/Cargo.toml

And another one (/cc @sebcrozet):

https://github.com/dimforge/nalgebra/blob/dev/nalgebra-lapack/Cargo.toml

Again, why would it not be sufficient to depend on a src crate in the library your develop?

IvanUkhov avatar Apr 24 '21 11:04 IvanUkhov

I should not have closed it without asking for more details. Sorry! I will keep it open to draw attention and get feedback.

Let me check my understanding. The bottom line is that the linkage order is likely to be incorrect whenever a library depends on blas and/or lapack but not on any src crate. In such cases, the inlining will be done in the library but the linkage will be delegated to the end user of the library and is not guaranteed to happen prior to the library. Is this a fair summary? This is a good summary, at least for what I understand of the issue.

Again, why would it not be sufficient to depend on a src crate in the library your develop? That would be sufficient (as long a lapack contains only #[inline(always)] functions). However, I find it not obvious to know that it should be done. One way to solve that would be to provide more extensive documentation (e.g. expanding https://github.com/blas-lapack-rs/blas-lapack-rs.github.io/wiki). Another way would be to make it simpler to do the right thing: make it such that only depending on lapack is enough for library crates.

Looking at the first (by download count) reverse dependencies of lapack, I see two patterns emerge

  • matrix: One feature to depend on lapack, another feature to depend on openblas-src. Enabling the first without the second would not be correct. (https://github.com/stainless-steel/matrix/blob/master/Cargo.toml)
  • peroxide: Does not use *-src crates, suggests custom dynamic linking of OpenBLAS: https://github.com/Axect/Issues/tree/master/Rust (https://github.com/Axect/Peroxide/blob/master/Cargo.toml)
  • lax: No dependencies on lapack-src but dependencies on openblas-src/netlib-src/intel-mkl-src/accelerate-src through features (https://github.com/rust-ndarray/ndarray-linalg/blob/master/lax/Cargo.toml)
  • mathru: Features openblas/netlib/... enable dependency on lapack-src and enable the corresponding feature of lapack-src (https://gitlab.com/matthiaseiholzer/mathru/-/blob/main/Cargo.toml)
  • nalgebra-lapack: Dependency on lapack-src, features to enable corresponding feature of lapack-src. (https://github.com/dimforge/nalgebra/blob/dev/nalgebra-lapack/Cargo.toml)
  • linxal: No dependency on lapack-src, features to enable dependencies on openblas-src or netlib-src. (https://github.com/masonium/linxal/blob/master/Cargo.toml)
  • numeric: No dependency on any src crate (https://github.com/numeric-rust/numeric/blob/master/Cargo.toml).
  • compute: Depends on either accelerate-src (on Mac OS), or openblas-src (not Mac OS). (https://github.com/al-jshen/compute/blob/master/Cargo.toml)
  • opensrdk-linear-algebra depends only on *-src for tests, suggests user to depend on lapack-src itself (and extern crate lapack_src). (https://github.com/OpenSRDK/linear-algebra-rs/blob/main/Cargo.toml)

As far as I understand:

  • peroxide, numeric and opensrdk-linear-algebra are basically unusable with static linking, due to the lack of dependency on any *-src crate.
  • matrix, linxal and compute are quite restrictive in which BLAS/LAPACK implementation they allow (while they seemingly only use "standard" LAPACK functions).
  • lax and mathru: allow usage of any currently-existing LAPACK *-src implementation on crates.io.
  • nalgebra-lapack: like above, but will, without any change would support a new LAPACK source if supported by lapack-src. (The openblas/... features are actually just commodities for the user, but could be removed without loss of functionality.)

Based on this, I draw two conclusions (this is just my opinion):

  • Separation of crates explained in https://github.com/blas-lapack-rs/blas-lapack-rs.github.io/wiki is very nice and flexible, but few crates in the ecosystem use it optimally (in my list, only nalgebra-lapack), that is, in such a way that the "end-user" can select the LAPACK backend, even if this backend is newly supported by lapack-src and the library (nalgebra-lapack) doesn't know about it. I believe this is a very interesting feature, and that we should strive for having it ecosystem-wide.
  • It is currently quite easy to not do the best thing, and documentation (at https://github.com/blas-lapack-rs/blas-lapack-rs.github.io/wiki) could be improved to direct more the user.

So, I would propose to

  • Establish best practices for library and "end-user" crates, and highlight them in documentation material.
  • (And maybe) make it simpler to do the best and working thing. Ideally, I would envision the guidelines to become:
    • If you write a library: depend on lapack (or lapacke, blas or cblas). (*)
    • If you are an "end-user" crate, depend on lapack-src and/or blas-src with the adequate provider feature enabled. (And ideally, if you don't, you would get a nice "No provider for LAPACK selected. See documentation at ..." error message.)

For the (*) point to work, I believe that it would suffice that lapack depend on lapack-src.

So this spans more than only the lapack crate. Sorry for not clearly explaining where I came from and why I was suggesting this feature. Thinking more about it, since this is quite a cross-cutting concern, maybe I should re-open this discussion somewhere else (please tell me, and give pointer to where is the good place).

For more context, I recently started to need linear algebra in my rust work (and went through quite a few hoops to get everything working). I am also investigating into the use of libflame as LAPACK implementation (I'll open-source a libflame-src crate if things work out), which explains my interest in the possibility of swapping providers.

BTW, I would also be interested in clarifying/discussing the versioning policy for the blas-lapack-rs crates. Where would be the best place for that ?

cassiersg avatar Apr 24 '21 20:04 cassiersg

It would be good to gather more feedback before we change anything.

@bluss, you were quite active during the initial design phase. Do you have any thoughts on the above?

IvanUkhov avatar Apr 25 '21 17:04 IvanUkhov

I don't see a problem with adding this "link" between the crates. blas-src/blas-sys should preferably work together in the same way. I'm not really qualified to talk about lapack. If the user needs to pick lapack-src or blas-src (one excluding the other), then maybe this "link" is not a good choice(?).

See our instructions for ndarray's blas integration - we have to remind about linking to blas-src (ndarray could link itself to blas-src, but with the current solution we don't have any strong coupling between ndarray and blas-src version, and we prefer that).

https://github.com/rust-ndarray/ndarray/tree/9f868f78cadbf3082d95964050266c94a8b5c610#how-to-enable-blas-integration

Edit: Oh and my experience stretches only to the -src and -sys packages - never used lapack/blas crates. Maybe @termoshtt has some input? (For ndarray we don't use the blas crate because we have more interesting memory layouts than just contiguous slices.)

bluss avatar Apr 27 '21 15:04 bluss

As far as I understand, ndarray of the issue I'm describing (although with blas rather than lapack): when the blas feature of ndarray is enabled, and the user additionally depends on blas-src, there could be linking order issues (with static linking, and only if some function of ndarray that uses blas is not inlined). Have you had any such bug report ?

Strong version coupling is indeed problematic, but it doesn't seem to be unsolvable. It seems to me that the blas-src/lapack-src should have fairly stable APIs: the API surface is the BLAS/LAPACK interface, and the set of Cargo features. Future extensions seem to be limited to adding new Cargo features, which doesn't break the API. Therefore, blas-src/lapack-src could decide to do only non-major releases for the foreseeable future, and the ecosystem can stabilize around those.

Your edit makes me thing that it would be better to add the dependency in the -sys crates, to cover such use-cases. This would also mirror the behavior of other -sys crates. E.g., openssl-sys takes care of linking to the system libraries, or, to compile openssl when its "vendored" feature is enabled.

cassiersg avatar Apr 28 '21 08:04 cassiersg

We haven't had any such bug report, but the current scheme is new (from 0.15.0), so it may take some time.

bluss avatar Apr 29 '21 07:04 bluss

Out of curiosity, I made a small test with ndarray:

  • one library crate (x) that depends on ndarray = { version = "0.15.0", features = ["blas"] } only (and does some non-generic matrix multiplications)
  • one binary crate (y) that depends on:
x
blas-src = { version = "0.8", features = ["openblas"] }
openblas-src = { version = "0.10", features = ["cblas", "static"] }

This two-crate setup is motivated by the fact that ArrayBase::dot is generic, and thus inlined at call-site. AFAIU, this is the recommended configuration for usage of ndarray in a library.

At first sight, this work, and using cargo +nightly rustc -- -Z print-link-args -C save-temps=y indeed shows that openblas is linked after x. However, cargo tree shows no dependency between x and openblas-src. As far as rust linkage order guarantees are concerned, I believe that linking order could go the other way around: the most accurate description of linkage order I found for rust is in https://github.com/rust-lang/rust/issues/41416#issuecomment-299344549 which seems to be still accurate to this day. Linking order between non-dependent rlibs seems to be unspecified (And I do not know what impacts it.).

If I put rlibs for libblas_src and openblas_src before x in the linker command-line, I get undefined references to BLAS symbols, as expected.

As a conclusion, it seems like the issue is real but hard to trigger.

cassiersg avatar Apr 29 '21 16:04 cassiersg