auditwheel icon indicating copy to clipboard operation
auditwheel copied to clipboard

Update libraries later if possible

Open oraluben opened this issue 7 months ago • 6 comments

Fixes: #561, https://github.com/pytorch/pytorch/pull/149471

Fixes that:

  1. When we have two dependencies: a.so -> b.so -> c.so and a.so -> c.so where b.so do not have rpath for c.so but a.so has, and if we process b.so first, c.so won't be found;
  2. The order to precess dependencies of a.so is random (for from a set): https://github.com/pypa/auditwheel/blob/ff5b63713a98182a5f4c97abb709f47bcc9e3bec/src/auditwheel/lddtree.py#L518

by:

  1. use sorted(set) and ordereddict to iterate through libraries deterministically
  2. try to update realpath for libraries if they're not found yet

Also fix an issue when we exclude based on path but excluded library may still present in DynamicExecutable.libraries with empty platform field, which breaks this check: https://github.com/pypa/auditwheel/blob/f5f83dd367acddd95005545af01bbe7d48064d38/src/auditwheel/wheel_abi.py#L278

oraluben avatar May 09 '25 09:05 oraluben

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.84%. Comparing base (93f45cc) to head (ef63c60).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #582      +/-   ##
==========================================
+ Coverage   92.82%   92.84%   +0.01%     
==========================================
  Files          21       21              
  Lines        1771     1774       +3     
  Branches      333      334       +1     
==========================================
+ Hits         1644     1647       +3     
  Misses         77       77              
  Partials       50       50              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar May 09 '25 10:05 codecov[bot]

cython issue: https://github.com/cython/cython/issues/6861

oraluben avatar May 10 '25 15:05 oraluben

Thanks for digging into this. #586 will fix the non-deterministic search order & revert the behavior to what it was before. Would that be enough to fix the PyTorch issue (I certainly hope so since it seemed to be a regression in 6.3.0) ? If not, the way the "realpath" is updated is probably not enough w.r.t. excluded libraries (we'd probably need a second pass to filter once again previously analyzed shared objects after adding an exclusion).

mayeut avatar May 17 '25 13:05 mayeut

Would that be enough to fix the PyTorch issue (I certainly hope so since it seemed to be a regression in 6.3.0) ?

it's enough for upstream PyTorch but 6.3.0 also introduced a new issue which could be fixed by: https://github.com/pypa/auditwheel/pull/582/commits/eb8773aa97ba5fa676390502c7b8c128ae2e7273, i.e. this:

Also fix an issue when we exclude based on path but excluded library may still present in DynamicExecutable.libraries with empty platform field, which breaks this check: https://github.com/pypa/auditwheel/blob/f5f83dd367acddd95005545af01bbe7d48064d38/src/auditwheel/wheel_abi.py#L278

And after removing the library from _all_libs this should be fine?

(we'd probably need a second pass to filter once again previously analyzed shared objects after adding an exclusion)

I suspect a second pass may not be enough for general case. For example we have n libraries from a.so to n.so, a.so has rpath for all other libraries while all others do not. a.so -> b.so -> c.so -> ... -> n.so a.so -> c.so -> ... -> n.so ... a.so -> n.so

In this case we need a n-pass iteration to find all libraries unless we use a BFS search? I can imagine there's other issue. How's libc/ld handle this?

oraluben avatar May 18 '25 03:05 oraluben

a new issue which could be fixed

The need for this commit is based on realpath update. If we don't update the realpath later on, then platform should always be valid when realpath is valid (and we don't reach this assert if realpath is None). With the order fixed to respect DT_NEEDED entries, we're doing the same as ld which was broken in 6.3.0 What remains IMHO lies outside lddtree, that's the order in which we call lddtree on the various ELF objects.

mayeut avatar May 18 '25 07:05 mayeut

It should be fine then? The reason is I'm using sorted(needed) rather than the DT_NEEDED order, so now it should just work?

oraluben avatar May 19 '25 03:05 oraluben