
[bug] "compiled" file count in `--inspect` misses static libraries

Open jameslamb opened this issue 1 year ago • 4 comments

What did you expect to happen?

The output of pydistcheck --inspect prints the number of compiled files within a distribution.

https://github.com/jameslamb/pydistcheck/blob/471660ea780c9e65ab1e90b3e9f694e6cbea8bbe/src/pydistcheck/inspect.py#L26

I expected that count to include static libraries.

What actually happened?

Static libraries are not included in the count of compiled files.

How can someone else reproduce this problem?

Consider the following.

docker run --rm -it python:3.12 bash

pip download \
    --no-deps \
    --extra-index-url https://pypi.nvidia.com \
    'librmm-cu12==24.10'

That project has some static library files.

unzip -l ./librmm_cu12*.whl | grep -E '\.a$'
#   248882  2024-10-09 14:39   librmm/lib64/libfmt.a
#  1802910  2024-10-09 14:39   librmm/lib64/libspdlog.a

But those are not reported as "compiled" by pydistcheck --inspect.

pip install 'pydistcheck>=0.8.0'
pydistcheck --inspect ./librmm_cu12*.whl

You'll see output that begins like this:

checking './librmm_cu12-24.10.0-py3-none-any.whl'
----- package inspection summary -----
file size
  * compressed size: 3.7M
  * uncompressed size: 15.3M
  * compression space saving: 76.1%
contents
  * directories: 0
  * files: 1770 (0 compiled)

I'd expected that (0 compiled) to actually say (2 compiled).

What version of pydistcheck are you using?

0.8.0

Notes

No response

jameslamb • Nov 12 '24 15:11

Some helpful links:

  • https://stackoverflow.com/a/41902135/3986677
  • https://stackoverflow.com/a/60909689/3986677
  • https://en.wikipedia.org/wiki/Ar_(Unix)

jameslamb • Nov 29 '24 03:11

I was able to trace the bug to the fact that _FileFormat does not support the Unix archive format yet, and I added support for it – I might have a fix handy soon. Would you be interested in a PR? :D

agriyakhetarpal • Apr 03 '25 04:04
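
For readers following along, here is a minimal sketch of how Unix archives can be recognized by content. It relies on the fact that every `ar` archive starts with the 8-byte global header `!<arch>\n`. The helper names `looks_like_ar_archive` and `count_static_libraries` are hypothetical, and this is not the pydistcheck implementation; how `_FileFormat` actually detects formats is not shown in this thread.

```python
import io
import zipfile

# Global header that starts every Unix `ar` archive (8 bytes).
AR_MAGIC = b"!<arch>\n"


def looks_like_ar_archive(header: bytes) -> bool:
    """Return True if the given bytes start with the `ar` global header."""
    return header.startswith(AR_MAGIC)


def count_static_libraries(wheel_file) -> int:
    """Count zip members whose first bytes match the `ar` magic string.

    Hypothetical helper for illustration; `wheel_file` is a path or a
    file-like object, since a wheel is a zip archive.
    """
    count = 0
    with zipfile.ZipFile(wheel_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Read only as many bytes as the magic string needs.
            with zf.open(info) as member:
                if looks_like_ar_archive(member.read(len(AR_MAGIC))):
                    count += 1
    return count


# Build a tiny in-memory "wheel" with one fake static library.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pkg/lib64/libdemo.a", AR_MAGIC + b"fake member data")
    zf.writestr("pkg/__init__.py", "print('hello')\n")
buf.seek(0)
print(count_static_libraries(buf))
```

Checking content rather than the `.a` suffix avoids counting files that merely share the extension, at the cost of opening every member of the wheel.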

BTW, I tested the fix on librmm_cu12-24.10.0-py3-none-any.whl from the reproducer above, and I noticed that pydistcheck reports that librmm/lib64/libfmt.a and librmm/lib64/libspdlog.a have debug symbols. However, I don't see anything with grep "debug" when I run llvm-objdump or llvm-nm over them.

Perhaps pydistcheck is incorrectly reporting that they have debug symbols when they don't, or vice versa? I don't have a Linux machine or Docker installed at the moment to check this particular wheel out. I do notice that the _nm_reports_debug_symbols function only checks if exported_symbols != all_symbols, but that condition may not translate into a high-confidence check where we can say with certainty that debug symbols are found in the binary. Either way, that should be a separate issue.

agriyakhetarpal • Apr 03 '25 04:04
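
To illustrate the concern about comparing `exported_symbols` with `all_symbols`, here is a minimal sketch that parses `nm`-style text. It assumes "exported" corresponds to what `nm --extern-only` would list and "all" to plain `nm` output; that mapping is an assumption, and `symbol_names` and the sample output are made up for illustration rather than taken from pydistcheck. In `nm` output, a lowercase type letter usually marks a local symbol (with `u`, `v`, and `w` as exceptions), so a single file-local function is enough to make the two sets differ even when no debug information is present.

```python
def symbol_names(nm_output: str, extern_only: bool = False) -> set:
    """Collect symbol names from `nm`-style lines.

    Hypothetical parser for illustration. Lines look like
    "<address> <type> <name>" or, for undefined symbols, "<type> <name>".
    """
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3:
            _, sym_type, name = parts
        elif len(parts) == 2:
            sym_type, name = parts
        else:
            # Blank lines and archive member headers such as "foo.o:".
            continue
        # Lowercase type letters are usually local; u, v, w are exceptions.
        is_local = sym_type.islower() and sym_type not in "uvw"
        if extern_only and is_local:
            continue
        names.add(name)
    return names


# Made-up sample: one global function, one file-local helper, one import.
SAMPLE = """\
0000000000000000 T public_api
0000000000000010 t static_helper
                 U printf
"""

all_symbols = symbol_names(SAMPLE)
exported_symbols = symbol_names(SAMPLE, extern_only=True)

# The sets differ only because of the local symbol `static_helper`.
print(exported_symbols != all_symbols)
print(sorted(all_symbols - exported_symbols))
```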

> I was able to trace the bug to the fact that _FileFormat does not support the Unix archive format yet, and I added support for it – I might have a fix handy soon. Would you be interested in a PR?

Sure, that'd be great! I'd welcome a PR adding that support.

> I noticed that pydistcheck reports that librmm/lib64/libfmt.a and librmm/lib64/libspdlog.a have debug symbols

Very possible that the nm check you're highlighting is not a great check, and is giving us a false positive here.

I'd want to dump the entire symbol tables for those objects with readelf or similar and check if I agree with the findings from it. Agree that it should be a separate issue.

jameslamb • Apr 03 '25 15:04
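
One way to cross-check a debug-symbol finding with `readelf`, sketched under assumptions: instead of dumping symbol tables, this looks for DWARF section names (`.debug_*`, or `.zdebug_*` for the older compressed form) in `readelf --section-headers --wide` output. This is a different technique from the `nm` comparison discussed above, the function names are hypothetical, and the sample text is invented rather than taken from the librmm wheel.

```python
import re
import subprocess

# Matches a section name right after the "[Nr]" index column,
# e.g. "  [ 4] .debug_info  PROGBITS ...".
_DEBUG_SECTION = re.compile(r"\]\s+(\.z?debug_[\w.]+)")


def debug_section_names(readelf_output: str) -> list:
    """Return DWARF debug section names found in section-header text."""
    return _DEBUG_SECTION.findall(readelf_output)


def read_section_headers(path: str) -> str:
    """Run readelf on a file (requires binutils; not called in the demo)."""
    result = subprocess.run(
        ["readelf", "--section-headers", "--wide", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Invented sample resembling section headers for one object file.
SAMPLE_HEADERS = """\
  [ 1] .text             PROGBITS  0000000000000000 000040 000015 00  AX  0   0  1
  [ 4] .debug_info       PROGBITS  0000000000000000 000055 00003a 00      0   0  1
  [ 5] .rela.debug_info  RELA      0000000000000000 000300 000060 18   I 10   4  8
  [ 6] .debug_abbrev     PROGBITS  0000000000000000 00008f 000030 00      0   0  1
  [ 9] .symtab           SYMTAB    0000000000000000 000100 0000c0 18     10   5  8
"""

print(debug_section_names(SAMPLE_HEADERS))
```

If a file has a `.symtab` but no `.debug_*` sections, that would suggest it carries ordinary symbols rather than debug information, which is the distinction in question here.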