warehouse
Feature request: "view source" tool for inspecting package contents
The recent event-stream problem on npm highlighted an issue that is also relevant to PyPI: even if a package links to a GitHub repository there is no guarantee that the code in the uploaded package matches the code in the repo.
One way this could be helped is for PyPI to provide a "view package contents" link next to each downloadable archive that opens a web interface for browsing the files in that package.
This could make it easier to spot deliberate exploits, but would also be a useful general feature for people who want to quickly understand more about the details of a package before they install it.
There are quite a few challenges in building such a feature.
For smaller packages, writing code which pulls a .tar.gz and turns it into a file listing / visible source code in response to an incoming HTTP request would be feasible (and highly cacheable via Varnish), but this probably won't work for larger files: pulling a 100MB .tar.gz and decompressing it on demand may not be feasible. Can we get numbers on the average size of packages and how many outliers there are?
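A minimal sketch of the on-demand listing step, using only the standard library. The function names are hypothetical, and the archive is built in memory so the example runs without fetching anything from PyPI; a real handler would receive the bytes from the file store and sit behind the cache.

```python
import io
import tarfile


def list_sdist_files(archive_bytes: bytes) -> list[tuple[str, int]]:
    """Return (name, size) for each regular file in a .tar.gz, without extracting to disk."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        return [(m.name, m.size) for m in tar.getmembers() if m.isfile()]


def build_example_sdist() -> bytes:
    """Build a tiny in-memory sdist (hypothetical 'demo' package) for illustration."""
    data = b"from setuptools import setup\nsetup(name='demo')\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo("demo-1.0/setup.py")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    for name, size in list_sdist_files(build_example_sdist()):
        print(f"{size:>8}  {name}")
```

Note that reading the member list of a .tar.gz still means decompressing the whole stream, which is the cost concern for the 100MB case.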
For those larger packages, maybe this will require extra processing on upload. This could be expensive in terms of both CPU and storage, and could open up zip bomb exploits if not implemented carefully.
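One possible guard against zip bombs, sketched for zip-based archives such as wheels. The limits and the function name are assumptions picked for illustration, not values PyPI uses. The check relies on the sizes declared in the archive headers, which a hostile archive can misreport, so a careful implementation would also cap the number of bytes actually read while decompressing.

```python
import io
import zipfile

# Hypothetical limits, chosen only for this sketch.
MAX_TOTAL_BYTES = 50 * 1024 * 1024
MAX_RATIO = 100


def looks_safe_to_expand(
    archive_bytes: bytes,
    max_total_bytes: int = MAX_TOTAL_BYTES,
    max_ratio: float = MAX_RATIO,
) -> bool:
    """Reject archives whose declared uncompressed size or compression ratio is suspicious."""
    total = 0
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            total += info.file_size
            if total > max_total_bytes:
                return False
            # compress_size can be 0 for empty files; skip the ratio check then.
            if info.compress_size and info.file_size / info.compress_size > max_ratio:
                return False
    return True
```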
One problem that is more specific to PyPI is that some packages can be uploaded in multiple formats - different wheels for example. Malicious code could potentially be hidden in just one of the wheel variants.
A really great implementation of this feature would also highlight differences between the contents of those different packages. This becomes not just a more complex implementation challenge but a UI design challenge as well.
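A rough sketch of the comparison between two wheel variants: hash every file in each archive and report what was added, removed, or changed. All names here are hypothetical. Platform wheels legitimately differ in places (compiled extensions, and the WHEEL and RECORD files in .dist-info), so the output is a list of things to review rather than proof of tampering.

```python
import hashlib
import io
import zipfile


def file_digests(wheel_bytes: bytes) -> dict[str, str]:
    """Map each file name in the archive to the SHA-256 of its contents."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }


def compare_wheels(first: bytes, second: bytes) -> dict[str, list[str]]:
    """Report files present in only one archive and files whose contents differ."""
    a, b = file_digests(first), file_digests(second)
    return {
        "only_in_first": sorted(a.keys() - b.keys()),
        "only_in_second": sorted(b.keys() - a.keys()),
        "changed": sorted(name for name in a.keys() & b.keys() if a[name] != b[name]),
    }


def build_archive(files: dict[str, bytes]) -> bytes:
    """Build an in-memory zip so the sketch can be tried without real wheels."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```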
... and while I'm throwing around crazy ideas: a really neat implementation of this would include a way to render diffs between different versions.
Now we are re-implementing a non-trivial portion of GitHub!
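For a single file, the version diff itself is the easy part. A minimal sketch using the standard library's difflib, assuming the two versions of the file have already been read out of their archives as text (the function name and the version/path label format are made up for this example):

```python
import difflib


def diff_file_versions(
    old_text: str, new_text: str, path: str, old_version: str, new_version: str
) -> str:
    """Return a unified diff of one file between two releases, or '' if identical."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"{old_version}/{path}",
            tofile=f"{new_version}/{path}",
        )
    )
```

The harder parts are the ones listed above: matching up files across archives, handling renames and binary files, and presenting the result.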
npm COO Laurie Voss says about this suggestion:
One issue you don't mention is that it creates a simply enormous vector for spam and a distribution mechanism for illegal content (such as various illegal forms of pornography). All the solutions I'm aware of for this problem involve expensive teams doing unpleasant jobs.
Merging duplicate issue https://github.com/pypa/warehouse/issues/7877 originally posted by @uranusjr
@uranusjr wrote:
What's the problem this feature will solve?
@pfmoore, @pradyunsg and I were talking about dependency conflict debugging, and it came up as a topic that a significant number of users opt to “read setup.py” when they are looking for dependency information.
This is, however, currently quite awkward to do. The user either needs to find the project’s repository (e.g. GitHub) and hunt for the correct tag/commit, or download the distribution file and extract it manually. It is also difficult for pip to implement a feature to help with the process.
Describe the solution you'd like
A view that allows the user to view the contents of a given distribution (wheel or sdist), and read the content of a given file in the archive. The viewer can be a bare minimal text/plain page, but some basic features like line numbers would be very nice to have.
Each list view and file view should have a unique URL, e.g. I can paste a URL to the browser and read setup.py in Django-3.0.tar.gz, or the METADATA file of a wheel directly. This would be valuable for sharing package information and help with user support.
Additional context
N/A
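A sketch of what the file view could do behind such a URL: given a distribution filename and a member path, return that member as text. The function name and the idea of dispatching on the filename suffix are assumptions for illustration; how the archive bytes are obtained is left out.

```python
import io
import tarfile
import zipfile


def read_member(dist_filename: str, archive_bytes: bytes, member: str) -> str:
    """Return the text of one file inside a wheel or sdist, without extracting to disk."""
    if dist_filename.endswith((".whl", ".zip")):
        with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
            data = zf.read(member)  # raises KeyError if the member is missing
    elif dist_filename.endswith((".tar.gz", ".tgz")):
        with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
            extracted = tar.extractfile(member)  # raises KeyError if missing
            if extracted is None:  # directories and special files have no content
                raise KeyError(member)
            data = extracted.read()
    else:
        raise ValueError(f"unsupported distribution type: {dist_filename}")
    # Files of unknown encoding are shown with replacement characters rather than failing.
    return data.decode("utf-8", errors="replace")
```

With this, reading setup.py from Django-3.0.tar.gz would be a call like read_member("Django-3.0.tar.gz", archive_bytes, "Django-3.0/setup.py"), assuming the sdist uses the usual top-level directory.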
@di wrote:
Is the goal here to inspect the metadata for a release, or to view arbitrary files in a distribution?
If it's the former, this would be relatively simple to implement. We already have a similar view in the Admin UI:
[Image: screenshot of the Admin UI view]
If it's the latter, it's going to be quite a bit more challenging, as PyPI does not actually extract any files from the distribution archives or do any introspection of them.
@uranusjr wrote:
Metadata extraction would be very useful for wheels, but source inspection is the only way to inspect an sdist. I think both are valuable features, but this issue is more about the latter.
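For the wheel case, the dependency information lives in the METADATA file under the .dist-info directory, which uses an email-header style format. A minimal sketch of pulling the Requires-Dist entries out of a wheel with the standard library (the function name is hypothetical, and this is not how PyPI itself handles metadata):

```python
import io
import zipfile
from email.parser import Parser


def wheel_requirements(wheel_bytes: bytes) -> list[str]:
    """Return the Requires-Dist entries from a wheel's METADATA file."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        # Assumes exactly one *.dist-info/METADATA file, as in a well-formed wheel.
        name = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
        message = Parser().parsestr(zf.read(name).decode("utf-8"))
    return message.get_all("Requires-Dist") or []
```

An sdist offers no equivalent guarantee, since its dependencies may be computed when setup.py runs, which is why reading the source is the fallback there.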
We've got something like this now: https://inspector.pypi.io/
This isn't anything close to production-grade so I wouldn't recommend pointing a lot of traffic at it, but it provides a way to introspect packages on PyPI, without exposing PyPI to the need to introspect packages.
This isn't integrated into PyPI in any way except for the admin interface, but once it is a little more developed that could be possible.
> We've got something like this now: https://inspector.pypi.io/
This is really neat, I love it! Exactly the kind of thing I was hoping for here.
If you're worried about traffic load on it, one alternative could be to implement the same thing entirely client-side. PyPI serves wheels etc. with open CORS headers, so it's possible for JavaScript in a browser to fetch those packages, decode them and display them.
I built a very basic demo of that here: https://tools.simonwillison.net/zip-wheel-explorer
I think we're less worried about the traffic and more worried about the risk of extracting or displaying user-submitted content on the pypi.org domain. For example, your demo has an XSS vulnerability: try exploring this wheel and click on the __init__.py file.
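To illustrate the display half of that risk: any viewer that puts file contents into a page has to escape them first, or markup inside an uploaded file will be interpreted by the browser. A minimal sketch of escaped rendering with line numbers, with a hypothetical function name and class name; it says nothing about what the linked demo actually does, and escaping alone does not replace serving such content from a separate domain.

```python
import html


def render_source_as_html(source_text: str) -> str:
    """Render untrusted file contents as an HTML <pre> block with line numbers."""
    rows = [
        # html.escape converts <, >, & and quotes so the content is shown, not executed.
        f'<span class="ln">{number}</span> {html.escape(line)}'
        for number, line in enumerate(source_text.splitlines(), start=1)
    ]
    return "<pre>" + "\n".join(rows) + "</pre>"
```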
With trusted publishing, would it be easy to have a link from a PyPI release to the GitHub commit tree? (I know PyPI has info about the CI run, but I don't know how easily the git info can be obtained for the CI run.)
Yep, see https://github.com/pypi/warehouse/issues/17122#issuecomment-2486926298
Triaging: I think this is complete per both https://github.com/pypi/warehouse/issues/5118#issuecomment-1168023698 and also the new UI view we have for attestation contents!