
Update roadmap with a clearly articulated security model & strategy

Open glyph opened this issue 6 years ago • 6 comments

What's the problem this feature will solve?

Right now, PyPI has a way to report a security issue, but no clear description of what a "security issue" might be. Efforts like #5567 will improve the security of the site, but to what end?

Meanwhile, attacks against the open source supply chain are escalating, and more typo-squatting malware gets posted to PyPI every day.

Describe the solution you'd like

  • I'd like https://pypi.org/security/ to describe the threat model of PyPI and what properties it attempts to provide. In particular: what constitutes a security issue that should be reported?
  • I'd like https://warehouse.readthedocs.io/security/ to describe what properties it would like to provide in the long term. Particularly, where do efforts like the TOTP work fit into a long-term vision for the security of the site and for its users?

glyph avatar Apr 18 '19 23:04 glyph

This evening I gave a talk to some students in an application security class, and figured my notes could be used to start addressing this issue.

The section headings are borrowed from the textbook The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities by Mark Dowd, John McDonald, and Justin Schuh (Chapter 4, Application Review Process):

  • General application purpose—What is the application supposed to do?
  • Fundamental security expectations—What security expectations do legitimate users of this application have?
  • Assets and entry points—How does data get into the system, and what value does the system have that an attacker might be interested in?
  • Components and modules—What are the major divisions between the application’s components and modules?
  • Intermodule relationships—At a high level, how do different modules in the application (within Warehouse) communicate?
  • Major trust boundaries—What are the major boundaries that enforce security expectations?

General application purpose: What is PyPI/Warehouse?

Glossary.

  • language-specific platform for sharing packages -- both libraries and applications
  • part of a toolchain; https://packaging.python.org/ covers the official open source tools for uploading and downloading (most people use PyPI by downloading via pip)
  • Since reads are much more common than writes (much more goes out than goes in), we try to cache as much as possible.
  • sdists and wheels -- we are indeed hosting binaries that we haven't inspected -- more at https://packaging.python.org/
  • History

Fundamental security expectations: Users and what they can do

Reuse the user classes from the docs, including the distinction between owners and maintainers.

How do you become one of these kinds of users? Roles are defined per project namespace. The initial Owner of a project is the first person to upload a project to PyPI under that project name.

What can these different owners do? See #5863.

But also! ALL users, including people who are not logged in, can read the records of package activity.
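As a rough illustration of the user classes above, here is a minimal sketch of "first uploader becomes Owner" plus a role lookup. The `Role` names, the `PERMISSIONS` table, and the `Index` class are all hypothetical and are not Warehouse's code; the exact actions granted to each role are an assumption, since the real list is tracked in #5863.

```python
from enum import Enum


class Role(Enum):
    ANONYMOUS = "anonymous"
    MAINTAINER = "maintainer"
    OWNER = "owner"


# Hypothetical permission table, for illustration only.
PERMISSIONS = {
    Role.ANONYMOUS: {"read_activity"},
    Role.MAINTAINER: {"read_activity", "upload_release"},
    Role.OWNER: {
        "read_activity",
        "upload_release",
        "manage_collaborators",
        "delete_project",
    },
}


class Index:
    """Toy in-memory index: project name -> {username: Role}."""

    def __init__(self):
        self.roles = {}

    def upload(self, project, user):
        """Return True if the upload is accepted."""
        if project not in self.roles:
            # The first person to upload under a name becomes its Owner.
            self.roles[project] = {user: Role.OWNER}
            return True
        return self.can(project, user, "upload_release")

    def can(self, project, user, action):
        # Anyone without an explicit role is treated as anonymous.
        role = self.roles.get(project, {}).get(user, Role.ANONYMOUS)
        return action in PERMISSIONS[role]
```

In this sketch a user with no role on a project can still read its activity records, which mirrors the point that all users, logged in or not, can see package activity.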

Assets and entry points: How does data get into the system, and what value does the system have that an attacker might be interested in?

  • API: Packages and projects get into the system via the API (users use Twine).
  • Web browser: Initial user creation, a lot of privilege creation/change/deletion, and the administrative interface

Components and modules

https://warehouse.readthedocs.io/application/ goes over this a bit.

  • Pyramid, our web application framework
  • Database access (we use SQLAlchemy and Postgres)
  • Auth
  • Token generation (Macaroons)
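For readers unfamiliar with Macaroons, the following is a simplified sketch of the general idea behind them: a chain of HMACs where each added caveat narrows what the token allows. It is not Warehouse's implementation or token format; the function names (`mint`, `attenuate`, `verify`) and the caveat strings are made up for illustration.

```python
import hashlib
import hmac


def _mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str, caveats):
    """Create a token as (identifier, caveats, signature)."""
    sig = _mac(root_key, identifier.encode())
    for caveat in caveats:
        # Each caveat is folded into the signature, keyed by the previous one.
        sig = _mac(sig, caveat.encode())
    return identifier, list(caveats), sig


def attenuate(token, caveat: str):
    """Add a restriction without knowing the root key."""
    identifier, caveats, sig = token
    return identifier, caveats + [caveat], _mac(sig, caveat.encode())


def verify(root_key: bytes, token, satisfied) -> bool:
    """Recompute the chain and check every caveat with `satisfied`."""
    identifier, caveats, sig = token
    expected = mint(root_key, identifier, caveats)[2]
    return hmac.compare_digest(expected, sig) and all(
        satisfied(c) for c in caveats
    )
```

The property this sketch is meant to show is that a token holder can add caveats (for example, scoping a token to one project) but cannot remove them, because removing a caveat would require reversing an HMAC.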

Major trust boundaries: What are the major boundaries that enforce security expectations?

  • Login: API and browser-based
  • User privileges as defined in the database

brainwane avatar Nov 19 '19 00:11 brainwane

There are a few items in https://github.com/pypa/warehouse/issues/2794#issuecomment-368178296 that should also be in such a document, such as release immutability.

brainwane avatar Nov 19 '19 00:11 brainwane

In this discussion thread, @tiran says:

I would like to see a general and user-oriented PEP about PyPI security to answer these questions:

  • How is a package owner/maintainer able to verify that PyPI is serving correct and unmodified files?
  • As a user of PyPI how can I make sure that pip installs correct and unmodified packages?
  • As a user of PyPI how can I protect myself against typo-squatting attacks or compromised versions of a package?

and Donald Stufft notes,

this feels to me more like something that should be documented either on PyPI or as part of packaging.python.org.

I think documentation of the answers to those questions ought to be incorporated into the documentation push @glyph is suggesting.

brainwane avatar Jan 17 '20 19:01 brainwane

Thanks @brainwane

My thought-provoking, inconvenient, and brutally honest opinion is: PyPI won't be able to deliver this in its current shape and design. Sooner or later we have to consider a different model that works more like current app stores or Linux distributions. I'm talking about curated content.

I have been thinking about the matter for a while. All I have so far is a half-baked, handwavy proposal of a three-layered index:

  1. Standard PyPI as it works today
  2. A filtered subset of PyPI that offers only projects that have gone through a review process.
  3. A subset of (2) that requires each upload, release, and uploader to go through a vetting and verification process.

Layer (2) should get rid of typo-squatting. Layer (3) requires considerable effort but might be a way to generate revenue to support maintenance of PyPI and its tooling.
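To make the nesting of the three layers concrete, here is a small sketch that models each layer as a filter over the previous one. The `Release` fields `project_reviewed` and `upload_vetted` are hypothetical flags invented for this example; the proposal does not say how review or vetting status would actually be recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    project: str
    version: str
    project_reviewed: bool = False  # hypothetical: project passed review
    upload_vetted: bool = False     # hypothetical: this upload was vetted


def layer1(releases):
    """Standard index: everything that was uploaded."""
    return list(releases)


def layer2(releases):
    """Only releases of projects that went through a review process."""
    return [r for r in layer1(releases) if r.project_reviewed]


def layer3(releases):
    """Subset of layer 2 where the individual upload was also vetted."""
    return [r for r in layer2(releases) if r.upload_vetted]
```

With this shape, a typo-squatted name that never passed review would be visible in layer 1 only, which is the effect layer (2) is meant to have.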

tiran avatar Jan 17 '20 22:01 tiran

PyPI is a publishing platform, not a curation platform, and building a language-specific curation service doesn't make sense. It's unfortunate that Red Hat chose not to fund further work on https://fedoraproject.org/wiki/Env_and_Stacks/Projects/SoftwareComponentPipeline, but that's still well outside the scope of PyPI, and it's honestly well outside the scope of the PSF as well.

PyPI's job is to make sure that users can verify that what they installed is what the publisher uploaded.

Determining whether or not a particular publisher is trustworthy is a whole different story, and the onus for that will always remain primarily on consumers.
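The first half of that split, checking that a downloaded file is the one the publisher uploaded, can be sketched as a digest comparison. This is a minimal example that assumes the consumer already has a trusted SHA-256 digest for the file (for instance, one pinned in a requirements file); the helper names `sha256_of` and `matches_published_digest` are made up here.

```python
import hashlib


def sha256_of(path, chunk_size=65536) -> str:
    """Hash a file in chunks so large sdists/wheels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_sha256: str) -> bool:
    """Compare the local file's digest with the pinned/published one."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

pip offers a comparable check through its hash-checking mode (`pip install --require-hashes -r requirements.txt`). Note that a matching digest only shows the file is unmodified; it says nothing about whether the publisher is trustworthy.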

ncoghlan avatar Feb 06 '20 12:02 ncoghlan

Doing a bit of triage here:

  • https://pypi.org/security/ has a bit more detail on it now, although not a full threat model. This might be difficult to produce in a form that's both useful and suitable to the user-level PyPI pages (versus the Warehouse pages).
  • https://warehouse.pypa.io/security/ has some internal technical details on relevant security components of PyPI (e.g. PEP 740), but not much in the way of long-term aspirational properties (many of these are now achieved, including mandatory MFA, API tokens, misuse-resistant publishing, attestations, etc.).
  • Trusted Publishing has a security model here: https://docs.pypi.org/trusted-publishers/security-model/
  • Attestations has a security model here: https://docs.pypi.org/attestations/security-model/

woodruffw avatar Feb 13 '25 16:02 woodruffw