conda-forge.github.io icon indicating copy to clipboard operation
conda-forge.github.io copied to clipboard

What to do about clobbering of packages in the prefix

Open CJ-Wright opened this issue 5 years ago • 20 comments

What should we do when packages ship packages beyond themselves in site-packages. For instance if I have a package that ships itself and requests in site-packages that means that conda may no longer control what requests is installed, which could be problematic.

@conda-forge/core

edit: this effects the entire prefix and we should address all of that at once

CJ-Wright avatar Nov 18 '20 17:11 CJ-Wright

Here are two ideas for this.

Only Critical packages

  1. Determine a list of packages that considered critical infrastructure for security or other reasons.
  2. build up the mapping from files in the prefix to feedstocks
  3. add checks of this mapping somehow

All Packages

  1. Use libcfgraph to build up a mapping of possible files in the PREFIX to feedstocks
  2. have CI jobs compute any conflicts
  3. have CI jobs use their feedstock token to get the heroku server to post a lint comment on the PR

We could do both.

beckermr avatar Nov 18 '20 18:11 beckermr

On the technical side for the second thing, we store the mappings in a github repo and then make http requests to github to pull down json blobs with the data.

For the security-based one, we can use similar mechanisms but we need to do all of the unpacking and checking ourselves. The heroku server cannot support this load, but we can use cf-staging as a place to store validated packages before some other server pulls them to do the check.

beckermr avatar Nov 18 '20 18:11 beckermr

We could also make it easier for people in the test section of the recipe. Currently we often test for files using the commands test section which is really cumbersome and hard to get right. I am thinking about adding some macros to boa to check for the existence of a shared library (cross-platform), a python module in site packages, a cmake file etc.

If we design these macros nicely enough we could then warn about superfluous files ...

wolfv avatar Nov 18 '20 20:11 wolfv

warning is not enough. it's a security issue as well.

Also, adding stuff to boa is not going to be enough unless you would contribute those to conda-build.

isuruf avatar Nov 18 '20 20:11 isuruf

Sure, these could be good things to upstream. I hope to add package testing to boa next week

wolfv avatar Nov 18 '20 20:11 wolfv

We actually need this check in quetz to reject uploads of things that clobber. Anyone could simply remove a check in boa when building on a CI service.

beckermr avatar Nov 18 '20 20:11 beckermr

I think that removal depends on how it is implemented, if it is a check in the recipe then yes that is removable. If it is a check applied via the boa/conda-build source code that would be more difficult to skirt.

CJ-Wright avatar Nov 18 '20 20:11 CJ-Wright

How will conda-build know that a package is clobbering another?

isuruf avatar Nov 18 '20 20:11 isuruf

We can provide it with an API which returns if a file clobbers something already on conda-forge. I presume it has all the files in the prefix handy

CJ-Wright avatar Nov 18 '20 21:11 CJ-Wright

If it is a check applied via the boa/conda-build source code that would be more difficult to skirt.

more difficult is not good enough here if we are calling this a security issue.

beckermr avatar Nov 18 '20 21:11 beckermr

You would have to edit the source code of conda-build/boa itself. I'm not certain if there are schemes to do that in a recipe but we could (and most likely should) have the build system hash itself when performing sensitive tasks. That way we can check if the build system was modified in some way during the build.

CJ-Wright avatar Nov 18 '20 21:11 CJ-Wright

Ok there are two aspects to this: one is securing against malicious packages and the other is to make it easier for packagers to not unintentionally clobber. But yeah, I would be happy to help with development wrt to quetz as well! We're working really hard on it right now and with the plugin system in place we could totally think about rejecting uploads like this, and indexing all files in a database.

wolfv avatar Nov 18 '20 21:11 wolfv

We're working really hard on it right now and with the plugin system in place we could totally think about rejecting uploads like this, and indexing all files in a database.

Yup. This coupled with package signing and a way for us to administer the outputs clobbering checks we already do would help us centralize our infrastructure and possibly replace cf-staging with something less hacky and more secure.

beckermr avatar Nov 18 '20 21:11 beckermr

Something that may look like clobbering but is not really is when a package get split into two. For example foobar-static may appear to clobber files from an earlier version of foobar (from before splitting static libraries).

SylvainCorlay avatar Nov 18 '20 21:11 SylvainCorlay

Yes, we observe that with matplotlib vs. matplotlib-base so we'll need to handle exceptions to our "no clobbering" policy. We'll also need to handle cases where clobbering is expected, in the case of direct API/ABI replacement.

CJ-Wright avatar Nov 18 '20 21:11 CJ-Wright

One way to address this would be to

  • Build a mapping of files to packages.
  • Use this information to display the clobbering in the quetz front-end when inspecting a package.

The maintainers of e.g. the Python package would probably be worried to see some random package clobber python files, while matpltolitb feedstock maintainers would not.

SylvainCorlay avatar Nov 18 '20 21:11 SylvainCorlay

We will have to special-case these things like we do for outputs. Again if we are going to approach this as a security issue then simply displaying things in a front-end is not sufficient.

beckermr avatar Nov 18 '20 21:11 beckermr

I have written down my ideas for easier file-existence tests here: https://github.com/mamba-org/boa/issues/93

I'd be happy about input. I think adding this to conda-build wouldn't be too hard (I just finished the port of the test-running capabilities to boa, so I think that should be OK).

wolfv avatar Nov 20 '20 14:11 wolfv

Following up on this old thread. Conda does have a way of converting clobber warnings to errors via the path_conflict value in condarc

If we were ok letting feedstocks configure this as an error (like we do with linking checks), this would provide feedstocks a way of fixing clobbering issues and then enabling this configuration parameter to raise errors instead (to avoid regressions)

Proposed a way we might allow some condarc keys in conda-forge.yml with PR: https://github.com/conda-forge/conda-forge-ci-setup-feedstock/pull/364

jakirkham avatar Nov 06 '24 22:11 jakirkham

That's part of it, but only local to the packages co-installed at test time. If we want global checks, I think we could leverage the conda-forge-paths database dumps (this is effectively what the files-to-artifacts mapping people were referring to) and run some queries in CI. Possibly in a post-verify step or something.

jaimergp avatar Nov 07 '24 11:11 jaimergp