What to do about clobbering of packages in the prefix
What should we do when packages ship packages beyond themselves in site-packages. For instance if I have a package that ships itself and requests in site-packages that means that conda may no longer control what requests is installed, which could be problematic.
@conda-forge/core
edit: this effects the entire prefix and we should address all of that at once
Here are two ideas for this.
Only Critical packages
- Determine a list of packages that considered critical infrastructure for security or other reasons.
- build up the mapping from files in the prefix to feedstocks
- add checks of this mapping somehow
All Packages
- Use libcfgraph to build up a mapping of possible files in the PREFIX to feedstocks
- have CI jobs compute any conflicts
- have CI jobs use their feedstock token to get the heroku server to post a lint comment on the PR
We could do both.
On the technical side for the second thing, we store the mappings in a github repo and then make http requests to github to pull down json blobs with the data.
For the security-based one, we can use similar mechanisms but we need to do all of the unpacking and checking ourselves. The heroku server cannot support this load, but we can use cf-staging as a place to store validated packages before some other server pulls them to do the check.
We could also make it easier for people in the test section of the recipe. Currently we often test for files using the commands test section which is really cumbersome and hard to get right. I am thinking about adding some macros to boa to check for the existence of a shared library (cross-platform), a python module in site packages, a cmake file etc.
If we design these macros nicely enough we could then warn about superfluous files ...
warning is not enough. it's a security issue as well.
Also, adding stuff to boa is not going to be enough unless you would contribute those to conda-build.
Sure, these could be good things to upstream. I hope to add package testing to boa next week
We actually need this check in quetz to reject uploads of things that clobber. Anyone could simply remove a check in boa when building on a CI service.
I think that removal depends on how it is implemented, if it is a check in the recipe then yes that is removable. If it is a check applied via the boa/conda-build source code that would be more difficult to skirt.
How will conda-build know that a package is clobbering another?
We can provide it with an API which returns if a file clobbers something already on conda-forge. I presume it has all the files in the prefix handy
If it is a check applied via the boa/conda-build source code that would be more difficult to skirt.
more difficult is not good enough here if we are calling this a security issue.
You would have to edit the source code of conda-build/boa itself. I'm not certain if there are schemes to do that in a recipe but we could (and most likely should) have the build system hash itself when performing sensitive tasks. That way we can check if the build system was modified in some way during the build.
Ok there are two aspects to this: one is securing against malicious packages and the other is to make it easier for packagers to not unintentionally clobber. But yeah, I would be happy to help with development wrt to quetz as well! We're working really hard on it right now and with the plugin system in place we could totally think about rejecting uploads like this, and indexing all files in a database.
We're working really hard on it right now and with the plugin system in place we could totally think about rejecting uploads like this, and indexing all files in a database.
Yup. This coupled with package signing and a way for us to administer the outputs clobbering checks we already do would help us centralize our infrastructure and possibly replace cf-staging with something less hacky and more secure.
Something that may look like clobbering but is not really is when a package get split into two. For example foobar-static may appear to clobber files from an earlier version of foobar (from before splitting static libraries).
Yes, we observe that with matplotlib vs. matplotlib-base so we'll need to handle exceptions to our "no clobbering" policy. We'll also need to handle cases where clobbering is expected, in the case of direct API/ABI replacement.
One way to address this would be to
- Build a mapping of files to packages.
- Use this information to display the clobbering in the quetz front-end when inspecting a package.
The maintainers of e.g. the Python package would probably be worried to see some random package clobber python files, while matpltolitb feedstock maintainers would not.
We will have to special-case these things like we do for outputs. Again if we are going to approach this as a security issue then simply displaying things in a front-end is not sufficient.
I have written down my ideas for easier file-existence tests here: https://github.com/mamba-org/boa/issues/93
I'd be happy about input. I think adding this to conda-build wouldn't be too hard (I just finished the port of the test-running capabilities to boa, so I think that should be OK).
Following up on this old thread. Conda does have a way of converting clobber warnings to errors via the path_conflict value in condarc
If we were ok letting feedstocks configure this as an error (like we do with linking checks), this would provide feedstocks a way of fixing clobbering issues and then enabling this configuration parameter to raise errors instead (to avoid regressions)
Proposed a way we might allow some condarc keys in conda-forge.yml with PR: https://github.com/conda-forge/conda-forge-ci-setup-feedstock/pull/364
That's part of it, but only local to the packages co-installed at test time. If we want global checks, I think we could leverage the conda-forge-paths database dumps (this is effectively what the files-to-artifacts mapping people were referring to) and run some queries in CI. Possibly in a post-verify step or something.