
Interplay of conda-forge/broken channel and the channel ordering `strict`

Open: betatim opened this issue 6 years ago (3 comments)

This issue is the place to discuss the interplay between having the conda-forge/broken channel available in repo2docker and conda's planned switch to strict channel priority by default.

I'd encourage everyone who wants to participate in this discussion to read:

  • https://github.com/jupyter/repo2docker/pull/719#issuecomment-507592182
  • https://gitter.im/jupyterhub/binder?at=5d149da64e13324f8b9ba06a

repo2docker produces container images from repositories under the assumption that the same repo and commit hash will always produce the same final image. We can't prevent people from listing just a package name in their dependencies (thus building at different moments in time results in different images), but the behaviour we want to encourage, and consider a feature, is this: if you pin your dependencies precisely, a repo that used to build will continue to build and will produce the same image.

This is what motivates adding the broken channel as a "channel of last resort": specifying foobar=1.2.3, which was in conda-forge yesterday but was marked as broken today, will still produce an image today instead of failing. In particular, when you build images from things like a Zenodo record associated with a publication, or some other "frozen in time" state, you want to be able to rebuild the image from that moment in time, even if you know that today you'd choose other versions. A key exercise in going from one approved setup (say the one you published, or the one your regulator/auditor signed off on) to a new one is to show whether the same code produces the same results in both environments and, where it doesn't, to understand why.


If there are other links, snippets, etc. that people should read before jumping in, please post them and I (or someone else with the right permissions on the repo) will update this comment, so that we don't go in circles when new people join or get pulled in.

betatim, Jul 04 '19 07:07

Highlights from the conda docs:

Three options:

  • channel_priority: false - all channels are created equal, whichever channel has the latest matching version wins (almost never desirable)
  • channel_priority: flexible - default in conda 4.x - channel priority takes precedence, but lower-priority channels can be considered in cases of unsatisfiability from the first-choice channel (as happens when a package is marked as broken)
  • channel_priority: strict - proposed default in conda 5.0 - if a package is found in a channel, no candidates are considered from any lower-priority channel. This effectively disables the broken channel as a lowest-priority fallback except for packages that have been removed from conda-forge and defaults entirely.
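To make the difference between the three settings concrete, here is a toy, per-package model in Python. It is not conda's solver (which resolves the whole environment at once and compares versions with far richer rules); the index layout, the `candidates` function and the name `conda-forge/label/broken` for the broken channel are assumptions for illustration. The data reproduces the situation from the issue: `foobar=1.2.3` has moved to the broken channel while conda-forge still carries a newer `foobar`.

```python
from typing import Dict, List, Optional, Tuple

# Toy index (hypothetical layout): channel -> package name -> [(version, build), ...]
Index = Dict[str, Dict[str, List[Tuple[str, str]]]]


def version_key(version: str) -> Tuple[int, ...]:
    # Simplistic numeric ordering; conda's real version comparison is much richer.
    return tuple(int(part) for part in version.split("."))


def candidates(index: Index, channels: List[str], name: str,
               version: Optional[str] = None, build: Optional[str] = None,
               priority: str = "flexible") -> List[Tuple[str, str, str]]:
    """Return matching (channel, version, build) records, most preferred first."""

    def ok(v: str, b: str) -> bool:
        return (version is None or v == version) and (build is None or b == build)

    if priority == "strict":
        # The first channel that has the package *name* is the only one consulted.
        for ch in channels:
            if name in index.get(ch, {}):
                recs = [(ch, v, b) for v, b in index[ch][name] if ok(v, b)]
                return sorted(recs, key=lambda r: version_key(r[1]), reverse=True)
        return []

    # "flexible" and "false": every channel contributes its matching records.
    found = [(rank, ch, v, b)
             for rank, ch in enumerate(channels)
             for v, b in index.get(ch, {}).get(name, [])
             if ok(v, b)]
    found.sort(key=lambda t: version_key(t[2]), reverse=True)  # newest version first
    if priority == "flexible":
        # Stable sort: channel order dominates, newest version wins within a channel.
        found.sort(key=lambda t: t[0])
    # With priority "false" the channel order is ignored entirely.
    return [(ch, v, b) for _, ch, v, b in found]


INDEX: Index = {
    "conda-forge": {"foobar": [("1.2.4", "0")]},
    "conda-forge/label/broken": {"foobar": [("1.2.3", "0")]},
}
CHANNELS = ["conda-forge", "conda-forge/label/broken"]

for mode in ("strict", "flexible", "false"):
    print(mode, candidates(INDEX, CHANNELS, "foobar", "1.2.3", priority=mode))
```

Under `strict` the pinned `foobar=1.2.3` gets no candidates at all, because conda-forge already "owns" the name; `flexible` and `false` both fall through to the broken channel.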

There are two main cases where the broken channel comes up:

  1. A fully pinned environment, e.g. one created by `conda env export`. This is the case for the base env in every repo2docker install, and it is where we've hit broken packages multiple times.
  2. A merely version-pinned environment, where the pinned version is revoked, or one of its dependencies is marked broken, at a later time. I'm not sure this has ever happened, but it could in theory. Perhaps conda-forge should have a policy of making sure that at least one build of any given version of a package remains available as not-broken.
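The two cases differ in how much of the spec is pinned. A rough Python sketch for telling them apart, assuming specs in the `name=version=build` form that `conda env export` writes; the `pin_level` helper is hypothetical and deliberately ignores most of conda's match-spec grammar (ranges, channel prefixes, bracket syntax). Note that conda reads a single `=` as a prefix match (`foobar=1.2` also matches `1.2.1`) and `==` as an exact version; the sketch treats both as "version-pinned".

```python
# Characters that signal a range or wildcard rather than an exact pin (simplified).
RANGE_CHARS = set("<>!~|,*")


def pin_level(spec: str) -> str:
    """Classify a 'name[=version[=build]]' spec as fully-pinned, version-pinned or loose."""
    spec = spec.strip()
    if any(c in RANGE_CHARS for c in spec):
        return "loose"
    # Drop empty pieces so 'foobar==1.2.3' and 'foobar=1.2.3' both give ['foobar', '1.2.3'].
    parts = [p for p in spec.split("=") if p]
    if len(parts) >= 3:
        return "fully-pinned"    # case 1: name=version=build, e.g. from `conda env export`
    if len(parts) == 2:
        return "version-pinned"  # case 2: version fixed, build left to the solver
    return "loose"


for s in ["foobar=1.2.3=py37_0", "foobar=1.2.3", "foobar==1.2.3", "numpy>=1.16", "numpy"]:
    print(f"{s!r:25} -> {pin_level(s)}")
```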

I think strict channel priority is the best choice almost all of the time, especially for first-time or loose installs. It just happens to be incompatible with conda-forge marking packages as broken somewhat liberally, combined with package pinning.

What would probably serve repo2docker best would be to first attempt an install with strict channel priority, followed by a second attempt with flexible priority only if the strict install fails to resolve. But I guess that, with priority working as it does, this is equivalent to just using flexible, only faster most of the time. A more precise solution could be to allow lower-priority channels only for exact-match pins, i.e. when the requested spec includes build information and not just version information. This would solve the frozen-env case, but I would guess that this is too complicated to implement in the solver.
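Both ideas can be sketched with a toy per-package resolver in Python. This is not how conda works internally (its solver handles the whole environment at once), and `resolve`, `strict_then_flexible` and `strict_unless_build_pinned` are hypothetical names. Records are assumed to be listed newest first within each channel.

```python
from typing import Dict, List, Optional, Tuple

Index = Dict[str, Dict[str, List[Tuple[str, str]]]]  # channel -> name -> [(version, build)]
Record = Tuple[str, str, str]                        # (channel, version, build)


def resolve(index: Index, channels: List[str], name: str, version: Optional[str] = None,
            build: Optional[str] = None, strict: bool = True) -> Optional[Record]:
    """Return the first matching record, walking channels in priority order."""
    for ch in channels:
        recs = index.get(ch, {}).get(name)
        if recs is None:
            continue  # this channel doesn't carry the package at all
        for v, b in recs:  # assumed newest first
            if (version is None or v == version) and (build is None or b == build):
                return (ch, v, b)
        if strict:
            return None  # name found here, so strict never looks at lower channels
    return None


def strict_then_flexible(index: Index, channels: List[str], name: str,
                         version: Optional[str] = None, build: Optional[str] = None):
    """Idea 1: try strict; if that fails to resolve, retry with flexible priority."""
    hit = resolve(index, channels, name, version, build, strict=True)
    if hit is not None:
        return hit, "strict"
    return resolve(index, channels, name, version, build, strict=False), "flexible"


def strict_unless_build_pinned(index: Index, channels: List[str], name: str,
                               version: Optional[str] = None, build: Optional[str] = None):
    """Idea 2: stay strict, but allow lower channels only for exact name=version=build pins."""
    hit = resolve(index, channels, name, version, build, strict=True)
    if hit is None and version is not None and build is not None:
        hit = resolve(index, channels, name, version, build, strict=False)
    return hit


INDEX: Index = {
    "conda-forge": {"foobar": [("1.2.4", "0")]},
    "conda-forge/label/broken": {"foobar": [("1.2.3", "0")]},
}
CHANNELS = ["conda-forge", "conda-forge/label/broken"]
print(strict_then_flexible(INDEX, CHANNELS, "foobar", "1.2.3"))
print(strict_unless_build_pinned(INDEX, CHANNELS, "foobar", "1.2.3"))
print(strict_unless_build_pinned(INDEX, CHANNELS, "foobar", "1.2.3", "0"))
```

The version-only pin falls back to the broken channel under the first idea but not under the second, which only rescues the fully pinned (frozen-env) spec.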

It's also possible that the strict package filtering is occurring too soon. For instance, if it is doing:

  1. pkg=version=build is requested
  2. pkg is found on channel A, don't bother looking in channel B
  3. pkg=version is not found on A, fail.

That wouldn't be good. If, instead, only the first channel with at least one direct match for pkg is considered, then I think it would work. This is basically what I proposed above as probably too complicated, but it might actually be feasible, depending on when channel filtering is evaluated. I think "is there any match" is a clearer input than "is it strict".
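The difference between the two orderings can be sketched in Python with a toy index. `allowed_by_name` mirrors steps 1-3 above (pick the channel by package name, then filter by version/build), while `allowed_by_match` picks the first channel with at least one record matching the full spec. Both function names and the data layout are hypothetical; where conda actually applies this filtering is exactly the open question.

```python
from typing import Dict, List, Optional, Tuple

Index = Dict[str, Dict[str, List[Tuple[str, str]]]]  # channel -> name -> [(version, build)]


def _matches(v: str, b: str, version: Optional[str], build: Optional[str]) -> bool:
    return (version is None or v == version) and (build is None or b == build)


def allowed_by_name(index: Index, channels: List[str], name: str,
                    version: Optional[str] = None, build: Optional[str] = None):
    """Strict filtering applied too early: the channel is chosen by package name alone."""
    for ch in channels:
        recs = index.get(ch, {}).get(name)
        if recs is not None:
            return [(ch, v, b) for v, b in recs if _matches(v, b, version, build)]
    return []


def allowed_by_match(index: Index, channels: List[str], name: str,
                     version: Optional[str] = None, build: Optional[str] = None):
    """Alternative: the channel is the first one holding at least one matching record."""
    for ch in channels:
        hits = [(ch, v, b) for v, b in index.get(ch, {}).get(name, [])
                if _matches(v, b, version, build)]
        if hits:
            return hits
    return []


INDEX: Index = {
    "conda-forge": {"foobar": [("1.2.4", "0")]},
    "conda-forge/label/broken": {"foobar": [("1.2.3", "0")]},
}
CHANNELS = ["conda-forge", "conda-forge/label/broken"]
print(allowed_by_name(INDEX, CHANNELS, "foobar", "1.2.3", "0"))
print(allowed_by_match(INDEX, CHANNELS, "foobar", "1.2.3", "0"))
print(allowed_by_match(INDEX, CHANNELS, "foobar"))
```

For a loose `foobar` spec both variants only ever see conda-forge, so the strict behaviour is preserved; only the pinned spec reaches the broken channel under match-level filtering.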

Yet another option would be to change how conda-forge marks packages as broken. Instead of using labels to create a separate channel, packages could be marked as broken in a way that keeps them in the same channel but deprioritizes them in the solver some other way (I hesitate to mention features, but... hotfix features maybe?). This probably isn't feasible.
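A rough Python sketch of what "same channel, but deprioritized" could mean, loosely in the spirit of how conda's solver penalizes packages that carry `track_features`. The `broken` flag on `Record` and the `pick` function are hypothetical; conda-forge marks broken packages with a separate label instead.

```python
from typing import List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    version: str
    build: str
    broken: bool = False  # hypothetical flag kept alongside the package in the same channel


def version_key(version: str) -> Tuple[int, ...]:
    # Simplistic numeric ordering for the example.
    return tuple(int(part) for part in version.split("."))


def pick(records: List[Record], version: Optional[str] = None,
         build: Optional[str] = None) -> Optional[Record]:
    """Prefer non-broken records, newest first; broken ones stay reachable by exact pins."""
    matching = [r for r in records
                if (version is None or r.version == version)
                and (build is None or r.build == build)]
    matching.sort(key=lambda r: version_key(r.version), reverse=True)
    matching.sort(key=lambda r: r.broken)  # stable: False (not broken) sorts before True
    return matching[0] if matching else None


FOOBAR = [Record("1.2.5", "0", broken=True), Record("1.2.4", "0"), Record("1.2.3", "0", broken=True)]
print(pick(FOOBAR))           # loose spec
print(pick(FOOBAR, "1.2.3"))  # pinned to a broken version
```

A loose spec gets 1.2.4 even though a newer, broken 1.2.5 exists, while the pinned `foobar=1.2.3` still resolves. Because nothing has to leave the channel, strict channel priority would no longer get in the way.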

@scopatz is there a better place for a broader discussion of the relationship of strict channel priority / the conda-forge handling of 'broken' packages / pinned and frozen environments? On conda-forge or conda/conda?

minrk, Jul 04 '19 10:07

Sorry for the slow reply here @minrk - conda-forge is probably the best place (because everyone can see it). The biggest current discussion on strict is at https://github.com/conda-forge/conda-forge-ci-setup-feedstock/pull/50

scopatz, Jul 19 '19 19:07

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/pass-channel-priority-strict-to-conda-with-repo2docker/10551/1

meeseeksmachine, Aug 29 '21 15:08