CEP XXXX: Improving dependency export infrastructure
First draft after discussion in #77. Does not contain (much) specification yet, because I'm unsure how to go about changing the schema of v1 recipes (does it need a bump in the schema version, or do we specify build tools must translate between them?), and how to deal with the repodata side of things. This is my first CEP, please excuse my lack of experience with a lot of the underlying details.
Help on these questions would be much appreciated! I decided to write up the design in more comprehensive form than originally in this comment though, in order to hopefully facilitate more effective discussion of how to solve the transition issues posed by the new design.
Closes #77 (eventually)
pre-commit.ci autofix
because I'm unsure how to go about changing the schema of v1 recipes (does it need a bump in the schema version, or do we specify build tools must translate between them?)
Wolf mentioned in private that we don't necessarily have to go to a v2 schema over this, because despite being renamed semantically, the new keys would just be extending the v1 schema, not breaking it. Of course, we'd then have to mandate mutual exclusivity between exports: and run_exports: etc., but I think that's probably a gentler approach to this than taking this comparatively minor issue as cause for "recipe v2".
The same approach (consider the new keys if present, error if not mutually exclusive with the old way) could even be used by conda-build to support the CEP[^1], which would be great because a lot of our compiler feedstocks that would need this the most are not necessarily ready to be ported to v1 yet. :)
If people are in agreement over this approach, I can try to write up specification for it.
In any case, I think this is in a good enough state to ask for a first round of feedback; I'd be very curious to hear the thoughts of @chenghlee @isuruf @jezdez @msarahan @wolfv, people from mamba, prefix, cf/core, and anyone else interested in this!
[^1]: somewhat informally perhaps, since it's currently formulated atop of the v1 format.
Amazing work writing all this up. I am halfway through and I have the following comments:
- I was thinking it might be useful to have a `build_to_host_and_run` as well (aka the `strong` run exports). Of course this can be achieved by adding the same run export to `build_to_host` and `build_to_run` if desired.
- I also thought that `noarch_to_run` is maybe somewhat confusing. Maybe `export_when_noarch` or something, as an alternative? But idk, always depends on good documentation anyways! On the other hand, now that I am thinking about it, might be interesting to think about it in terms of conditional dependencies (e.g. `host_to_run: [foobar; if target_platform == "noarch"]` would be a valid syntax under the new scheme).
- The same goes for the `build_to_build` and `host_to_host` exports. This is going to be terrible to implement in this way, as it is basically a conditional dependency and thus influences the solve. Implementing it naively is going to lead to a horrible experience. We need to use the conditional dependencies here and do something like a run dependency that looks like:
```yaml
run:
  - foobar; if in_host_environment
  - barbaz; if in_build_environment
  - barbaz; if env.HOST == "true"
  - ...
```
Which would be something that we should add to the CEP about repodata v2.
Thanks for the feedback! :)
I was thinking it might be useful to have a `build_to_host_and_run` as well (aka the `strong` run exports).
My current thinking is that there are quite few cases that actually need this; mostly it's a big hammer to achieve either build_to_host: or build_to_run:. So I have a preference to keeping them separate, and letting recipes be explicit in the cases where something needs to be exported to both.
I also thought that `noarch_to_run` is maybe somewhat confusing. Maybe `export_when_noarch` or something, as an alternative?
I only found out about noarch: run-exports while going through the CB code base for this CEP, so it's a bit of a late addition to the design, and perhaps I'm overlooking a more elegant option. That said, I kinda feel that "noarch is a state of ~mind~ building", and as such it still makes sense to plug into the pattern <condition>_to_<target>. But it's definitely the odd one out in any case, so I'm open for different names on this.
This is going to be terrible to implement in this way as it is basically a conditional dependency and thus influences the solve.
I don't see how it affects the solve? To my (limited) understanding of conda-build, there's already a separate pass to add run-exports before anything is being resolved. ~Also conceptually, you don't need to resolve anything to do build_to_build: injection, the only thing you need is to look up the fully static exports.json of the packages in the respective environment, and apply the right ones, before going to the resolver.~
Edit: Scratch that, that doesn't make sense (to look up the respective export.json files, you already need to resolve). 🤦
We need to use the conditional dependencies here and do something like a run dependency that looks like: [...] Which would be something that we should add to the CEP about repodata v2.
My first reaction to that is... not great, but perhaps I'm not seeing yet what conditions you see would make this necessary. I'm fully onboard with repodata v2 fixing a bunch of things, but at first glance, that recipe snippet looks like a solution in search of a problem to me. I don't see why I'd want to qualify run-dependencies w.r.t. which environment the package gets installed in. That sounds like a way bigger headache to me (in terms of increased complexity, especially on a conceptual level) ~than doing a pre-processing pass for exports that looks up some static data.~
That said, I'm not super-attached to build_to_build: and host_to_host:. I've included them as a favour to Isuru, but my main goal here is providing a clean design and solving the modules ABI situation. If people prefer to leave transitive exports for another CEP, that's also fine - at least the current design leaves the door open for a natural way to add them, even if it comes later.
Edit: Scratch that, that doesn't make sense (to look up the respective `export.json` files, you already need to resolve). 🤦
I think there might be a way to keep it manageable, but it would require up to two more solves, one for build_to_build: (if present), and one for host_to_host: (if present).
- [as before] take `build:` dependencies, resolve
  - record `build_to_build:`, `build_to_host:` and `build_to_run:` of resulting packages (from named dependencies only)
- [new] if any package in `build:` has a `build_to_build:` export:
  - replace the named dependencies in `build:` with their exact pins, as determined by the first solve
  - add `build_to_build:` exports & resolve; error if not possible anymore
- [as before] take `host:` dependencies + `build_to_host:` as new host env specification, resolve
  - record `host_to_host:` and `host_to_run:` of resulting packages (for named packages in the above set only)
- [new] if any package in `host:` + `build_to_host:` has a `host_to_host:` export:
  - replace the named dependencies in `host:` + `build_to_host:` with their exact pins, as determined by the first solve
  - add `host_to_host:` exports and resolve; error if not resolvable anymore
- [as before] take `run:` dependencies + `build_to_run:` + `host_to_run:`, resolve
This fixes a few possible issues that could otherwise arise:
- faithful to the spec "as if there were no exports" by pinning the results of the first solve (not of the entire environment, just the named dependencies)
- extra resolver cost is only "paid" when needed
- no backtracking, e.g. avoids situations where the original dependency could be forced out by its own export[^1]
- capped at depth=1; avoids arbitrarily deep chains of host_to_host_to_host_to_host etc.
[^1]: and then causing another version of the primary dependency to be chosen that doesn't have the export; this scenario can be constructed to be arbitrarily bad, including infinite oscillation
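For illustration, the staged procedure could be sketched roughly like this. All names and data here are hypothetical: `solve` and `exports_of` are toy stand-ins for the real resolver and the static per-package export metadata, not actual conda-build/rattler-build APIs.

```python
# Toy sketch of the staged-solve procedure; `solve` and `exports_of`
# stand in for the real resolver and export metadata lookup.

# package -> {export kind: [specs]} (hypothetical sample data)
EXPORTS = {
    "gcc": {"build_to_host": ["libgcc"]},
    "libfoo": {"host_to_run": ["libfoo >=1,<2"], "host_to_host": ["foo-headers"]},
}

def solve(specs):
    """Toy 'resolver': just dedups the named packages (no real solving)."""
    return sorted({s.split()[0] for s in specs})

def exports_of(packages, kind):
    return [e for p in packages for e in EXPORTS.get(p, {}).get(kind, [])]

def collect_run_requirements(build_specs, host_specs, run_specs):
    # 1. [as before] solve build:, record its exports
    build_pkgs = solve(build_specs)
    b2h = exports_of(build_pkgs, "build_to_host")
    b2r = exports_of(build_pkgs, "build_to_run")
    # 1a. [new] extra solve only if build_to_build: exports are present
    #     (the real scheme would first pin the named deps to exact versions)
    b2b = exports_of(build_pkgs, "build_to_build")
    if b2b:
        build_pkgs = solve(build_specs + b2b)  # error here if unsolvable
    # 2. [as before] solve host: + injected build_to_host:, record exports
    host_pkgs = solve(host_specs + b2h)
    h2r = exports_of(host_pkgs, "host_to_run")
    # 2a. [new] same optional extra solve for host_to_host:
    h2h = exports_of(host_pkgs, "host_to_host")
    if h2h:
        host_pkgs = solve(host_specs + b2h + h2h)
    # 3. run: deps of the artefact are run: + build_to_run: + host_to_run:
    return run_specs + b2r + h2r

deps = collect_run_requirements(["gcc"], ["libfoo"], ["python"])
```

Note how the extra solves only trigger when the corresponding export kind is present, matching the "extra resolver cost is only paid when needed" point above.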
at this point you are just reinventing conditional dependencies in a worse way though :)
at this point you are just reinventing conditional dependencies in a worse way though :)
Which would run in a single solve and find a solution even if some exports might introduce conflicting dependencies!
Fair enough. You convinced me to descope this here. 😅
I can see that conditional dependencies are more flexible, but I don't see that as an unequivocally good thing. My first instinct would be to place strong constraints on what "conditions" you're allowed to switch on.
I am curious what constraints you would want to impose. But maybe that's something we should discuss in the other CEP
I am curious what constraints you would want to impose. But maybe that's something we should discuss in the other CEP
Happy to discuss this in the other CEP
I had another think on this and while this could be solved as conditional dependency, it could still be listed under build_to_build. Since we have run-exports available from the repodata (shards) we could include them in the solve as well. It would be slightly less elegant vs. conditional dependencies from the implementation perspective of resolvo / rattler though.
Or we could normalize this in rattler-build and turn these "exports" into conditional dependencies :)
Or we could normalize this in rattler-build and turn these "exports" into conditional dependencies :)
Although my first reaction was somewhat reserved, I now think that conditional dependencies are probably a better fit for solving the use-cases that the X_to_X: exports were meant to address. :)
I had another think on this and while this could be solved as conditional dependency, it could still be listed under build_to_build.
We could add the following condition.
build_to_build condition has to match the run condition in the upstream package.
i.e. we need to have
```yaml
requirements:
  run:
    - foo >=2
  exports:
    build_to_build:
      - foo >=2
```
This would make us not have to run multiple solves.
We could add the following condition.
`build_to_build` condition has to match the `run` condition in the upstream package.
Once the upstream package run:-depends on foo >=2, that export would be superfluous, as the dependency would come along for the ride in any environment where the package gets installed, including build:.
So I don't see how that would solve the "avoid build constraints at runtime" concern that was the motivation for the transitive exports. As I wrote above, I think this kind of thing would be solved more cleanly with conditional dependencies, i.e. bar: saying "if someone wants to compile against me (i.e. use me in host:), I need these additional things compared to my baseline dependencies", i.e. something like foo; if env in ["host"] (or however it ends up being spelled).
Once the upstream package run:-depends on foo >=2, that export would be superfluous, as the dependency would come along for the ride in any environment where the package gets installed, including build:.
No it doesn't come along in run. Let's say foo has a bar in host_to_host and bar has itself in host_to_run. Also assume that foo 1.0.0 *_1 has bar=1 dep and foo 1.0.0 *_2 has bar=2 dep. If foo was a C/C++ header using a C/C++ header in bar like @AntoinePrv said, we need bar=1 dep or bar=2 dep in the built package according to what version of bar was in host. This is what I call transitive exports and what I want to support.
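Isuru's scenario can be made concrete with a toy lookup (the package data is entirely hypothetical): which `bar` pin the built package ends up with depends on which `foo` build the solver happened to put into `host:`, so the export cannot be written down statically in the consuming recipe.

```python
# Toy illustration of the transitive-export scenario: foo carries bar in
# host_to_host, and bar exports a pin on itself via host_to_run.
# All package data here is hypothetical.

# foo build string -> the bar constraint of that foo build
FOO_BUILDS = {"1.0.0_1": "bar=1", "1.0.0_2": "bar=2"}

def run_requirements_for(foo_build):
    # the solver puts this foo build (and hence its exact bar
    # constraint) into host:
    bar_constraint = FOO_BUILDS[foo_build]
    # bar's host_to_run export then pins whichever bar ended up in host:
    return [bar_constraint]

# the run: requirement of the built package differs per foo variant
print(run_requirements_for("1.0.0_1"))
print(run_requirements_for("1.0.0_2"))
```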
So I don't see how that would solve the "avoid build constraints at runtime" concern that was the motivation for the transitive exports.
I don't understand this. Can you elaborate?
Currently run exports are only included from packages on which the recipe directly depends, so not for transitive dependencies. But if I read the above discussion correctly, we have to make an exception for `build_to_build` and `host_to_host` exports, is that correct?
No it doesn't come along in `run`.
Can we agree that for an output bar: with your original
```yaml
requirements:
  run:
    - foo >=2
  # the below is not relevant for the question I'm asking
  exports:
    build_to_build:
      - foo >=2
```
spec, the requirement foo >=2 is present wherever bar gets installed, irrespective of the type of environment?
Because you seem to be saying that's not the case, which goes against everything I've understood about how conda works.
If `foo` was a C/C++ header using a C/C++ header in `bar` like @AntoinePrv said, we need a `bar=1` dep or `bar=2` dep in the built package according to what version of `bar` was in `host`.

I understand that case, and I've described it in the CEP to the best of my abilities (though in more general terms, because "version constraint" captures what you're describing about `bar` here).
I agree that it's a valuable usecase, which is why I originally included it here. But I now think that exports are the wrong tool for this, and that conditional dependencies would be a better fit. This is because exports are fundamentally about cross-environment interaction. If you want to use it for intra-environment injections it massively complicates the job for the solver, the implementation, and the design (to find a reasonable way to control this).
In contrast, if you take conditional dependencies (syntax only for illustration, not intended as a concrete proposal), this works more naturally IMO:

```yaml
# output foo
requirements:
  host:
    - bar-devel
  run:
    # regular run-export from bar
    # - bar
    # additional requirement when compiling against foo; same effect as host_to_host
    - ${{ pin_compatible("bar-devel") }}; if env.HOST
  exports:
    host_to_run:
      - ${{ pin_compatible("foo") }}
```
This is what I call transitive exports and what I want to support.
Unless someone unexpectedly still comes up with a magic fix for the issues that have been identified with shoehorning this use-case into exports:, that will have to come through another CEP.
I don't understand this. Can you elaborate?
The way I think about the example you gave is that foo has an additional constraint (on the version of bar) that's only relevant at compile time. We don't want it to be present at runtime, because then the version of the bar headers is irrelevant and we don't want pointless conflicts. What I was saying is that I don't see how your suggestion to match build_to_build: with run: would achieve the "avoid compile constraints leaking into run:" part of the goal.
I don't know if conditional dependencies will be faster for the solver. After all it's NP-hard, so three solves might not be slower than one with more constraints (to be fair, I'm unsure about it either way). Also, for most recipes that need this, solving time pales in comparison to compile time.
Another thing: doing it with conditional dependencies means they do end up in the final package dependencies. So we would have to delay the adoption of this CEP until conditional dependencies are adopted, implemented by major clients, and a reasonable time is given to users to upgrade.
Looking at the recipe, I don't think conditional dependencies do a good job of conveying what they are for. I also think being conditional on an environment variable is fragile (although that could be changed to some static key that is only set to true by conda/rattler passing it to the solver).
The `X_to_Y` form is more descriptive in my opinion, and does not prevent a future CEP from mandating that the build tool transform these keys into conditional dependencies.
Heck, even a ~conditional dependency enabled~ solver today could already read the relevant repodata entries and add them based on which environment they are creating no?
So we would have to delay the adoption of this CEP until conditional dependencies are adopted, implemented by major clients, and a reasonable time is given to users to upgrade.
The two features are not interdependent. This CEP stands alone and solves an important issue (host-exports; which is the reason the flang migration has been stuck for almost a year, dealing with C++ modules, etc.). Transitive dependencies can follow later, either through exports: or through conditional dependencies, but they certainly should not hold up this CEP.
Heck, even a ~conditional dependency enabled~ solver today could already read the relevant repodata entries and add them based on which environment they are creating no?
You don't know which repodata entries to read before solving the environment, and of course different variants of the same package can have very different exports (so you can't just guess in advance). Once you add the X_to_X: exports, your environment might change substantially (up to and including the invalidation of the exports you just applied!).
So it's not just a question of whether the process can be split into separate solves, it's that the solve results become very fragile to minor details, including how exactly that logic is implemented. It would be very hard to avoid degenerate corner cases IMO (non-convergence, oscillation, randomly unstable solves, etc.), and at the scale of conda-forge we're almost certain to hit them all.
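The oscillation risk can be shown with a minimal toy model (entirely hypothetical data): an export changes the solve result, which changes which exports apply, and the fixpoint iteration never settles.

```python
# Toy model of export-induced oscillation: two variants of "foo", each
# with an export that forces the solver onto the *other* variant.
# Entirely hypothetical data, just to show the non-convergence.

# variant -> extra constraint its export injects
EXPORTS = {"foo-1": "foo=2", "foo-2": "foo=1"}

def solve(constraint):
    # toy solver: the constraint directly picks the variant
    return {"foo=1": "foo-1", "foo=2": "foo-2", None: "foo-1"}[constraint]

constraint, seen = None, []
for _ in range(4):
    variant = solve(constraint)
    seen.append(variant)
    constraint = EXPORTS[variant]  # re-apply exports of the new result

# the "solution" flips between the two variants forever
```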
I also think being conditional on an environment variable is fragile
This is exactly what `host_to_host` does though; it's conditional on a given environment type -- the two approaches are equivalent in intent, so it's mostly a matter of personal taste which concept feels more natural. However, the exports approach needs more solves, more logic and more complexity. So I now see several reasons (both conceptual and practical) that make me prefer conditional dependencies for this.
In any case, as the author of this CEP I'm making the decision to exclude the "transitive exports" use-case. It would have been nice to solve it en passant, but I'm not going to jeopardise solving the problem I set out to do for a bonus extension that's not such a natural fit after all.
I don't know why everyone is talking about conditional dependencies. This is not about adding a dependency to host if it's conda-build. This is about transitive exports. i.e. if foo is in host and bar is a dependency of foo, we need a way to add bar conditions in run. @wolfv, can you tell me why you think this has anything to do with conditional dependencies?
I don't know why everyone is talking about conditional dependencies.
Because adding dependencies to the same environment (e.g. from host to host) is 100% equivalent to a dependency that triggers under certain circumstances (like being in host).
if `foo` is in `host` and `bar` is a dependency of `foo`, we need a way to add `bar` conditions in `run`.
This is yet another case from what you've described before (a variant of host_to_run:, instead of host_to_host: as previously). Unless foo exports bar by itself, bar is completely invisible from the POV of the build that's consuming foo, and I definitely don't want to start applying any exports from packages not explicitly named in the recipe.
Please respect my decision that in this CEP, your use-case is out of scope. You can write your own CEP for that (either building on top of this, or competing with mine if you must).
You can write your own CEP for that
Sure
I read the whole thread and think the proposed solution is a good addition. (From the perspective of writing recipes.)
I read the whole thread and think the proposed solution is a good addition. (From the perspective of writing recipes.)
Thanks a lot for the feedback @cbouss, I appreciate you taking the time!
That reminds me that I wanted to post a comment[^1] from a recent discussion on zulip which covers an open point of the design here: how to deal with host_to_run: exports of packages that were themselves injected via build_to_host: (i.e. not explicitly named in the consuming recipe).
Concretely, if we have
```yaml
# output: a_complicated_package
requirements:
  exports:
    build_to_host:
      - some_package_with_a_run_export
```
and
```yaml
# output: some_package_with_a_run_export
requirements:
  exports:
    host_to_run:
      - the_export_in_question
```
and then consume it
```yaml
# output: mypkg
requirements:
  build:
    - a_complicated_package
  host:
    # from a_complicated_package's build_to_host
    # - some_package_with_a_run_export
  run:
    # ...should the run-export of some_package_with_a_run_export get triggered here?!
    # - the_export_in_question
```
the question is whether, why and how we trigger an export of a package that's not explicitly named in the recipe. Obviously this is quite an impactful question, and we certainly do not want to trigger all exports of packages that happen to transitively make it into an environment.
On zulip, I discussed this in the context of how injected exports could/would work. Despite not wanting to cover self-exports in this CEP, I've used the motivating use-case for them as an example here, because it illustrates the situation more concretely than just some abstract arithmetic between environments.
As I note at the end, this CEP could go either way (i.e. whether mypkg gets a run-requirement of the_export_in_question or not). However, I'm trying to take a wider view of the design space here, and to surface potential problems early. The flip side is that it's possible to go too far ahead in the "what if" exercise and get bogged down without ever achieving the first step. In any case, I'd be interested what people think.
[^1]: I've edited the comment to leave out the bits not relevant here.
Let's take the clearest case that I'm aware of that's been in contention around self-exports vs. conditional dependencies. I'm rephrasing here for consistent language: libB depends on libA as usual, but compiling against libB needs the specific version of libA used to build libB, e.g. due to the way ABI and headers between libA and libB interact.
In other words, if `libB 1.0.0 *_1` is built against `libA=1` and `libB 1.0.0 *_2` is built against `libA=2`, we need to match the libA version constraint of mypkg that (generically) depends on `libB` with the specific constraints on the artefact of `libB` that was in `host:` at build time.
Let's look at the recipes:
```yaml
# output: libA
requirements:
  [...] # regular build: / host: / run:
  exports:
    host_to_run:
      - ${{ pin_compatible("libA") }}
```
That's the easy part, which we already know as `run_exports:` today. For libB, the recipe would use conditional dependencies:
```yaml
# output: libB
requirements:
  [...] # regular build:
  host:
    - libA
  run:
    # regular run-export from libA
    # - libA >={{ver_A}},<{{next_ver_A}}
    # additional requirement when compiling against libB; same effect as host_to_host
    - ${{ pin_compatible("libA") }}; if env.HOST # how to spell this is TBD!
  exports:
    host_to_run:
      - ${{ pin_compatible("libB") }}
```
Now, whether as a conditional dependency or a host_to_host: export, using libB necessarily needs to inject a constraint on libA that's not present in the consuming recipe
```yaml
# output: mypkg
requirements:
  [...] # regular build:
  host:
    - libB
    # injected!
    # - libA # matches libA-constraint of libB
  run:
    # regular run-export from libB
    # - libB >={{ver_B}},<{{next_ver_B}}
    # run-export from injected libA!
    # - libA >={{ver_A}},<{{next_ver_A}}
    - some_regular_dep
```
In my mental model, the process is as follows:

- Solve `build:` env
- determine `exports:` for packages that have been named or injected in `build:`, collect any `build_to_host:` or `build_to_run:`, call them B2H, B2R
- Add B2H to `host:`, solve
- determine `exports:` for packages that have been named or injected in `host:`, collect `host_to_run:`, call them H2R
- Save `run:` + B2R + H2R as run deps of the resulting artefact.
The "or injected" part is the most open aspect of the design space, but I believe the example with libA/libB above shows that we cannot rely only on named packages in such cases. So my proposed rule would be: "we apply exports from packages that are either explicitly named in the environment, or have been injected via exports or conditional dependencies". The second part of this rule only exists because there are cases that cannot be solved with just the first part.
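The "named or injected" rule can be sketched in a few lines. Again, everything here is hypothetical illustration, not an implementation: `exports_of` stands in for the export-metadata lookup, and only named-or-injected packages are consulted, never packages that merely appear transitively in the solved environment.

```python
# Toy sketch of the "named or injected" rule: only packages explicitly
# named in the recipe, or injected via exports, contribute exports.
# Package metadata here is hypothetical.

EXPORTS = {
    "a_complicated_package": {"build_to_host": ["some_package_with_a_run_export"]},
    "some_package_with_a_run_export": {"host_to_run": ["the_export_in_question"]},
}

def exports_of(pkgs, kind):
    return [e for p in pkgs for e in EXPORTS.get(p, {}).get(kind, [])]

def run_deps(build_named, host_named, run_named):
    # exports from build: apply to named-or-injected packages only
    b2h = exports_of(build_named, "build_to_host")
    b2r = exports_of(build_named, "build_to_run")
    # host: consists of named packages plus injected B2H; under the
    # proposed rule, *both* contribute host_to_run: exports
    h2r = exports_of(host_named + b2h, "host_to_run")
    return run_named + b2r + h2r

# mypkg from the example above: the chained export does get triggered
result = run_deps(["a_complicated_package"], [], ["some_regular_dep"])
```

Under this sketch, mypkg would end up with `the_export_in_question` in its run deps, i.e. the chained `build_to_host:` → `host_to_run:` case resolves via the injection rule.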
Coming back to your example [for automatically injecting ${{ stdlib("c") }}], this could be written as
```yaml
# mycompiler
requirements:
  run:
    - ${{ stdlib("c") }}
    # conditional dependency when in build, which then participates in run-exports; same as build_to_build
    - ${{ stdlib("c") }}; if env.BUILD
```
This would work with the "or injected" scheme above, but as you can see, it's neither very elegant nor obvious why you'd have to repeat the same dependency for mycompiler, and how that triggers the export from ${{ stdlib("c") }} in mypkg [from using mycompiler].
IMO that's because conditional dependencies and self-exports are features that are only truly needed for niche cases; any feature can be misused, so any additional expressivity needs to be guarded (e.g. by the linter etc.). As a consequence, I'm convinced that we shouldn't abuse this mechanism for something which every compiled recipe needs. In other words, ${{ stdlib("c") }} is too large a use-case to fit into this niche, and the consequences of trying to hide it are not worth the benefits.
This comment is already too long, so I'll just note that a more restricted form of [this PR] without the "or injected" is possible, and that this would already solve a lot of the cases where we need host-exports, e.g. the ABI of C++/Fortran modules. It's only when we get to stuff like the libA/libB case above that we really need to go beyond the "only named packages can contribute exports"; though I do think (compare the list of steps above) that the same mechanism would also be quite natural for chaining e.g. build_to_host: with host_to_run: exports.
Obviously this is quite an impactful question, and we certainly do not want to trigger all exports of packages that happen to transitively make it into an environment.
One thing that gnawed at me since posting that comment on zulip is that I don't like the relationship "conditional dependencies participate in run-exports", which seems too magical, and is by far not explicit enough in the recipe IMO.
However, I just had an idea that might solve this case: we can use `exports: host_to_host: ...` as the syntax to indicate this unusual mechanism, but implement it using conditional dependencies. IOW, libB from above would look like:
```yaml
# output: libB
requirements:
  exports:
    host_to_run:
      - ${{ pin_compatible("libB") }}
    host_to_host:
      # implemented not as an export, but as a conditional dependency of libB when it appears in `host:`!
      - ${{ pin_compatible("libA") }}
```
That would IMO be the best of both worlds: explicit syntax for the most unusual case, as well as a sane environment resolution process, and conditional dependencies don't have to be imbued with some magical pixie dust (pun intended 😉). This would also simplify the rule I had posited above to "we apply exports from packages that are either explicitly named in the environment, or have been injected via exports ~or conditional dependencies~".
This would then give the following dependencies between the different proposals
```mermaid
---
config:
  securityLevel: loose
---
flowchart TB
  this["this CEP"]-->se["self exports CEP"]
  cond["conditional dependencies CEP"]-->se
```
Nice that we are finding paths forward! Is this similar to the solution I proposed in https://github.com/conda/ceps/pull/129#issuecomment-3249504891, or does it differ in some way? If that’s the case, can you elaborate how it’s different?
Nice that we are finding paths forward!
Glad to hear it. It's not for lack of wanting to solve the issue that I had descoped self-exports; now I'm beginning to see a path that allows the various pieces to work together in a non-hacky way (where before I couldn't see it at all).
Is this similar to the solution I proposed in #129 (comment), or does it differ in some way? If that’s the case, can you elaborate how it’s different?
It's different in that we do not have to add a condition like "`build_to_build:` has to match `run:`", but it still doesn't require multiple solves.
I've been getting down into the nitty gritty details at least one level deeper (e.g. @jaimergp opened another rabbit hole under my feet about the mechanics of ignore_exports:, and I had an illuminating chat with @baszalmstra about the steps that (can) happen between the metadata and the resolver); I'm planning to expand the CEP with the design conclusions from those discussions, which should hopefully tell the whole story better than yet another very long comment.