
Conan 2.0 requirements effect in ``package_id``

Open memsharded opened this issue 3 years ago • 21 comments

Conan 2.0 proposes a new package_id that takes into account the different scenarios that can happen when building and linking C and C++ native binary artifacts like executables, static libraries, shared libraries, etc.

This will be done by differentiating two major scenarios:

  • "embed" cases, when the dependency artifact will be embedded in the consumer artifact, like when an executable links a static library. This case will default to use a full_mode policy (old recipe_revision_mode), which guarantees that every change in the dependency requires a re-build in the consumer.
  • "non embed" cases, where the dependency artifact is in general not embedded/inlined in the consumer artifact, like when a static library depends on another static library or a shared library depends on another shared library. This case will default to use a minor_mode policy, which fully allows the developer to define what needs to be done (not bumping the version or bumping the patch, means no need to rebuild the consumers, bumping the minor means consumers should be rebuilt and bumping the major means that there are breaking changes requiring the consumers to change their code to adapt).

  • Upvote 👍 or downvote 👎 to show acceptance or rejection of the proposal (other reactions will be ignored)
    • Please, use 👀 to acknowledge you've read it, but it doesn't affect your workflow
  • Comment and review to suggest changes to all (or part) of the proposal.

memsharded avatar Jun 17 '22 11:06 memsharded

As long as both cases are default configurable, I'm 👍

FnGyula avatar Jun 17 '22 12:06 FnGyula

As long as both cases are default configurable, I'm

Sure, if you check the full text, there are 5 different core.package_id configurations, including the embed, non-embed, unknown, build and python cases, that can be configured in global.conf

memsharded avatar Jun 17 '22 12:06 memsharded

I'm strongly in favor of this proposal, but I also think that the chosen "use-case" should also affect the visibility of the public headers, as described in this old issue.

Notably, let's have 3 static library packages Aaa, Bbb and Ccc with dependency graph Ccc -> Bbb -> Aaa.

Let's also say that public headers of Bbb include and use public headers of Aaa. This means that if some package Ddd, that depends on Ccc, overrides the version of Aaa to a new, breaking, version, Conan needs to rebuild both Bbb and Ccc, as implementation files (.cpp files) of Ccc may have inlined code from public headers of Aaa, even though they never directly included them - the include was transitive via public headers of Bbb.

My suggestion in the mentioned issue was to also let Conan generators be aware of that, and prevent Ccc from even seeing headers from Aaa unless Bbb declares that it uses Aaa as in the "embed use-case". Even though the "embed use-case" is meant mostly for applications and dynamic/shared libraries, I would argue that it should also be used for header-only and static libraries whose public headers are used/"embedded" by the public headers of the consuming library.

So, by using syntax from the proposal, in the example above, the recipe of Bbb would state something like this:


from conan import ConanFile

class BBBConan(ConanFile):

    def requirements(self):
        self.requires('Aaa/1.0.0@user/channel', package_id_mode='minor_mode')

Without stating that, Ccc (the user of Bbb) would not get the header search path for Aaa, thus making compilation of Ccc fail if Bbb mistakenly uses public headers of Aaa in its own public headers.

DoDoENT avatar Jun 17 '22 14:06 DoDoENT

is / will there be a default mode if you set / select nothing ?

(I hope the answer is yes, full_mode is the default :-)

a4z avatar Jun 17 '22 14:06 a4z

@DoDoENT

Without stating that, Ccc (the user of Bbb) would not get the header search path for Aaa, thus making compilation of Ccc fail if Bbb mistakenly uses public headers of Aaa in its own public headers.

I am not sure if I follow. This was already approved in https://github.com/conan-io/tribe/blob/main/design/026-requirements_traits.md, and it has been implemented. Now in Conan 2.0 (released alpha), ccc package does not have visibility over aaa headers, unless bbb explicitly defines it with the appropriate trait self.requires("aaa/1.0", transitive_headers=True).

memsharded avatar Jun 17 '22 14:06 memsharded

@a4z

Yes, it is in the text of the proposal:

  • core.package_id:default_non_embed_mode, default=minor_mode
  • core.package_id:default_embed_mode, default=full_mode

The default for the non-embed mode is selected to be minor_mode, for several reasons:

  • It allows user control, based on versioning for the non-embed case
  • It allows ConanCenter to be operational; otherwise, every single change to any recipe in ConanCenter would render all the dependent recipes' binaries obsolete, requiring everything to be rebuilt all the time, and that is not viable. ConanCenter running in 1.X with "semver_direct_mode" for all cases has proved that it is not strictly necessary to always rebuild.
  • Users that want to change it need just one line in their global.conf
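
For illustration, overriding those defaults is a one-line change per mode in global.conf (the conf names come from the proposal text; the mode values here are just examples):

```text
# global.conf
core.package_id:default_embed_mode=full_mode
core.package_id:default_non_embed_mode=recipe_revision_mode
```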

memsharded avatar Jun 17 '22 15:06 memsharded

I am not sure if I follow. This was already approved in https://github.com/conan-io/tribe/blob/main/design/026-requirements_traits.md, and it has been implemented. Now in Conan 2.0 (released alpha), ccc package does not have visibility over aaa headers, unless bbb explicitly defines it with the appropriate trait self.requires("aaa/1.0", transitive_headers=True).

@memsharded , I'm aware of that. I'm just saying that that proposal should come hand-in-hand with this proposal here.

Notably, if you write self.requires('aaa/1.0', transitive_headers=True), its package_id_mode should default to minor_mode (or whatever is chosen for the "embed use-case"). Of course, it would still be possible to write self.requires('aaa/1.0', transitive_headers=True, package_id_mode='semver_mode') when you know what you are doing, but at least it would not be the default.

My guess is that, by default, when you opt-in for transitive_headers, you automatically get into the "embed use-case" scenario.

I could be wrong, though.

DoDoENT avatar Jun 17 '22 15:06 DoDoENT

but how is the selection between embed and non-embed mode done?

a4z avatar Jun 17 '22 15:06 a4z

@DoDoENT

My guess is that, by default, when you opt-in for transitive_headers, you automatically get into the "embed use-case" scenario.

Not really. If both bbb and aaa are static libraries and bbb requires aaa with transitive_headers=True, that doesn't imply that bbb will be inlining/embedding the aaa artifacts. In fact, it could mean the opposite: aaa is less of an implementation detail of bbb and is just transitive via the public headers, but not used in the .cpp files at all.

And for the consumer ccc of bbb, that depends on the type of consumer. If it is an application, it will certainly use the embed case both over bbb and aaa.

Also, most importantly, non-embed doesn't imply that nothing will ever be embedded. It means that changes are not always embedded, and the user has the control to specify it: bumping the patch (or not bumping) means that only .cpp files were modified, so it doesn't matter that headers would be embedded in consumers. If a change is made to a public header, then users can bump the minor version to indicate it, and that signals that something that might be embedded by the consumers has changed and, as such, that it is necessary to rebuild those consumers.

memsharded avatar Jun 17 '22 15:06 memsharded

My other concern also relates to the combinations of embed vs transitive_headers requirements, plus compatible_packages. I find it quite common that a shared library has a lot of implementation details which are embedded inside of it (via static or header-only libraries) but are not exposed in its public API. Today that's not much of a problem; such requirements just capture the major version, and that rarely changes. This of course means the package_id is quite inadequate for reproducibility, which is presumably what this proposal is aimed at - it will make package_ids much more specific than they currently are.

I'd like to view that as a good thing, because in terms of reproducibility it definitely is. But as-is, exposing all these implementation details is either going to cause a lot of proliferation of package binaries, or (more likely) it's just going to result in recipes routinely defeating the new mechanism by deleting hidden+embedded requirements from their package_id to restore the status quo.

This feels like the use case compatible_packages was intended for. But it will be very difficult for a recipe to list every possible permutation of full_mode package_ids for all of its requirements, and outright impossible for it to cover future versions of those requirements produced after the recipe was uploaded.

What I think would work well is if the package_id had all the details for reproducibility, but the recipe could use self.compatible_packages.append to offer additional package_ids where some of these requirements (that are private, not subject to one-definition-rule, and thus interchangeable) were deleted. The problem is that currently compatible_packages only operates in the consumer, and so no "reduced" package_id like this would ever match anything in the cache or server. What could really be neat is if compatible_packages offered by the recipe while producing a package were also recorded, allowing the consumer<->producer roles (of the same recipe) to try and meet in the middle at some reduced package_id that they agree constitutes a compatible substitution. This would likely require the binary cache to have some kind of alias package concept (like recipes already do).

Another important detail is that some requirements might frequently not even be resolvable in the consumer's profile - e.g. a mingw-gcc or C# executable which requires a shared library built in msvc, which in turn requires some other static/header-only library that doesn't support gcc (or vice-versa). But if the interface is suitably tight, the compatible_packages might meet in the middle at "os=Windows, arch=...". This is the same kind of question that comes up all the time for tool_requires, where today's solution is generally to produce a very minimized package_id (that is thus not at all sufficient for reproducibility of that tool package).

But I think compatible_packages can already handle that pretty well - it's OK for the list to have hypothetical options in the compatible_packages list that came out to "INVALID"; you just skip to the next option and see if it works. And presumably a requirement whose validate() failed just makes that requirement's package_id "INVALID", and full_mode propagates this, but nothing fails the whole graph. So if any of the compatible_packages options delete that requirement (or reduce it to minor_mode, or whatever), that should allow the overall package_id for this alternative to be valid again.
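
The skip-to-the-next-option behavior described here can be sketched as a toy resolver (plain Python, not Conan's actual API; all names are hypothetical):

```python
INVALID = "INVALID"  # marker for a candidate whose validate() failed

def resolve_binary(exact_id, compatible_ids, available):
    """Try the exact package_id first, then each compatible alternative in
    order, skipping INVALID candidates. Return the first id that has a
    binary in 'available' (the cache/server), or None."""
    for candidate in [exact_id] + compatible_ids:
        if candidate == INVALID:
            continue  # e.g. the requirement graph for this option failed
        if candidate in available:
            return candidate
    return None

server = {"abc123"}  # package_ids with binaries on the server/cache
# Exact id misses, first alternative is INVALID, second one matches:
print(resolve_binary("fff999", [INVALID, "abc123"], server))  # abc123
```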

puetzk avatar Jun 17 '22 16:06 puetzk

In general, I fully support this proposal. It gives better control over package ID computation of dependencies.

However, it still doesn't address a use case which I find more important (or did I overlook something?): why can't a package define its own mode?

At the moment there is a global definition, and consumers can override the global defaults. This proposal keeps it the same; just the granularity is finer. Even with this approach, I will still have to modify each and every package which uses library X, if package X might make changes that require full_version_mode.

Why can't a package require "minimum" compatibility for itself? Each library knows best what versioning scheme it uses, e.g. semver. So each package should be able to say: "I don't follow any versioning, please always use full version mode", or "I use semver, major mode is fine" (though, can Conan model the case of only using newer artifacts at runtime?).

The integrator can then globally specify if they use the packages default, or something "stricter".

Then, if a package could specify its minimum requirement type, there would not be so much need for its consumers to specify it at all.

KerstinKeller avatar Jun 17 '22 17:06 KerstinKeller

@KerstinKeller

We already have a PR with a preliminary PoC for what you are commenting: https://github.com/conan-io/conan/pull/11441. It seems doable, but we think it is important to first outline the basic behaviors, defining in which situations we are in embed mode, non-embed mode, etc. That implies that in https://github.com/conan-io/conan/pull/11441 there should probably be 2 or more definitions in the recipes for the consumers, at least the "embed" and the "non-embed" one. Even if a package doesn't follow semver, the case where it is compiled as a static library and consumed by another static library might be totally different from the case where that static library is consumed by an application. If you want the two cases to be the same, it is easy: just assign both the same policy. But in general it seems that defining the policies for the different scenarios should be part of this feature.

memsharded avatar Jun 19 '22 20:06 memsharded

Sorry for the late reply.

The configuration for defaults is global, and managed in global.conf. The idea is that changing these defaults should be a team-wide or company-wide policy, defined once for most projects and teams.

The conans global config has always been frustrating to me. What if I am a part of several "teams or companies"? My team from my job, other teams from my job, several open source projects, my own setup, educational setups?

The idea of any global state influencing build result smells a lot. Is it totally necessary? Can we get rid of it?

mmatrosov avatar Jun 29 '22 18:06 mmatrosov

Hi @mmatrosov

The conans global config has always been frustrating to me. What if I am a part of several "teams or companies"? My team from my job, other teams from my job, several open source projects, my own setup, educational setups?

The idea of any global state influencing build result smells a lot. Is it totally necessary? Can we get rid of it?

How could we get rid of it? This is the reality we see, and you just described it. For example, companies will use custom settings, and they will use those custom settings in their recipes to affect their builds. They will have very different profiles, most likely very different hooks, and surely completely different remote repositories. The packages they build for those recipes with those profiles will not work without all that. And as you have suggested, the same happens for other setups. It is not just the configuration; everything in a setup is connected.

The recommended way to deal with that is to use a different Conan home for each different setup. That home will contain the settings, the profiles, the remotes, the hooks, for that specific setup, and of course the configuration for that specific setup.

memsharded avatar Jun 29 '22 18:06 memsharded

The conans global config has always been frustrating to me. What if I am a part of several "teams or companies"? My team from my job, other teams from my job, several open source projects, my own setup, educational setups?

The idea of any global state influencing build result smells a lot. Is it totally necessary? Can we get rid of it?

Sorry for being OT, but it might be helpful in this context

I simply use different CONAN_USER_HOME for different team/project setups https://docs.conan.io/en/latest/reference/env_vars.html?highlight=conan_user_home#conan-user-home so there is no global state for me, since I never use ~/.conan

often automatically set via direnv, so by just changing into a directory I have the team/project environment, and only the download cache, which is also set that way, is shared

a4z avatar Jun 29 '22 19:06 a4z

Thanks for letting me know about the Conan home. I completely forgot about this feature. I have lived with ~/.conan my whole life and did not have many problems with it. But I have a feeling that the situation might change with the core.package_id:default_* modes.

Let us consider two global settings mentioned above: settings and remotes. We indeed have a custom settings.yaml file, but we append to it. We do not remove stuff. Hence, every recipe from CCI behaves the same way as in the CCI CI. We also indeed have our own remotes. But again, we do not remove conan-center; we simply add our Artifactory host first. Furthermore, if you assume all remotes contain properly built binaries, it does not matter which remote you choose for the build from the result's perspective. Again, every recipe from CCI behaves the same way as in the CCI CI.

But what about the core.package_id:default_* modes? It seems like if you build the same dependency graph on two hosts with different default modes, then update some dependencies and build again, you can end up with completely different binaries. Is this correct, or am I missing something?

mmatrosov avatar Jun 30 '22 11:06 mmatrosov

Late to the party on this, but an observation.

As much as I appreciate the correctness in identifying downstream packages as out-of-date when making a change to 'my package', to me this suggests that making small changes in common packages could cause a cascade of many package rebuilds in CI systems?

Of course, configuring this to suit your level of 'correctness' and 'comfort' is probably the solution to this.

I'm just wondering if any testing in CI on this proposal has revealed a significant cost overhead in the most correct mode? If I make a minor change, how much effort does it take to get an entire dependency tree back into sync by CI systems?

For example, the library we often speculate about here is zlib, and what needs changing if we need to patch it, as it's often at the very bottom of the pile of dependencies. Certainly, I see all of the downstream packages that link static-zlib into a shared library would be marked out-of-date, and then where that cascades from there.

Is this a reasonable concern?

foundry-markf avatar Jul 22 '22 12:07 foundry-markf

The feature https://github.com/conan-io/conan/pull/11441, which allows a dependency to define its package_id_embed_mode, package_id_non_embed_mode and package_id_unknown_mode, has been merged and will be available in Conan 2.0-beta.2

This definition will have priority over the conf definition of the default modes, which, as the name implies, are only "defaults", effective only when nothing else defines that value.
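
A hedged sketch of what such a recipe-side declaration could look like, using the attribute names from the comment above. ConanFile is stubbed here so the snippet is self-contained and runnable; a real recipe would use `from conan import ConanFile`, and the chosen mode values are only examples:

```python
class ConanFile:  # stand-in stub for conan.ConanFile, for illustration only
    pass

class MyLibConan(ConanFile):
    name = "mylib"
    version = "1.0"
    # The dependency's own recipe picks the policy its consumers will use,
    # overriding the global.conf defaults (attribute names per the merged PR):
    package_id_embed_mode = "full_mode"        # e.g. exe linking this static lib
    package_id_non_embed_mode = "patch_mode"   # e.g. static lib -> static lib
    package_id_unknown_mode = "semver_mode"    # when the scenario can't be inferred

print(MyLibConan.package_id_non_embed_mode)  # patch_mode
```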

memsharded avatar Jul 26 '22 12:07 memsharded

Regarding the definition of CONAN_USER_HOME (CONAN_HOME in 2.0), a new feature has been merged in Conan 2.0: a .conanrc file that defines such a variable, making it possible to set the location of the Conan home differently and automatically for different projects.

memsharded avatar Jul 26 '22 12:07 memsharded

Regarding the definition of CONAN_USER_HOME (CONAN_HOME in 2.0), a new feature has been merged in Conan 2.0: a .conanrc file that defines such a variable, making it possible to set the location of the Conan home differently and automatically for different projects.

awesome!

a4z avatar Jul 26 '22 12:07 a4z

For example, the library we often speculate about here is zlib, and what needs changing if we need to patch it, as it's often at the very bottom of the pile of dependencies. Certainly, I see all of the downstream packages that link static-zlib into a shared library would be marked out-of-date, and then where that cascades from there.

@foundry-markf In our experience, many companies and teams have been opting-in for the most strict modes, including package_revision_mode, requiring tons of re-builds (sometimes probably unnecessary). So apparently, there is a larger concern about correctness than about resources or speed.

The approach in this proposal is accordingly more on the correctness side, but not extremely so, leaving space for user control over what and when to rebuild using versioning schemas. It is a matter of probabilities: if a shared library is linking a static library, the likelihood of embedding it is very high, so recipe_revision_mode will be used, meaning that even a cosmetic change to a recipe, like a comment or docstring, will trigger the rebuild of the consumer shared libraries. This is up to the user: if they have the discipline and tooling to do the correct version bumps when something in the code changes, then it can be relaxed to patch_mode, for example. But as the proposed default, we chose something that is closer to what we see in practice from users.

For static libraries, we also need to find a balance that works as a default for ConanCenter. With now over 1300 packages, built for over 100 configurations, it is not possible to rebuild everything every time one library recipe changes one comment. So we went for something stricter than the current default and chose minor_mode. This can work quite well in practice for environments like ConanCenter, in which third-party packages are being packaged, as the source code of a package doesn't actually change when you create new revisions. If you apply this default to your own libraries, it can work if the developers have enough discipline to bump the patch/minor versions correctly when necessary, but otherwise you might need to go stricter too.

memsharded avatar Jul 26 '22 12:07 memsharded

Closing as outdated. Still, this was super useful; thanks again for all the feedback.

memsharded avatar Jul 11 '24 07:07 memsharded