
[feature] package revisions in lockfiles

Open smoofra opened this issue 1 year ago • 7 comments

What is your suggestion?

Hi, I would like to be able to record package revisions in lockfiles.

I want to be able to record exactly what binaries went into a build, including which package revision, so I can download the same exact set of packages later and reproduce the build, even if someone has since rebuilt some of the packages and uploaded new package revisions.

I'd suggest adding a flag to conan lock that would include package revisions in the lockfile.

Have you read the CONTRIBUTING guide?

  • [X] I've read the CONTRIBUTING guide

smoofra avatar Oct 08 '24 14:10 smoofra

Hi @smoofra

We had this functionality early on in Conan 2, but it was removed.

Having more than 1 package revision should be considered a process or model error. There should exist exactly 1 package revision for every recipe-revision + package_id. Please read the package-revisions section: https://docs.conan.io/2/tutorial/versioning/revisions.html#package-revisions

As such, I am afraid there are no plans at this moment to support this in other Conan functionality such as lockfiles, or to have it affect the package_id.

memsharded avatar Oct 08 '24 20:10 memsharded

How is that supposed to be enforced?

For example, I have a collection of recipes that create compilers and other build-time tools. If I follow the recommended package id modes for build tools, then, for example, updating the recipe for gcc will not cause the things built with gcc to be rebuilt. If I force a rebuild, either because the gcc change has some important fix that I want to propagate, or just to test that everything is still working, then it results in multiple package revisions for the same package id.

It just seems like an important hole in Conan's capabilities to be able to create these package revisions but have no way to lock things down, so you know you're using the same ones tomorrow as you used today.

When you say there are no plans to include this feature, does that mean you wouldn't consider a pull request implementing it at all, or just that you're not planning on implementing it?

I'd be happy to write the feature, as having it would be very useful to me.

thanks!

smoofra avatar Oct 09 '24 00:10 smoofra

For example, I have a collection of recipes that create compilers and other build-time tools. If I follow the recommended package id modes for build tools, then, for example, updating the recipe for gcc will not cause the things built with gcc to be rebuilt. If I force a rebuild, either because the gcc change has some important fix that I want to propagate, or just to test that everything is still working, then it results in multiple package revisions for the same package id.

This would be exactly what we called a "model error" above. If there is something meaningful in that change of gcc, there should be something in the model that accounts for it. The recommended approach would likely be to use custom settings (https://docs.conan.io/2/reference/binary_model/extending.html#custom-settings). If something changes in the gcc package that will affect its consumers, it should show up like a version change in compiler.version, or a custom setting could be added, like compiler.version.update or compiler.update, as msvc does, to allow fine control over this.
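As a rough illustration of that custom-setting approach (everything here is hypothetical: the compiler.update name just mirrors msvc's existing update sub-setting, and mylib is a stand-in consumer recipe, not official Conan Center content):

```python
# Hypothetical settings_user.yml extension, mirroring msvc's "update"
# sub-setting (illustrative, not an official settings entry):
#
#   compiler:
#     gcc:
#       update: [null, 1, 2]
#
from conan import ConanFile

class MyLibConan(ConanFile):
    name = "mylib"
    version = "1.0"
    # Because "compiler" is part of the settings, the custom compiler.update
    # sub-setting becomes part of the package_id. Building with
    # -s compiler.update=1 in the profile then yields a different package_id,
    # so binaries built with the patched gcc are distinct packages rather
    # than extra package revisions of the same package_id.
    settings = "os", "arch", "compiler", "build_type"
```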

The goal is that if something changes that requires new binaries, it must result in a new package_id, not package revisions under the same package_id.

memsharded avatar Oct 09 '24 09:10 memsharded

OK, I understand that it's a model error, but that still leaves me with some problems I don't know how to solve.

Like, my recipes aren't perfect. Maybe they contain model errors. I understand that I should fix the model errors, but how do I enforce that the lockfile I create when I make a tag is going to download the same binaries tomorrow as it did today, even if someone does the wrong thing and uploads new package revisions? Can I put a policy on Artifactory somehow that prevents them from doing that? Or, wouldn't it be even better if it were possible to record those package revisions that ought to be unique into the lockfile, so I can check later and make sure they're unique?

Relatedly, I have a bunch of recipes that just package upstream software for conan, and I'm not sure how I should be modeling that. Right now they all just have names like gcc/9.1.0, but I wind up having to update them fairly frequently for conan-related reasons. For example, I had some user conf variables with names like user.foo.bar. It turns out that was incorrect; the correct form is user.foo:bar. A new version of conan comes out and starts enforcing that, so the recipes need to be updated. How should I be modeling that? Should I really be putting something in custom settings to reflect that a bug was fixed in some recipe? I do understand custom settings; we already use them to model things like the linux kernel version and glibc version for linux binaries. But that approach doesn't seem like a good fit for "oops, my recipe is broken, I need to fix it". If I were packaging my own software, I'd just increment the version number, but what do I do when there's a bug not in gcc/9.1.0 itself, but in my recipe packaging it?

smoofra avatar Oct 09 '24 14:10 smoofra

Or another scenario...

There's nothing stopping multiple people, or CI systems, from simply building the same package on their own and then uploading it. Now we have multiple package revisions without any model errors. But it's still possible that those binaries are different in some way that matters. Maybe, due to a bug in a recipe, some information leaked from the build machine and influenced the resulting binaries. None of my recipes are bit-for-bit repeatable.

I want to make sure a release branch is built with the official binaries that were built in my CI, not some random package sitting in my local cache that may have been built by me or downloaded from my beta repository.

smoofra avatar Oct 09 '24 14:10 smoofra

I understand that I should fix the model errors, but how do I enforce that the lockfile I create when I make a tag is going to download the same binaries tomorrow as it did today, even if someone does the wrong thing and uploads new package revisions? Can I put a policy on Artifactory somehow that prevents them from doing that? Or, wouldn't it be even better if it were possible to record those package revisions that ought to be unique into the lockfile, so I can check later and make sure they're unique?

There are some things that a tool can't absorb. We have started to document some common practices in https://docs.conan.io/2/knowledge/guidelines.html, for example, that developers shouldn't be allowed to upload directly to server repositories, and that CI should be responsible for this.

Artifactory itself doesn't have a direct built-in policy for this, but it is something that can be checked automatically and periodically relatively simply, with a conan list command that lists the existing package revisions for a given package. In practice, using --build=missing should also be enough to avoid it (you can apply it to conan create with conan create . --build=missing:&).
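A sketch of such a periodic check (not an official tool: the remote name "artifactory" is an assumption, and the JSON layout of conan list output is walked generically rather than assumed):

```python
# Flag any package_id that has accumulated more than one package revision.
import json
import subprocess

def list_revisions(pattern: str, remote: str = "artifactory") -> dict:
    """Run `conan list` over a pattern that covers all package revisions."""
    out = subprocess.run(
        ["conan", "list", pattern, "-r", remote, "--format=json"],
        check=True, capture_output=True, text=True).stdout
    return json.loads(out)

def duplicated_prevs(node: dict):
    """Walk the JSON and yield package_ids that have more than one revision."""
    for key, value in node.items():
        if not isinstance(value, dict):
            continue
        if key == "packages":
            for package_id, data in value.items():
                prevs = data.get("revisions", {})
                if len(prevs) > 1:
                    yield package_id, sorted(prevs)
        else:
            yield from duplicated_prevs(value)

# Example: audit every binary of every recipe revision of gcc/9.1.0.
for package_id, prevs in duplicated_prevs(list_revisions("gcc/9.1.0#*:*#*")):
    print(f"possible model error: {package_id} has package revisions {prevs}")
```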

Right now they all just have names like gcc/9.1.0, but I wind up having to update them fairly frequently for conan-related reasons. For example, I had some user conf variables with names like user.foo.bar. It turns out that was incorrect; the correct form is user.foo:bar. A new version of conan comes out and starts enforcing that, so the recipes need to be updated. How should I be modeling that?

Not really; changes to sources and recipes do create new recipe revisions, and all package_ids need to be built for that new recipe revision. Regarding the effect on the consumers, there are 2 different scenarios:

  • If the fix in your recipe does not cause/require new binaries, then you don't need to change anything.
  • If in practice it means that all packages built against it require a rebuild, then the answer is yes, modeling is needed for it. It might be some custom setting, because now you are building with a new and different compiler than before; it would be the equivalent of using a new compiler version. There are different approaches to this (see the sketch after this list):
    • Reflect the change as a new compiler.version value in settings.
    • Make tool_requires affect the package_id of consumers, which can be done with the core.package_id:default_build_mode conf (this would be quite global).
    • Make a tool_requires recipe like gcc define a build mode so it affects its consumers when applied, down to the recipe_revision or even the package_id if you want.
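A minimal sketch of what this looks like from a consumer recipe (the recipe is hypothetical; minor_mode is one of the documented package_id modes, and stricter modes down to the recipe revision also exist):

```python
from conan import ConanFile

class AppConan(ConanFile):
    name = "app"
    version = "1.0"
    settings = "os", "arch", "compiler", "build_type"

    def build_requirements(self):
        # Make this tool dependency affect the consumer's package_id, so a
        # gcc change that matters produces new consumer package_ids instead
        # of new package revisions under the same package_id. The global
        # equivalent is setting core.package_id:default_build_mode in
        # global.conf.
        self.tool_requires("gcc/9.1.0", package_id_mode="minor_mode")
```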

These approaches give a perfectly traceable, manageable and efficient (it knows what needs to be built, when and why) binary model, as opposed to just building multiple package revisions for the same thing, which is just "well, something changed, I have no idea what, but I have re-built a new binary even if it might not be necessary".

There's nothing stoping multiple people, or CI systems from simply building the same package on their own, and then uploading them. Now we have multiple package revisions without any model errors. But it's still possible that those binaries are different in some way that matters. Maybe due to a bug in a recipe, some information leaked from the build machine and influenced the resulting binaries. None of my recipes are bit-for-bit repeatable.

That is the thing: if a package binary is accidentally re-built from source and re-uploaded with identical context and inputs, creating a new package revision, then even if it is not bit-by-bit identical (which is precisely why it got a new package revision), the binary is identical. It doesn't matter that it got a new package revision; it is exactly the same binary as it was before. So everything is good, and resolving to the latest one is perfectly ok, as it is identical to the others.

Note that one of the main reasons Conan 2 is not planning to model this is that if package revisions were allowed to introduce un-modeled changes to binaries, then those package revisions would also have to be factored back into the binary package_id computation, something that does not happen at all in Conan 2. That required a 2-pass process in Conan 1 that was super confusing, problematic and fragile, and it precluded some of the features that Conan 2 has been able to introduce, like the general compatibility.py plugin, or the massive simplification of lockfiles (which were the single feature that accumulated the most tickets in Conan 1, due to their complexity and the problems derived from them).

The Conan 2 model, dropping package revisions from lockfiles and package_id, was carefully thought through, and it is proving to be a very big step forward. Proper modeling of changes so they are traceable in the package_id (at the end of the day, you can understand with a single conan list command why some new binaries are there, because it happens that something changed in the gcc package), rather than multiple package revisions, is definitely the way to go. Yes, it requires a bit more care and design in the changes done to tool_requires, in the same way that modifying a library already requires proper modeling, versioning, etc. (Conan 2 added modeling for this too, like traits and package_type). Regular requires already need consideration of versioning and evolution, and package revisions were never a solution to un-modeled changes in regular requires; Conan 2 has defined the same logic for tool_requires. But there is nothing really esoteric or weird about it, just managing tool_requires with the same level of consideration as regular requires, and the effort is totally worth it, from our experience and the many lessons learned supporting tons of users in this process.

memsharded avatar Oct 09 '24 16:10 memsharded

Everything you're saying makes sense to me, except the conclusion that this feature is not needed.

I'm certainly not asking you to go back to allowing package revisions to be hashed into package ids, or to do anything that would complicate Conan's graph model.

All I want is the ability to store package revisions in the lockfile and have Lockfile.resolve_prev use the recorded revisions.
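Roughly like this (a standalone illustration of the requested behavior, not Conan's actual internals or signatures):

```python
# If the lockfile recorded a package revision for this (reference, package_id)
# pair, resolve to it; otherwise fall back to today's behavior of resolving
# to the latest package revision.
def resolve_prev(locked_prevs: dict, ref: str, package_id: str) -> str | None:
    """Return the pinned package revision, or None to mean 'use latest'."""
    return locked_prevs.get((ref, package_id))
```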

This would solve problems (for me at least) that having a perfectly traceable, manageable and efficient binary model can't.

I want to have a perfectly traceable, manageable and efficient binary model, and I'm fixing any model errors in my recipes whenever I find them.

But I still want to be able to create reference builds that are as close to perfectly repeatable as I can make them, even when the sources of non-repeatability are things that the model does not and cannot account for.

I want to be able to lock down my releases so I can rebuild them in the future using the exact same tool binaries.

I want to be able to rebuild my toolchain recipes just because "something changed, I have no idea what, but I have re-built a new binary even if it might not be necessary", because doing so finds model errors that I can fix. And when there is a bug, I want a way of telling conan to use the old binaries that work, not the new ones that don't.

the binary is identical. It doesn't matter that it got a new package revision; it is exactly the same binary as it was before. So everything is good, and resolving to the latest one is perfectly ok,

It's identical according to the model. Badly written recipes can introduce bugs into the model. And there are things that the model cannot account for. It can't account for unknown (and largely unknowable until you run into them) ways that the build environment can leak information into a package. This happens all the time with open source software that uses autoconf or similar tools, because probing the build machine for information is the default, and cross compilation is a distant afterthought that was bolted on later. It can't account for people making mistakes and rebuilding packages they shouldn't. It can't account for malicious packages. It can't account for compiler bugs.

smoofra avatar Oct 16 '24 19:10 smoofra

Hello everyone, hello @memsharded,

consistency is key for us.

Designing Conan 2 in a way that, on the one hand, allows package revisions when creating packages and uploading to Artifactory, but on the other hand does not consider package revisions in commands and formats, is not consistent in my eyes and causes a lot of problems.

I have really been struggling with this topic for a while, because we need lockfiles to freeze packages for release in an unambiguous, one-to-one way, and furthermore we need to promote exactly these packages from one Artifactory repository to another.

The decision in Conan 2 to leave package ids and package revisions out of the lockfile, as well as to introduce a second format, the package list (some commands can use lockfiles, while others, like "conan download", can only use package lists), makes it really hard (impossible without further scripting) to establish a consistent workflow.

It is not secure enough to ensure by other measures that, e.g., a second package revision will never come into existence in Artifactory. Therefore a workflow of creating a recipe-only lockfile and then somehow converting it into a package list (later, as a non-atomic operation), assuming that the latest package revision existing now is the one that was frozen with the lockfile earlier, is not sufficient.

Therefore I highly vote for:

  1. Please officially provide again in Conan 2 (an option for) a complete lockfile including package ids & package revisions, e.g. make the parameter "--lockfile-packages" official again.
  2. Please provide a lockfile-to-package-list converter, or ensure that every conan command (esp. download/upload) can use lockfiles directly.

Or, as an alternative: provide a command that can create a lockfile and a package list in one atomic operation.

SzBosch avatar Dec 17 '24 13:12 SzBosch