Changelog / as-of request

Open SylvainCorlay opened this issue 4 years ago • 7 comments

Having a changelog for conda channels could have several benefits:

  • enable simpler (full) mirroring
  • allow requests for packages "as of" a certain date, when only certain package versions were available, for better reproducibility. This may be useful for use cases such as binder (cc @minrk).
  • channels that don't have a changelog could simply ignore the command line argument.

SylvainCorlay avatar Sep 07 '20 10:09 SylvainCorlay

pypi-timemachine prompted me to think about this kind of thing on repo2docker, and it would be super cool to have something similar for conda, especially if it were an officially supported conda feature and didn't need a wrapper like pypi-timemachine. For our cases, I don't super care about the changelog part, only the "try to run as if it were date X" part. However you achieve that would be great!

It appears that packages have upload timestamps, so adding a condition upload_time <= date_cutoff in the solver seems like it would work for our purposes. A full changelog would enable things like changing labels over time, but that sounds hard!
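A minimal sketch of that cutoff condition, applied to repodata-style records before they reach a solver. It assumes each record carries a `timestamp` field in milliseconds since the epoch; the helper names `available_as_of` and `filter_records` and the sample records are hypothetical, and whether the field means upload or build time is not settled by this sketch.

```python
from datetime import datetime, timezone


def available_as_of(record: dict, cutoff: datetime) -> bool:
    """Return True if the record's timestamp is at or before the cutoff."""
    ts = record.get("timestamp")
    if ts is None:
        # Assumption: records without a timestamp are excluded.
        return False
    # Assumption: timestamps are milliseconds since the Unix epoch.
    return ts <= cutoff.timestamp() * 1000


def filter_records(records: dict, cutoff: datetime) -> dict:
    """Keep only the records that satisfy the cutoff condition."""
    return {
        filename: record
        for filename, record in records.items()
        if available_as_of(record, cutoff)
    }


# Hypothetical sample data, shaped like entries of a repodata "packages" map.
records = {
    "pkg-1.0-0.tar.bz2": {"name": "pkg", "version": "1.0", "timestamp": 1577836800000},
    "pkg-2.0-0.tar.bz2": {"name": "pkg", "version": "2.0", "timestamp": 1609459200000},
}

cutoff = datetime(2020, 9, 7, tzinfo=timezone.utc)
visible = filter_records(records, cutoff)
```

With this sample data, only the record stamped before the cutoff stays visible; the later one is hidden from whatever consumes `visible`.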

minrk avatar Sep 07 '20 11:09 minrk

right, in the repodata on conda you do have the timestamp field which would allow you to do this kind of thing today. The timestamp is the upload time though ... but I don't know if there could be a way to add the real version release date as well.

wolfv avatar Sep 07 '20 11:09 wolfv

upload time is what's really relevant for simulating a past install, though, so I think that's fine.

minrk avatar Sep 07 '20 11:09 minrk

> a condition in the solver

that is a neat idea.

SylvainCorlay avatar Sep 07 '20 13:09 SylvainCorlay

Being able to do an install as-of a specified timestamp is an important aspect of reproducibility.

xref: https://github.com/mamba-org/conda-specs/issues/3

I thought the timestamp in the repodata.json was the build timestamp, not the upload timestamp?

I really think storing package specs in a proper database rather than a json file is the way to go - it makes things like filtering the universe of packages on the upload_timestamp much easier. It would also make it dead easy to support retrospective changes in a much more robust way than the current repodata patch mechanism.
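To illustrate how a database makes this kind of filtering simple, here is a rough sketch using SQLite from Python. The table `package_versions`, its columns, and the function `packages_as_of` are hypothetical and are not an existing quetz schema; timestamps are assumed to be seconds since the epoch.

```python
import sqlite3

# In-memory database with a hypothetical schema that records upload time.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE package_versions (
        filename TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        version TEXT NOT NULL,
        upload_timestamp INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO package_versions VALUES (?, ?, ?, ?)",
    [
        ("pkg-1.0-0.tar.bz2", "pkg", "1.0", 1577836800),  # 2020-01-01 UTC
        ("pkg-2.0-0.tar.bz2", "pkg", "2.0", 1609459200),  # 2021-01-01 UTC
    ],
)


def packages_as_of(conn: sqlite3.Connection, cutoff: int) -> list:
    """Return filenames of packages uploaded at or before the cutoff."""
    rows = conn.execute(
        "SELECT filename FROM package_versions "
        "WHERE upload_timestamp <= ? ORDER BY filename",
        (cutoff,),
    )
    return [row[0] for row in rows]


# 1599436800 is 2020-09-07 00:00:00 UTC.
as_of_sept_2020 = packages_as_of(conn, 1599436800)
```

The "as of" filter becomes a single `WHERE` clause, and an index on `upload_timestamp` could keep it cheap on large channels.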

dhirschfeld avatar Sep 07 '20 23:09 dhirschfeld

indeed, the timestamps in the package's info/index.json and in repodata.json are identical, so the timestamp must be the build time, not the upload time. alas, no data on upload time is present in the repodata.json

we may add the upload time query to quetz api, which would make the as-of/changelog feature relatively easy to implement, but this would break compatibility with anaconda server. Should we go for it?

btel avatar Sep 23 '20 09:09 btel

> Should we go for it?

+:100: from me!

I don't think it breaks compatibility - certainly no more than does conda-metachannel. The terminology they use there is "pruning the graph"; in this case all packages published (to the server) after the requested as_of_timestamp can be removed from the graph presented to the solver.
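A rough sketch of that pruning step on the server side, assuming the server keeps its own record of upload times since repodata.json does not contain them. The function `prune_repodata` and the `upload_times` mapping (filename to upload time, in seconds since the epoch) are hypothetical; the output keeps the usual repodata layout, so a client would see an ordinary, smaller repodata.

```python
import copy


def prune_repodata(repodata: dict, upload_times: dict, as_of_timestamp: int) -> dict:
    """Return a copy of repodata without packages uploaded after the cutoff."""
    pruned = copy.deepcopy(repodata)
    for section in ("packages", "packages.conda"):
        if section not in pruned:
            continue
        pruned[section] = {
            filename: record
            for filename, record in pruned[section].items()
            # Assumption: packages with no known upload time are dropped.
            if filename in upload_times
            and upload_times[filename] <= as_of_timestamp
        }
    return pruned


# Hypothetical sample data.
repodata = {
    "info": {"subdir": "linux-64"},
    "packages": {
        "pkg-1.0-0.tar.bz2": {"name": "pkg", "version": "1.0"},
        "pkg-2.0-0.tar.bz2": {"name": "pkg", "version": "2.0"},
    },
}
upload_times = {
    "pkg-1.0-0.tar.bz2": 1577836800,  # 2020-01-01 UTC
    "pkg-2.0-0.tar.bz2": 1609459200,  # 2021-01-01 UTC
}

# 1599436800 is 2020-09-07 00:00:00 UTC.
pruned = prune_repodata(repodata, upload_times, as_of_timestamp=1599436800)
```

Because the input is deep-copied, the full repodata stays untouched and can still be served to clients that do not pass a cutoff.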

dhirschfeld avatar Sep 23 '20 11:09 dhirschfeld