quetz
Changelog / as-of request
A changelog for conda channels could have several benefits:
- it would enable simpler (full) mirroring;
- it would allow requests for packages "as of" a certain date, when only certain package versions were available, for better reproducibility. This may be useful for use cases such as Binder (cc @minrk);
- channels that don't have a changelog could simply ignore the command line argument.
pypi-timemachine prompted me to think about this kind of thing on repo2docker, and it would be super cool to have something similar for conda, especially if it were an officially supported conda feature and did not need a wrapper like pypi-timemachine. For our cases, I don't super care about the changelog part, only the "try to run as if it were date X" part. However you achieve that would be great!
It appears that packages have upload timestamps, so adding a condition upload_time <= date_cutoff in the solver seems like it would work for our purposes. A full changelog would enable things like changing labels over time, but that sounds hard!
Right, in the repodata on conda you do have the timestamp field, which would allow you to do this kind of thing today. The timestamp is the upload time though ... but I don't know if there could be a way to add the real version release date as well.
upload time is what's really relevant for simulating a past install, though, so I think that's fine.
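To make the "condition on the timestamp" idea concrete, here is a minimal sketch that drops every record newer than a cutoff from a repodata-like dictionary before it reaches a solver. The function name filter_as_of and the sample records are hypothetical. The sketch assumes the timestamp field holds milliseconds since the Unix epoch and filters on whatever time that field actually records.

```python
from datetime import datetime, timezone


def filter_as_of(repodata: dict, cutoff: datetime) -> dict:
    """Return a copy of repodata keeping only records with timestamp <= cutoff.

    Assumption: "timestamp" is in milliseconds since the Unix epoch.
    """
    cutoff_ms = int(cutoff.timestamp() * 1000)
    kept = {
        filename: record
        for filename, record in repodata.get("packages", {}).items()
        # Records without a timestamp are dropped; that is a policy choice
        # made for this sketch, not something the thread decided.
        if record.get("timestamp") is not None and record["timestamp"] <= cutoff_ms
    }
    # Shallow copy so the caller's repodata is left untouched.
    return {**repodata, "packages": kept}


# Hypothetical sample data: foo 1.0 from 2019-01-01, foo 2.0 from 2021-01-01 (UTC).
repodata = {
    "packages": {
        "foo-1.0-0.tar.bz2": {"name": "foo", "version": "1.0", "timestamp": 1546300800000},
        "foo-2.0-0.tar.bz2": {"name": "foo", "version": "2.0", "timestamp": 1609459200000},
    }
}

pruned = filter_as_of(repodata, datetime(2020, 1, 1, tzinfo=timezone.utc))
print(sorted(pruned["packages"]))
```

With a cutoff of 2020-01-01, only the foo 1.0 record survives, so a solver working from the pruned data could not select foo 2.0.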
a condition in the solver
that is a neat idea.
Being able to do an install as-of a specified timestamp is an important aspect of reproducibility.
xref: https://github.com/mamba-org/conda-specs/issues/3
I thought the timestamp in the repodata.json was the build timestamp, not the upload timestamp?
I really think storing package specs in a proper database rather than a json file is the way to go - it makes things like filtering the universe of packages on the upload_timestamp much easier. It would also make it dead easy to support retrospective changes in a much more robust way than the current repodata patch mechanism.
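As a rough illustration of the database approach, the sketch below uses Python's built-in sqlite3 module with a made-up table. The table name package_versions, its columns, and the helper packages_as_of are all assumptions for the example, not an existing schema. Timestamps are assumed to be seconds since the Unix epoch.

```python
import sqlite3

# In-memory database with a hypothetical schema; a real server would
# define its own tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE package_versions (
        filename TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        version TEXT NOT NULL,
        upload_timestamp INTEGER NOT NULL  -- seconds since the Unix epoch
    )
    """
)
conn.executemany(
    "INSERT INTO package_versions VALUES (?, ?, ?, ?)",
    [
        ("foo-1.0-0.tar.bz2", "foo", "1.0", 1546300800),  # 2019-01-01 UTC
        ("foo-2.0-0.tar.bz2", "foo", "2.0", 1609459200),  # 2021-01-01 UTC
    ],
)


def packages_as_of(conn: sqlite3.Connection, as_of_timestamp: int) -> list[str]:
    """Return filenames of packages uploaded at or before as_of_timestamp."""
    rows = conn.execute(
        "SELECT filename FROM package_versions "
        "WHERE upload_timestamp <= ? ORDER BY filename",
        (as_of_timestamp,),
    )
    return [row[0] for row in rows]


print(packages_as_of(conn, 1577836800))  # cutoff: 2020-01-01 UTC
```

The filter becomes a single WHERE clause, and an index on upload_timestamp would keep it cheap on large channels.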
Indeed, the timestamps in a package's info/index.json and in repodata.json are identical, so the timestamp must be the build time, not the upload time. Alas, no data on upload time is present in repodata.json.
We could add an upload-time query to the Quetz API, which would make the as-of/changelog feature relatively easy to implement, but this would break compatibility with the Anaconda server. Should we go for it?
Should we go for it?
+:100: from me!
I don't think it breaks compatibility - certainly no more than conda-metachannel does. The terminology they use there is "pruning the graph"; in this case all packages published (to the server) after the requested as_of_timestamp can be removed from the graph presented to the solver.
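A sketch of what that pruning could look like on the server side, assuming the server keeps its own record of upload times separately from repodata.json (which, per the discussion above, carries only build times). The function prune_repodata and the upload_times mapping are hypothetical and not part of Quetz or conda-metachannel. Timestamps are assumed to be seconds since the Unix epoch.

```python
def prune_repodata(repodata: dict, upload_times: dict, as_of_timestamp: int) -> dict:
    """Return repodata without packages uploaded after as_of_timestamp.

    upload_times maps a package filename to the time the server received it.
    This is a hypothetical server-side record, not a repodata.json field.
    """
    pruned = dict(repodata)  # shallow copy; the input is not modified
    # repodata.json lists .tar.bz2 files under "packages" and .conda files
    # under "packages.conda"; both sections are filtered the same way.
    for section in ("packages", "packages.conda"):
        pruned[section] = {
            filename: record
            for filename, record in repodata.get(section, {}).items()
            # Files with no known upload time are treated as too new to serve.
            if filename in upload_times and upload_times[filename] <= as_of_timestamp
        }
    return pruned


# Hypothetical data: bar 2.0 was built in 2019 but only uploaded in 2021.
repodata = {
    "packages": {
        "bar-1.0-0.tar.bz2": {"name": "bar", "version": "1.0", "timestamp": 1546300800000},
        "bar-2.0-0.tar.bz2": {"name": "bar", "version": "2.0", "timestamp": 1546387200000},
    },
    "packages.conda": {},
}
upload_times = {
    "bar-1.0-0.tar.bz2": 1546300800,  # 2019-01-01 UTC
    "bar-2.0-0.tar.bz2": 1609459200,  # 2021-01-01 UTC
}

served = prune_repodata(repodata, upload_times, as_of_timestamp=1577836800)
print(sorted(served["packages"]))
```

Because the client still receives ordinary repodata, just with fewer entries, an unmodified solver can consume the result.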