
Metadata file about latest release is updated too early

Open bulletmark opened this issue 1 year ago • 18 comments

The file available at https://raw.githubusercontent.com/indygreg/python-build-standalone/latest-release/latest-release.json is supposed to point to the latest release, but I notice that it currently points to 20240725, which is a pre-release. It should point to 20240713, which is the current latest release. That file should be updated only after the release is formalized.

BTW, that pending release 20240725 only has -install_only_stripped builds, no -install_only builds are present.

bulletmark avatar Jul 25 '24 22:07 bulletmark

On checking, I find that the 20240725 builds are labelled as -install_only_stripped but are actually the unstripped binaries.

bulletmark avatar Jul 25 '24 23:07 bulletmark

I see that the subsequent release, 20240726, fixes the missing/wrong binaries noted above for 20240725, although that is tangential to the issue raised here.

bulletmark avatar Jul 26 '24 00:07 bulletmark

All of this is fixed, though I agree that the file should only be updated when the release is promoted to stable. Kind of a chicken-and-egg problem, though? The file would be temporarily stale if we only updated it after promoting.

charliermarsh avatar Jul 26 '24 00:07 charliermarsh

Why would it be stale? That independent file (which is not part of any release) points to the latest release and should be updated immediately after a new promotion is completed. Old releases last forever so are still valid.

When I raised this issue the file was pointing to 20240725 but Github was saying that release was "pre-release".

bulletmark avatar Jul 26 '24 00:07 bulletmark

It would be stale because it would be pointing to a release that isn't the latest release, despite being called latest-release.json :) It would still be pointing to a valid release, but it would be stale! It's clearly better than pointing to a pre-release, though.

charliermarsh avatar Jul 26 '24 01:07 charliermarsh

Again, at the present time, https://raw.githubusercontent.com/indygreg/python-build-standalone/latest-release/latest-release.json points to the 20241205 release, but the current release is 20241016, which breaks my update tool. I still don't understand how this can be a difficult problem? Immediately after the release is finally pushed, the automation should then update the metadata file.

bulletmark avatar Dec 05 '24 22:12 bulletmark

Hey! Sorry this broke your update tool. However, the following is not a respectful way to engage with us:

"I still don't understand how this can be a difficult problem?"

These releases are complex, as is the tooling around them. The scale of the releases means we encounter many instabilities in GitHub's infrastructure and there are frequently problems with the release artifacts that we have to resolve. Please be considerate.

I believe the release automation updates the latest-release.json after all the artifacts are published but before the release is marked as the latest. This gives us an opportunity to manually verify the release and add release notes. Certainly it'd be reasonable to adjust the automation around updating the latest-release.json file, but we have not prioritized it as it adds more steps to an already complex release process.

zanieb avatar Dec 05 '24 23:12 zanieb

If you're interested in improving it, I'm happy to discuss the best approach and review a pull request.

zanieb avatar Dec 05 '24 23:12 zanieb

It's also worth noting in this case, I have moved the release back to a pre-release to investigate a possible regression (ref #405). I'd recommend making your tool robust to incorrect latest-release.json tags if feasible.

zanieb avatar Dec 05 '24 23:12 zanieb

Ok, sorry about that. I know little about Github automation/actions so can't really contribute. From what you are saying it seems that the metadata file would not be part of the release?

Re your second comment above, my program assumes that file points to the latest release and note the contents do not say whether it is a pre-release. So are you saying I should not use that file at all?

bulletmark avatar Dec 05 '24 23:12 bulletmark

"From what you are saying it seems that the metadata file would not be part of the release?"

I think we'd need to run a second action after validating the release manually.

"Re your second comment above, my program assumes that file points to the latest release and note the contents do not say whether it is a pre-release. So are you saying I should not use that file at all?"

I think my recommendation is to check if the release in the file is a pre-release using GitHub's API. If your program is open source, I'm happy to take a look at alternatives.
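A minimal sketch of that check in Python, not a definitive implementation. It assumes latest-release.json carries a top-level "tag" key, and it uses GitHub's "get a release by tag name" REST endpoint, whose response includes "prerelease" and "draft" fields. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

OWNER_REPO = "indygreg/python-build-standalone"
LATEST_URL = (
    f"https://raw.githubusercontent.com/{OWNER_REPO}"
    "/latest-release/latest-release.json"
)


def fetch_json(url: str):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def release_api_url(tag: str) -> str:
    """URL of the REST endpoint that returns a single release by tag."""
    return f"https://api.github.com/repos/{OWNER_REPO}/releases/tags/{tag}"


def is_stable(release: dict) -> bool:
    """True if the release object is neither a pre-release nor a draft."""
    return not release.get("prerelease", False) and not release.get("draft", False)


def latest_tag_if_stable() -> Optional[str]:
    """Return the tag named in latest-release.json, or None if it is not stable."""
    # Assumption: the metadata file has a top-level "tag" key.
    tag = fetch_json(LATEST_URL)["tag"]
    return tag if is_stable(fetch_json(release_api_url(tag))) else None
```

This costs one API request per check, so an unauthenticated caller is still subject to GitHub's rate limits; caching the result for a short period keeps that cost low.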

zanieb avatar Dec 06 '24 00:12 zanieb

To be clear, that file can be fetched and checked very quickly so I can do that (cached for a short period of course) without using the Github API. I use the API for grabbing the actual artifacts but avoid the API for that one quick check to avoid rate-limiting so users don't have to configure an access token (although that is an option I provide).

bulletmark avatar Dec 06 '24 00:12 bulletmark

The only other option I can think of is to not use that file and instead scrape the first release page but that is ugly and slow. I assumed that indygreg set that file up for people to use it just like I am.

bulletmark avatar Dec 06 '24 00:12 bulletmark

No need to scrape, there are Atom (like RSS) feeds available for releases and tags for all GitHub projects:

  • https://github.com/indygreg/python-build-standalone/releases.atom
  • https://github.com/indygreg/python-build-standalone/tags.atom

I've used this sort of feed for programmatically finding the latest version of a GitHub project.
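As a rough sketch of reading such a feed with Python's standard library: the function name is hypothetical, and it assumes each entry has a link whose href ends in /releases/tag/<tag> and that entries are listed newest first. As pointed out later in the thread, the feed does not say whether an entry is a pre-release, so this only yields candidate tags.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace (RFC 4287).
ATOM = "{http://www.w3.org/2005/Atom}"


def tags_from_atom(feed_xml: str) -> list:
    """Return the release tags found in an Atom feed, in feed order."""
    root = ET.fromstring(feed_xml)
    tags = []
    for entry in root.iter(f"{ATOM}entry"):
        link = entry.find(f"{ATOM}link")
        if link is not None and "href" in link.attrib:
            # Assumption: href looks like .../releases/tag/<tag>
            tags.append(link.attrib["href"].rstrip("/").rsplit("/", 1)[-1])
    return tags
```

The body of releases.atom, fetched for example with urllib.request.urlopen(...).read(), can be passed straight in.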

Perhaps latest-release.json should be removed and these used instead?

hugovk avatar Dec 06 '24 14:12 hugovk

Thanks @hugovk

That certainly seems better than trying to duplicate the information — a single source of truth seems ideal.

zanieb avatar Dec 06 '24 14:12 zanieb

There's also https://api.github.com/repos/indygreg/python-build-standalone/releases which has "prerelease": true for prereleases. See https://github.com/astral-sh/uv/blob/3aaa9594be4727fb4a6260b1cc5782eb66e47284/crates/uv-python/fetch-download-metadata.py for a complex usage example.
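A simplified sketch of using that endpoint, far smaller than the linked script. It assumes the API returns releases newest first; the function names are hypothetical, and the per_page query parameter is used only to keep the response small.

```python
import json
import urllib.request
from typing import Optional

RELEASES_URL = "https://api.github.com/repos/indygreg/python-build-standalone/releases"


def newest_stable_tag(releases: list) -> Optional[str]:
    """Return the tag of the first release that is neither a pre-release nor a draft."""
    for rel in releases:
        if not rel.get("prerelease") and not rel.get("draft"):
            return rel["tag_name"]
    return None


def fetch_releases(per_page: int = 5) -> list:
    """Fetch only the first few releases instead of the default page size."""
    url = f"{RELEASES_URL}?per_page={per_page}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling newest_stable_tag(fetch_releases()) then yields the latest stable tag, provided one appears within the first per_page entries.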

konstin avatar Dec 06 '24 14:12 konstin

@hugovk thanks - at first sight those links look promising as a replacement for latest-release.json but upon inspection it appears they suffer from the same problem. I.e. each release is listed there without any identification to discriminate a pre-release from a release. @konstin, that github API link does identify a pre-release but is way too big, slow, and clunky for my intended use (just click on it and watch how long it takes to complete). The received JSON data is 31MB(!) but I only want to know the latest released tag YYYYMMDD.

bulletmark avatar Dec 07 '24 04:12 bulletmark

In case anybody else is interested, I found a solution for this. I simply fetch https://github.com/astral-sh/python-build-standalone/releases/latest, which redirects to whatever Github marks as "latest" on the releases page, then catch the redirect URL and extract the tag from it. That is very lightweight (i.e. I don't actually follow the redirection), doesn't require using the API, and is exactly what I was looking for.
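One way this could look in Python, as a sketch rather than the exact code used in my tool. It assumes the redirect target has the form .../releases/tag/<tag>; the function names are hypothetical, and using HEAD rather than GET is also an assumption.

```python
import http.client
from typing import Optional
from urllib.parse import urlsplit

LATEST_PAGE = "https://github.com/astral-sh/python-build-standalone/releases/latest"


def tag_from_location(location: str) -> Optional[str]:
    """Extract <tag> from a .../releases/tag/<tag> URL, or return None."""
    parts = urlsplit(location).path.rstrip("/").split("/")
    if len(parts) >= 2 and parts[-2] == "tag":
        return parts[-1]
    return None


def latest_release_tag(url: str = LATEST_PAGE) -> Optional[str]:
    """Read the redirect's Location header without following it."""
    target = urlsplit(url)
    conn = http.client.HTTPSConnection(target.netloc, timeout=10)
    try:
        # http.client never follows redirects, so only the redirect
        # response itself is read here.
        conn.request("HEAD", target.path)
        location = conn.getresponse().getheader("Location")
    finally:
        conn.close()
    return tag_from_location(location) if location else None
```

If the Location header is missing or does not end in /tag/<tag>, the function returns None, so the caller can fall back to a cached value.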

bulletmark avatar Dec 20 '24 03:12 bulletmark