Delayed appearance of packages on the Archive
In the context of archlinux/infrastructure#531 we are currently thinking about whether packages appearing on the Archive with some delay could cause issues for the repro infrastructure 🤔
As far as we understand it right now the packages are exclusively synced from the archive: https://github.com/archlinux/archlinux-repro/blob/6e8cee92270c127a67ef27414116b6c45deffc76/repro.in#L24 https://github.com/archlinux/archlinux-repro/blob/6e8cee92270c127a67ef27414116b6c45deffc76/buildinfo#L72
In the new setup that could cause some issues if the rebuilder picks up a package faster than it is synced to the archive. So we wanted to ask for input on this: would it be enough to add a fallback to the T0 or geo mirrors? Or some kind of retry mechanism..? 🤔
I think we could query both URLs. We already have code for this, as we need to check for both the xz and zst file extensions.
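To make the "query both URLs" idea concrete, here is a minimal sketch in Python (archlinux-repro itself is a shell script, so this is illustrative only). The base URLs are passed in by the caller, and `candidate_urls`, `url_exists` and `first_available` are hypothetical names, not functions from archlinux-repro. The extension list simply mirrors the xz/zst check mentioned above.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional

# Assumption: the two compression suffixes to probe, newest format first.
EXTENSIONS = (".pkg.tar.zst", ".pkg.tar.xz")


def candidate_urls(stem: str, bases: Iterable[str]) -> List[str]:
    """Build every base/extension combination, preserving the order of `bases`.

    `stem` is the package file name without its compression suffix,
    e.g. "foo-1.0-1-x86_64" (hypothetical example).
    """
    return [f"{base.rstrip('/')}/{stem}{ext}" for base in bases for ext in EXTENSIONS]


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` answers with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, TimeoutError):
        return False


def first_available(
    urls: Iterable[str], exists: Callable[[str], bool] = url_exists
) -> Optional[str]:
    """Return the first URL for which `exists` is true, or None if none match."""
    for url in urls:
        if exists(url):
            return url
    return None
```

Listing the archive base first and a mirror base second would keep the archive as the preferred source, so the mirror is only consulted while the archive has not caught up yet.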
So if devops needs to delay sync, we can implement this :) No worries.
I also think it'd be fine: the scheduler part of the repro infra has a retry mechanism, so if the first attempt fails it's going to try again after some time.
This was also common back when we had to wait until the PKGBUILD became available in the svntogit repo. It causes a "Packages which have become not reproducible" email notification though.
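For illustration, a retry loop of this general kind could look like the sketch below. This is not the actual scheduler code from the repro infra; the function name, the attempt count and the delays are all assumptions chosen for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_available(
    fetch: Callable[[], Optional[T]],
    attempts: int = 5,
    initial_delay: float = 60.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fetch` until it returns something other than None.

    Waits `initial_delay` seconds after the first failure and multiplies
    the wait by `factor` after each further failure. Returns None if all
    `attempts` fail. `sleep` is injectable so the loop can be tested
    without actually waiting.
    """
    delay = initial_delay
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts - 1:  # no point in waiting after the last try
            sleep(delay)
            delay *= factor
    return None
```

A package would only need to be reported as not reproducible once the final attempt has failed, which might avoid the notification email for packages that are merely late to arrive.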
As announced on arch-dev-public, the split was done on December 27, 2024.
archivetools updates the packages directory once a day, which I think is a bit slow, so I added a service yesterday (see this draft MR) which synchronizes newly archived files to the archive server once an hour (+RandomizedDelaySec=10m).
We may be able to improve this further if needed, but let's KISS for now.
I think that archlinux-repro (or rebuilderd?) currently does not handle this, and packages are listed as unreproducible because their dependencies (or the specific version) have not yet appeared in the Arch Linux Archive 🤔 Maybe we should first check a geo mirror?
One advantage of using the archive exclusively is that we ensure the packages actually ended up in the archive. Besides that I don't really have any opinions on this.
Agree with @kpcyrd, having a check that the archive is functional is nice. That we now have a non-atomic archive is some bad fallout from the server split.
An alternative is:
Tue 2025-06-24 10:28:06 UTC 2min 32s Tue 2025-06-24 10:23:06 UTC 2min 27s ago [email protected] [email protected]>
Tue 2025-06-24 10:28:22 UTC 2min 47s Tue 2025-06-24 10:23:22 UTC 2min 12s ago [email protected] [email protected]
Tue 2025-06-24 10:28:22 UTC 2min 47s Tue 2025-06-24 10:23:22 UTC 2min 12s ago [email protected] [email protected]>
Tue 2025-06-24 10:28:22 UTC 2min 47s Tue 2025-06-24 10:23:22 UTC 2min 12s ago [email protected] [email protected]
These repository sync timers now run every 5 minutes, so rsyncing only once an hour is bound to run into issues. We could change that to one hour? Of course issues would still occur then; maybe we can rsync every 25 minutes and read the repository database every hour?
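As a rough back-of-the-envelope check of these intervals, the sketch below estimates how long a newly archived file can wait for the next rsync run. It assumes the file lands just after a run has fired, ignores transfer time and the archiving step itself, and treats `RandomizedDelaySec` as adding up to its full value to the gap between two runs. The helper name is hypothetical.

```python
from datetime import timedelta


def worst_case_sync_wait(
    interval: timedelta, randomized_delay: timedelta = timedelta(0)
) -> timedelta:
    """Upper bound on the wait until the next sync run starts.

    Worst case: the previous run fired with no random delay and the next
    one is delayed by the full `randomized_delay`.
    """
    return interval + randomized_delay


# Hourly rsync with RandomizedDelaySec=10m, as described earlier in the thread.
hourly = worst_case_sync_wait(timedelta(hours=1), timedelta(minutes=10))
# The 25-minute rsync interval floated above (no random delay assumed).
proposed = worst_case_sync_wait(timedelta(minutes=25))

print(hourly)    # 1:10:00
print(proposed)  # 0:25:00
```

Under these assumptions, a consumer that sees the repository database within 5 minutes of a change could be ahead of the archive by more than an hour with the hourly sync, and by up to 25 minutes with the shorter interval.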