Add fallback to GitHub AUR mirror when aur.archlinux.org is down
Description:
Currently, paru relies only on aur.archlinux.org.
When aur.archlinux.org is down (e.g. high load or DDoS attack), paru cannot search or fetch packages.
The Arch Linux team provides a GitHub mirror of AUR packages:
git clone --branch <package_name> --single-branch https://github.com/archlinux/aur.git <package_name>
Question / Feature idea:
Would it be possible for paru to automatically fall back to the GitHub AUR mirror when aur.archlinux.org is unreachable?
Benefit:
This could improve the reliability of paru during AUR downtime.
Also consider a configuration option to always prefer the GitHub mirror.
The GitHub mirror also includes packages which were deleted from the AUR, including packages with malicious contents.
Searching or fetching from the GitHub mirror is feasible, but doing so removes the one small layer of defense the AUR has against malicious code - since many AUR users don't review PKGBUILDs, or are not able to do so effectively.
> The GitHub mirror also includes packages which were deleted from the AUR, including packages with malicious contents.
> Searching or fetching from the GitHub mirror is feasible, but doing so removes the one small layer of defense the AUR has against malicious code - since many AUR users don't review PKGBUILDs, or are not able to do so effectively.
Couldn't you just force the user to read the PKGBUILD then, and also display a red capslock banner warning them of the potential consequences?
You've always been able to clone deleted packages from the AUR too, they're just not indexed. You would probably want to cache the last AUR index instead of indexing git branches, so it has good info to sort and filter with.
> Couldn't you just force the user to read the PKGBUILD then, and also display a red capslock banner warning them of the potential consequences?
I would say no to the first one, since people will complain about it. The second one was implemented in yaourt - the banner was also flashing. It's the one feature yaourt's clones (unfortunately!) never implemented.
> You've always been able to clone deleted packages from the AUR too, they're just not indexed.
paru never supported retrieving deleted packages from the AUR. But yes, pkglist.gz could be used as a reference point.
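For illustration, such a reference point could look like the sketch below: refresh a plain-text package list while the AUR is reachable (`packages.gz` is the real AUR package-list endpoint the pkglist mention presumably refers to), and treat anything on the mirror that is missing from the list as deleted or unlisted. The helper name and cache path here are made up:

```shell
# is_listed: check a package name against a cached copy of the AUR
# package list. The list can only be refreshed while the AUR is up:
#   curl -fsSL https://aur.archlinux.org/packages.gz | gunzip > "$list"
is_listed() {
    pkg=$1
    list=${2:-"$HOME/.cache/aur-pkglist.txt"}   # hypothetical cache path
    grep -qxF "$pkg" "$list"                    # exact, whole-line match
}
```

A helper falling back to the mirror could then hide, or at least loudly flag, anything for which `is_listed` fails.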
https://github.com/ArcticLampyrid/aur-mirror-meta
I wrote a helper tool to allow indexing data from the AUR GitHub Mirror and to provide Paru-compatible endpoints. However, it may not be easy to use:
- There seems to be no fast way to build the index from GitHub. With my current approach, building the initial index takes over 3 hours.
- There is no way to distinguish between listed/unlisted AUR packages, which increases the number of retrieval results. Many of these are outdated or useless, and users now have to manually filter them out.
You could just cache the JSON data. It's not like the AUR RPC is down permanently.
https://aur.archlinux.org/packages-meta-ext-v1.json.gz
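As a sketch of that caching idea - the URL is the real metadata archive, while the 6-hour refresh window, function name and cache path are arbitrary choices for illustration:

```shell
# refresh_meta: keep a local copy of the AUR metadata archive, refreshing
# it at most every 6 hours; AUR downtime simply leaves the stale copy in place.
refresh_meta() {
    cache=${1:-"$HOME/.cache/aur-meta.json.gz"}
    url=${2:-https://aur.archlinux.org/packages-meta-ext-v1.json.gz}
    mkdir -p "$(dirname "$cache")"
    # nothing to do if the cached copy is less than 6 hours (360 min) old
    if find "$cache" -mmin -360 2>/dev/null | grep -q .; then
        return 0
    fi
    # download to a temporary file so a failed transfer (AUR down)
    # never clobbers an existing cached copy
    if curl -fsSL -o "$cache.part" "$url"; then
        mv "$cache.part" "$cache"
    else
        rm -f "$cache.part"
    fi
    [ -f "$cache" ]  # succeed as long as some copy, fresh or stale, exists
}
```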
> paru never supported retrieving deleted packages from the AUR.
I meant that, in general, you could clone them from the AUR. I don't think paru should show you or let you install deleted packages without being explicit about it.
Caching the index is the best approach; writing extra code to index GitHub without being able to filter out deleted packages would be an issue.
Unrelated:
PKGBUILD files can be distributed in upstream project git repositories just like nix files, in addition to the separate AUR repo, so a built-in way to clone and install would be nice.
For example: paru -S<flag> https://github.com/example-user/example-repo.git
An argument could be made that you can just git clone and makepkg, but that same argument could be used to invalidate using an AUR helper like paru in the first place - you can do that with the AUR too. (Still display the PKGBUILD and a big warning banner.)
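A rough sketch of what such a flag might do under the hood: clone, force a PKGBUILD review with a warning, then build. The helper name is hypothetical, and the plain-text prompt stands in for the warning banner:

```shell
# install_from_repo: clone an arbitrary git repo carrying a PKGBUILD,
# show the PKGBUILD with a warning, and only build on explicit consent.
install_from_repo() {
    url=$1
    dir=$(basename "$url" .git)
    git clone -q "$url" "$dir" || return 1
    # force a review of the PKGBUILD before anything is built
    printf '== WARNING: unreviewed PKGBUILD from %s ==\n' "$url"
    cat "$dir/PKGBUILD"
    printf 'Proceed with build? [y/N] '
    read -r ans
    if [ "$ans" = y ]; then
        (cd "$dir" && makepkg -si)
    fi
}
```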
> You could just cache the JSON data. It's not like the AUR RPC is down permanently.
> https://aur.archlinux.org/packages-meta-ext-v1.json.gz
Oh, maybe you are right. While I was developing this tool, the AUR RPC was completely down; that's why I did not consider it at first.
https://stats.uptimerobot.com/vmM5ruWEAB/788139639 - the SLA is not very reliable
@dyxushuai what gives you the impression the AUR has an SLA?
@ArcticLampyrid
> Oh, maybe you are right. While I was developing this tool, the AUR RPC was completely down; that's why I did not consider it at first.
It does look like a better solution to the problem - more self-reliant.
Appreciate the effort.
If the speed could be increased significantly (e.g. through crowd-sourcing, or built-in databases shipped with releases?), it could become a very beneficial addition to paru, e.g. as an optional package dependency.
So, is anyone working on this? I could try working on it, but can't promise anything.
Code to efficiently retrieve packages from the github mirror is available here: https://github.com/aurutils/aurutils/blob/master/lib/aurweb/aur-fetch--mirror
I think the best approach would be a tool that sets up a local RPC, that builds data on-the-fly from the github mirror. Then existing tools can query this local interface, instead of the remote one.
This does not address the issue of deleted packages being exposed. This information can only be cached when the AUR is online. Upstream could improve this by using git notes containing Maintainer information (None for deleted package, Orphaned for orphans, FooName for maintained packages.)
If you want to spend time on the issue, I suggest starting there.
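To make the git-notes idea concrete: git notes are a standard git feature, but the `Maintainer:` note layout below is only the scheme proposed above - the mirror does not actually carry such notes today. A helper could then read the state offline:

```shell
# maintainer_state: read a maintainer note attached to a package branch in
# a local clone of the mirror. The note contents (None / Orphaned / <name>)
# follow the hypothetical scheme above; absent notes fall back to "unknown".
maintainer_state() {
    repo=$1 branch=$2
    git -C "$repo" notes show "refs/heads/$branch" 2>/dev/null \
        || echo 'Maintainer: unknown'  # no note: state not determinable offline
}
```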
Great to see your method, @AladW. I haven't tested it in detail yet. If we need to fetch metadata for all packages (i.e., every SRCINFO file across the ~141k branches), roughly how long would that take? I'm currently using the GraphQL endpoint to do this, and the initial sync takes a little over 3 hours. (Of course, incremental sync is much faster.)
You don't need to fetch metadata for all packages. You just fetch the branches for whatever packages are needed for the query. These will be automatically cached on later queries.
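A minimal sketch of that on-demand approach (the `aur-fetch--mirror` script linked above is the polished version): one shared bare repository accumulates fetched branches, and working copies are cloned from it rather than from the network. Paths and the helper name are illustrative:

```shell
# fetch_branch: fetch a single package branch from the AUR mirror into a
# shared bare cache, then materialise a working copy from that cache.
fetch_branch() {
    pkg=$1
    cache=${2:-"$HOME/.cache/aur-mirror.git"}          # shared object store
    remote=${3:-https://github.com/archlinux/aur.git}  # the GitHub mirror
    [ -d "$cache" ] || git init -q --bare "$cache"
    # fetch only the one branch this query needs; repeated fetches of the
    # same branch are incremental
    git -C "$cache" fetch -q "$remote" "+refs/heads/$pkg:refs/heads/$pkg"
    # clone the working copy from the local cache, not the network
    git clone -q --branch "$pkg" --single-branch "$cache" "$pkg"
}
```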
> You just fetch the branches for whatever packages are needed for the query.
No, this is problematic. The branch name corresponds to pkgbase, not pkgname.
Around the time the GitHub AUR mirror was created, a separate metadata archive was added, because the changes turned out to be too frequent for the git repository.
A simple pkgbase-to-pkgname mapping may change less frequently though - or helpers could just get used to querying by pkgbase. I would argue this is simpler anyway, since cloning and building packages is already done by pkgbase.
For search by e.g. description, the full metadata is also required. I don't think 3 hours are needed though - --depth=1 and similar flags should help with the initial git clone. Some reference for which pkgname/pkgbase are deleted is still needed (the aforementioned git notes).
PS. This discussion would make more sense upstream. None of the paru developers work on AUR, or vice-versa.
Maybe this could be done for packages that you already have installed from the AUR and are currently updating, or something like that?
Since last night I have been trying to connect to the AUR; the connection is either really slow or just outright stops. At the moment I'm manually updating the packages using makepkg and the GitHub mirror.