Add the ability to use aur mirror from github
Is your feature request related to a problem? Please describe.
For the last week the AUR has been very unstable and is unavailable all over the world; any attempt to clone a package ends with an error.
Describe the solution you'd like
There is a mirror of the AUR on GitHub:
https://github.com/archlinux/aur
where each branch corresponds to a package on the AUR. I would like yay to support this mirror as an alternative for when aurweb is unstable.
Additional context
A proxy or VPN won't help, because the AUR is simply down, not blocked.
@archlinux can you add packages.gz to that github mirror using git-lfs?
@Boria138 FWIW, this is due to the DDoS attacks. ([Wayback Machine mirror](https://web.archive.org/web/20250825030025/https://archlinux.org/news/recent-services-outages/))
@Dmole might not actually be necessary. lemme dig into the go-git stuff a bit, I've got an idea.
kind of OT, but I guess announcing yay's future capability to use GitHub as a mirror would be newsworthy on archlinux.org after completion, in case the DDoS is still ongoing by then.
@johnnybubonic
true; somehow I forgot about that:
git ls-remote https://github.com/archlinux/aur | grep refs/heads/ | perl -pe 's/.*\///g' | grep -c .
140003
having yay be able to fall back on the AUR mirror would be good for instances where the AUR is down
I am wondering if having more mirrors for AUR software that yay and other AUR helpers could pull from would actually help in contingencies such as when the AUR is getting DDoS'd
I am wondering if having more mirrors for AUR software that yay and other AUR helpers could pull from would actually help in contingencies such as when the AUR is getting DDoS'd
would be hilarious if they would manage to get github down with it.
a single point of failure is almost always problematic. I remember the discussions that came up before the archive was put on archive.org ... funny enough, after I asked if it's possible to host a mirror for it.
sorry for the delay, all; $dayjob things came up. TL;DR: jump to the bottom of the post and expand.
so, here's a POC that:
- will clone the AUR github repo locally if it doesn't exist at the specified path `/tmp/aurgit` (takes about 5-20 minutes; took me 8 but i'm on a 1+ Gbps line, YMMV)
- can refresh it if it does (takes about 1-10 minutes; takes me about 4)
- iterates through all branches, extracts the `PKGBUILD` and `.SRCINFO` from each one, and writes them out to disk at `/tmp/aurgit.cache.d/<pkgbase>` (takes about 22-24 seconds)
i'm not familiar enough with @Jguer or other contributors' established design practices/yay internals for a PR (and i'm unfortunately too busy to really do any further work for this), so hopefully this gives a good head start, but i believe a good UX flow for this would be:
- `yay --mirror-init` or something to locally clone the AUR GH mirror (if it isn't already locally cloned) (otherwise automatically do so on first use, maybe after prompting since it takes so long) (maybe a fetch on it if it does exist)
- `yay --mirror-sync` or something to do a git fetch on that (cloning it if it doesn't exist)
- then maybe a `--mirror` flag to -S and -G operations that will use the local mirror (or give the option to fall back to it if a normal `yay -S` against the AUR fails for some reason, etc.)
there probably also needs to be some sort of configuration option to set the AUR git mirror URL (in the case of GH targeting) with a default of https://github.com/archlinux/aur. and a config option to set the local destination path of the repo checkout.
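purely as a sketch of what that could look like (hypothetical names and JSON keys; not yay's actual configuration schema):

```go
package config

// Hypothetical config additions for an AUR git mirror; the field names,
// JSON keys, and defaults here are illustrative only.
type AURMirror struct {
	MirrorURL  string `json:"aurmirrorurl"`  // git URL of the AUR mirror
	MirrorPath string `json:"aurmirrorpath"` // local destination of the checkout
}

// Suggested defaults per the paragraph above; a real default path would
// probably live under the user's XDG cache dir rather than /tmp.
var DefaultAURMirror = AURMirror{
	MirrorURL:  "https://github.com/archlinux/aur",
	MirrorPath: "/tmp/aurgit",
}
```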
I also didn't write a .SRCINFO or PKGBUILD parser, as I assume there's probably something available already.
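for illustration only, a minimal .SRCINFO "parser" sketch (it just collects the `key = value` pairs into a multimap; a real one needs to handle the pkgbase section vs. the per-pkgname sections for split packages):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseSrcinfo naively collects every "key = value" line of a .SRCINFO
// into a multimap. It ignores the pkgbase/pkgname sectioning, which a
// real parser must honor for split packages.
func parseSrcinfo(srcinfo []byte) map[string][]string {
	fields := make(map[string][]string)
	sc := bufio.NewScanner(strings.NewReader(string(srcinfo)))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if k, v, ok := strings.Cut(line, " = "); ok {
			fields[k] = append(fields[k], v)
		}
	}
	return fields
}

func main() {
	example := []byte("pkgbase = yay\n\tpkgver = 12.0.0\n\tpkgrel = 1\n\npkgname = yay\n")
	fmt.Println(parseSrcinfo(example)["pkgver"]) // => [12.0.0]
}
```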
go-git is pure Golang so no dep on git(1) / /usr/bin/git or anything like that (though I'd imagine it's probably already installed if they're using the AUR!).
for those wondering:
Required free disk space
# clone of repo
$ du -sb /tmp/aurgit
1963899964 /tmp/aurgit
$ du -sh /tmp/aurgit
2.4G /tmp/aurgit
$ du -sh --apparent /tmp/aurgit
1.9G /tmp/aurgit
# only PKGBUILDs and .SRCINFOs
$ du -sb /tmp/aurgit.cache.d
306220494 /tmp/aurgit.cache.d
$ du -sh /tmp/aurgit.cache.d
1.2G /tmp/aurgit.cache.d
$ du -sh --apparent /tmp/aurgit.cache.d
293M /tmp/aurgit.cache.d
Anyways, here y'are.
POC for AUR GitHub mirror fetching, branch/pkgbase walking
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"sync"
	"time"

	"github.com/go-git/go-billy/v5"
	"github.com/go-git/go-billy/v5/osfs"
	"github.com/go-git/go-git/v5"
	"github.com/go-git/go-git/v5/plumbing"
	"github.com/go-git/go-git/v5/plumbing/cache"
	"github.com/go-git/go-git/v5/plumbing/object"
	"github.com/go-git/go-git/v5/plumbing/storer"
	"github.com/go-git/go-git/v5/storage/filesystem"

	"r00t2.io/sysutils/paths"
)

const (
	gitURL     string = "https://github.com/archlinux/aur"
	remoteName string = "aur"
	// Once you've cloned, if this is true it'll skip the fetch.
	skipUpdate bool = true
)

var (
	repo      *git.Repository
	repoLock  sync.Mutex
	cacheDst  string = "/tmp/aurgit.cache.d"
	repoDst   string = "/tmp/aurgit"
	cloneOpts *git.CloneOptions = &git.CloneOptions{
		URL:        gitURL,
		RemoteName: remoteName,
		Mirror:     true,
		NoCheckout: true,
		Progress:   os.Stdout,
	}
	// Git worktree
	wt billy.Filesystem
	// <wt>/.git
	gitCfgPath string
	gitCfg     billy.Filesystem
	// interface to gitCfg
	cfgStore *filesystem.Storage
)

type (
	AurPkgInfo struct {
		PkgBase  string
		PKGBUILD []byte
		SRCINFO  []byte
	}

	PkgErr struct {
		PkgBase string
		Err     error
		File    *string
	}
)

func (p *PkgErr) Error() (errStr string) {
	if p.File == nil {
		errStr = fmt.Sprintf("error for package base '%s': %s", p.PkgBase, p.Err.Error())
	} else {
		errStr = fmt.Sprintf("error for package base '%s' file '%s': %s", p.PkgBase, *p.File, p.Err.Error())
	}
	return
}

func fetchPkgbuildSrcinfo(ref *plumbing.Reference, wg *sync.WaitGroup, infoChan chan *AurPkgInfo, errChan chan error) {
	var err error
	var info AurPkgInfo
	var tree *object.Tree
	var file *object.File
	var dat string
	var commit *object.Commit

	defer wg.Done()

	if !ref.Name().IsBranch() ||
		ref.Name().Short() == plumbing.Main.Short() ||
		ref.Name().Short() == plumbing.Master.Short() {
		return
	}

	info.PkgBase = ref.Name().Short()

	repoLock.Lock()
	defer repoLock.Unlock()

	if commit, err = repo.CommitObject(ref.Hash()); err != nil {
		errChan <- &PkgErr{
			PkgBase: info.PkgBase,
			Err:     err,
		}
		return
	}

	// Recommend you check out the other fields in commit as well. Some useful metadata there.
	if tree, err = commit.Tree(); err != nil {
		errChan <- &PkgErr{
			PkgBase: info.PkgBase,
			Err:     err,
		}
		return
	}

	// You can get other files from the tree, too; not just the PKGBUILD and .SRCINFO.
	for _, fnm := range []string{
		"PKGBUILD",
		".SRCINFO",
	} {
		fnm := fnm // capture per iteration; &fnm escapes below (pre-Go 1.22 safety)
		if file, err = tree.File(fnm); err != nil {
			errChan <- &PkgErr{
				PkgBase: info.PkgBase,
				Err:     err,
				File:    &fnm,
			}
			continue
		}
		if dat, err = file.Contents(); err != nil {
			errChan <- &PkgErr{
				PkgBase: info.PkgBase,
				Err:     err,
				File:    &fnm,
			}
			return
		}
		switch fnm {
		case "PKGBUILD":
			info.PKGBUILD = []byte(dat)
		case ".SRCINFO":
			info.SRCINFO = []byte(dat)
		}
	}

	infoChan <- &info

	return
}

func main() {
	var err error
	var exists bool
	var start time.Time
	var refIter storer.ReferenceIter
	var errChan chan error
	var wg sync.WaitGroup
	var doneChan chan bool
	var readerWg sync.WaitGroup
	var infoChan chan *AurPkgInfo
	// "Keyed" on pkgbase name. This has to be (well, should be) a sync.Map because we're
	// writing to it inside a range, and that's a lot of locking/unlocking otherwise.
	var pkgBases sync.Map

	// This is disabled because it makes this POC run longer.
	// There's no real reason to do it unless you want a guaranteed clean starting point.
	/*
		log.Println("Clearing destination for testing")
		if err = os.RemoveAll(repoDst); err != nil {
			log.Panicln(err)
		}
	*/

	if err = paths.RealPath(&cacheDst); err != nil {
		log.Panicln(err)
	}

	start = time.Now()

	wt = osfs.New(repoDst, osfs.WithBoundOS())
	if gitCfg, err = wt.Chroot(git.GitDirName); err != nil {
		log.Panicln(err)
	}
	cfgStore = filesystem.NewStorage(gitCfg, cache.NewObjectLRUDefault())
	gitCfgPath = gitCfg.Root()
	if exists, err = paths.RealPathExists(&gitCfgPath); err != nil {
		log.Panicln(err)
	}

	if exists {
		if repo, err = git.Open(cfgStore, wt); err != nil {
			log.Panicln(err)
		}
		if !skipUpdate {
			log.Println("Fetching updates (this will take about 5-10 minutes)")
			if err = repo.Fetch(
				&git.FetchOptions{
					RemoteName: remoteName,
					Progress:   os.Stdout,
					Prune:      true,
				},
			); err != nil {
				log.Panicln(err)
			}
		}
	} else {
		log.Println("Initial clone (this will take about 5-20 minutes)")
		if err = cloneOpts.Validate(); err != nil {
			log.Panicln(err)
		}
		if repo, err = git.Clone(cfgStore, wt, cloneOpts); err != nil {
			log.Panicln(err)
		}
	}

	// Now iterate over each of the pkgbases (branches) and pull the PKGBUILD and .SRCINFO from their latest commit.
	// This should be concurrency-safe.
	errChan = make(chan error)
	infoChan = make(chan *AurPkgInfo)
	doneChan = make(chan bool, 1)
	readerWg.Add(2) // One for the error reader, one for the info reader.

	// Read the errors as they come in and just write them to STDERR.
	go func() {
		var aurErr error

		defer readerWg.Done()

		for aurErr = range errChan {
			if aurErr != nil {
				fmt.Fprintln(os.Stderr, aurErr)
			}
		}
	}()

	// Read the AurPkgInfo as they come in and add them to pkgBases.
	go func() {
		var pkgbase *AurPkgInfo

		defer readerWg.Done()

		for pkgbase = range infoChan {
			if pkgbase != nil {
				pkgBases.Store(pkgbase.PkgBase, pkgbase)
			}
		}
	}()

	if refIter, err = repo.References(); err != nil {
		log.Panicln(err)
	}
	defer refIter.Close()

	if err = refIter.ForEach(
		func(ref *plumbing.Reference) (err error) {
			wg.Add(1)
			go fetchPkgbuildSrcinfo(ref, &wg, infoChan, errChan)
			return
		},
	); err != nil {
		log.Panicln(err)
	}

	go func() {
		wg.Wait()
		close(errChan)
		close(infoChan)
		readerWg.Wait()
		doneChan <- true
	}()

	<-doneChan

	// Now you can iterate over the files, or... parse them, or whatever.
	// Obviously you can do the same in fetchPkgbuildSrcinfo() instead,
	// and add the data (packages, deps, author, whatever) as fields directly in an AurPkgInfo.
	// I just dump it to disk here for an example.
	pkgBases.Range(
		func(k, v any) (ok bool) {
			// NB: returning ok == false stops the Range early; the early returns
			// below are fine in practice since every value stored is a non-nil *AurPkgInfo.
			var dpath string
			var fpath string
			var infoErr error
			var pkgBase string
			var pkgInfo *AurPkgInfo

			if v == nil {
				return
			}
			if pkgBase, ok = k.(string); !ok {
				return
			}
			if pkgInfo, ok = v.(*AurPkgInfo); !ok {
				return
			}

			dpath = filepath.Join(cacheDst, pkgBase)
			if infoErr = os.MkdirAll(dpath, 0o755); infoErr != nil {
				log.Panicln(infoErr)
			}
			for fi, b := range [][]byte{
				pkgInfo.PKGBUILD,
				pkgInfo.SRCINFO,
			} {
				switch fi {
				case 0:
					fpath = filepath.Join(dpath, "PKGBUILD")
				case 1:
					fpath = filepath.Join(dpath, ".SRCINFO")
				}
				// Use infoErr (not the outer err) so we don't clobber main's error state.
				if infoErr = os.WriteFile(fpath, b, 0o644); infoErr != nil {
					log.Panicln(infoErr)
				}
			}

			return
		},
	)

	fmt.Printf("Ran for %s\n", time.Since(start))
}
@johnnybubonic
Cloning the entire AUR mirror is wasteful.
Normally using the git binary, you would specify --single-branch to only fetch the data for the specific package you want to install, saving a lot of disk space and time.
git clone --branch yay --single-branch https://github.com/archlinux/aur.git yay
It also seems like yay currently defers to the git binary when interacting with repos.
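(For what it's worth, the same single-branch fetch can be done in-process with go-git; a sketch, not how yay currently does it:)

```go
package main

import (
	"log"
	"os"

	git "github.com/go-git/go-git/v5"
	"github.com/go-git/go-git/v5/plumbing"
)

// Sketch: clone only the branch for one pkgbase ("yay" here) from the
// GitHub mirror, roughly equivalent to `git clone --branch yay --single-branch`.
func main() {
	_, err := git.PlainClone("/tmp/yay", false, &git.CloneOptions{
		URL:           "https://github.com/archlinux/aur",
		ReferenceName: plumbing.NewBranchReferenceName("yay"),
		SingleBranch:  true,
		Depth:         1, // shallow; drop this if you want the branch history
		Progress:      os.Stdout,
	})
	if err != nil {
		log.Fatalln(err)
	}
}
```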
@johnnybubonic
Cloning the entire AUR mirror is wasteful.
Normally using the `git` binary, you would specify `--single-branch` to only fetch the data for the specific package you want to install, saving a lot of disk space and time.
The branches are only pkgbases, not packages.
You're either adding a local bare clone and doing a remote branch list, iterating that list, downloading the .SRCINFO for each branch and parsing it[0],
ORRR
you're keeping a local clone you operate on, like git is designed for, and searching that (possibly with a pre-parsed, cached .SRCINFO pkgname => pkgbase map updated every sync/fetch; see the sketch at the end of this comment)
every time you do a search for a package (i.e. pkgname), check for a newer version, etc.
Which is more wasteful?
git clone --branch yay --single-branch https://github.com/archlinux/aur.git yay
Again, this is only useful for installing or updating known pkgbases - not pkgnames, not searching, not for package metadata, not for unknown pkgbases, et al.
It also seems like `yay` currently defers to the `git` binary when interacting with repos.
There's no reason for it to do so currently, clearly. go-git's reached significant useful maturity.
[0] Or using the GH API endpoints, either/or
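To make the pre-parsed .SRCINFO cache mentioned above concrete, here's a rough sketch, assuming the /tmp/aurgit.cache.d/<pkgbase>/ dump layout from the POC earlier:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
)

// Sketch: build a pkgname => pkgbase map from the .SRCINFO dump produced
// by the POC above (layout assumed: <cacheDir>/<pkgbase>/.SRCINFO). This
// is the cache you'd refresh on every sync/fetch.
func buildPkgnameMap(cacheDir string) (map[string]string, error) {
	pkgnames := make(map[string]string)
	entries, err := os.ReadDir(cacheDir)
	if err != nil {
		return nil, err
	}
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		pkgBase := e.Name()
		f, err := os.Open(filepath.Join(cacheDir, pkgBase, ".SRCINFO"))
		if err != nil {
			continue // incomplete dump; skip
		}
		sc := bufio.NewScanner(f)
		for sc.Scan() {
			line := strings.TrimSpace(sc.Text())
			// Each split package shows up as its own "pkgname = ..." line.
			if name, ok := strings.CutPrefix(line, "pkgname = "); ok {
				pkgnames[name] = pkgBase
			}
		}
		f.Close()
	}
	return pkgnames, nil
}

func main() {
	m, err := buildPkgnameMap("/tmp/aurgit.cache.d")
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(m["ceph-mds"]) // a subpackage resolves to its pkgbase ("ceph")
}
```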
The branches are only pkgbases, not packages.
@johnnybubonic searching just the pkgbase names is probably a tradeoff most would prefer over requiring a 3 GB cache. Or if we can't get the attention of @archlinux someone could make a cache of the cache with packages.gz in git-lfs for a more traditional search behavior.
The branches are only pkgbases, not packages.
@johnnybubonic searching just the pkgbase names is probably a tradeoff most would prefer over requiring a 3 GB cache.
2.4 < 3, not > 3. The "cache" is just a literal extract+dump of PKGBUILD and .SRCINFO, not condensed down into any sort of optimized, parsed, or trimmed form, for POC/example purposes. Like the code says.
(Though it's actually closer to 1.829 GiB/1.964 GB if we're being pedantic; it's going to differ based on the block sizes of the filesystem. Note the actual byte count from `du -sb` versus the `du --apparent`.)
Or if we can't get the attention of @archlinux someone could make a cache of the cache with packages.gz in git-lfs for a more traditional search behavior.
Has anyone offered to do this/done it yet? Is there an approval/vetting process? Installing single packages from the AUR is one thing, giving a single individual trusted control over metadata of all AUR packages is a whole other can of worms.
Look, I know local clones aren't pretty. I get that. But it's more of a path forward than wishing and hoping someone else "just does something" that you can use. You can criticize it and wait for someone else to implement your perfect solution, or you can take steps toward actually solving it.
Additionally, local clones are resilient against attacks on/downtime of GitHub itself. Git was designed to be decentralized, and I just don't think 2 GB is that big a tradeoff in 2025, given that's probably smaller than most people's browser caches.
I just don't think 2 GB is that big a tradeoff in 2025, given that's probably smaller than most people's browser caches.
Don't forget about people with a limited internet connection; for them, 2 GB is an unacceptable luxury, so it's not so much the size as the traffic.
I just don't think 2 GB is that big a tradeoff in 2025, given that's probably smaller than most people's browser caches.
Don't forget about people with a limited internet connection; for them, 2 GB is an unacceptable luxury, so it's not so much the size as the traffic.
Which is exactly the purpose of a local clone and occasional sync, yes. You aren't downloading 2 GB every time you e.g. search for a package or whatnot. You do it once, with an occasional fetch when convenient.
You know, like how git works.
A full clone could be made into yet another option, something like --pkgname-only for those who want to avoid the large local cache.
Aside: this reminds me of a long-standing git server issue; anyone wanting to host a large git repo needs a lot of RAM, as a remote clone triggers a repack, which needlessly caches all commits in RAM.
A full clone could be made into yet another option, something like --pkgname-only for those who want to avoid the large local cache.
This is actually a good approach as it still makes available the flexibility that only a local checkout would offer, though --pkgbase-only would be more appropriate[0]; pkgname has a specific meaning in ALPM[1] (and is not able to be searched/fetched without a local checkout and parsing of .SRCINFO on its pkgbase's branch at the least). The AUR GitHub repo only has branches based on pkgbase (not pkgname).
Aside: This reminds me of a long time git server issue; anyone wanting to host a large git repo needs a lot of RAM as a remote clone calls repack which needlessly caches all commits in RAM.
The pack.threads and pack.windowMemory (and a slew of other pack.* directives) go a long way for this.
[0] https://wiki.archlinux.org/title/PKGBUILD#pkgbase
[1] https://wiki.archlinux.org/title/PKGBUILD#pkgname
In the short term: there's always hosting the packages.gz file temporarily somewhere here until the maintainers decide to push it to the repo or do anything else with it, like sticking it on a temporary branch of the yay repo until a patch for the official mirror to work with yay goes through? Not too sure.
Perhaps this is a sign we need to expand the other arch mirrors to include AUR, but God only knows how bloated that would get, even.
Perhaps this is a sign we need to expand the other arch mirrors to include AUR, but God only knows how bloated that would get, even.
It's not terrible; the entire GH repo with a mirror checkout is about 2 GiB.
I hesitate to jinx it but it looks like the AUR is back currently. I don't know for how long; it may behoove us all to have some sort of backup plan in place.
I agree; I don't think it should be too difficult, and it would give us several redundancies. As proven by our current situation, no matter how much you don't want to use the AUR, there's often a package or several you end up needing that's in it, so it has become a lot more important, especially recently, with a ton of mainstream packages in the AUR as well due to the official repositories having more explicit guidelines. I'm surprised cachyos-PKGBUILDS isn't standard aururl form either, but it's more like a selection of cherry-picked packages and not a full mirror anyway, so it doesn't really count, given the majority of packages are NOT on it. But if we had a list service on the main instance similar to regular mirrors, but for AUR mirrors, or simply added a flag to the existing mirror check endpoint to specify AUR, then we could move yay to use an algorithm similar to pacman's for scouring mirrorlists, and then provide an AUR-reflector.service as well, though that would likely end up being up to the Arch maintainers themselves. :]
How is update pushing done on the main repo, though? Is it still that updates are pushed to the main instance only and then distributed to mirrors, or can you push updates to any mirror and they all share data? If it's the latter, we'd need a similar setup for the AUR; if it's the former, there are going to be some issues: if there's a DDoS on the main instance or any other outage, we might be at risk of secvuln exploits during the downtime. Not certain how likely that is, though.
I found this command, which gets packages.gz and diffs it against your installed foreign packages: `comm -23 <(pacman -Qqm | sort) <(curl https://aur.archlinux.org/packages.gz | gzip -cd | sort)`. As yay still doesn't work for me, I made a small bash script that runs this command and, for each line, runs `git clone --branch ITEM --single-branch https://github.com/archlinux/aur.git ITEM` and then `makepkg --install`. This works, but is there some of this we can use to make yay work now? https://aur.archlinux.org/rpc still times out for me, and I think for a lot of others too?
Did we ever design a backup system that would work in the meantime? Or does anyone have an existing AUR clone that would work in the meantime, if we can't host a packages.gz somewhere on this repo while reading from the GitHub branches? Unfortunately I'm absolutely dreadful at golang so I feel out of my depth here 😢
Why do we need packages.gz, again? Can't we just query https://github.com/archlinux/aur/info/refs?service=git-upload-pack ? And it's just 10 MB. (The URL is part of git's protocol when cloning; it contains all refs, so all branches.)
I would love to submit a PR using this
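For reference, the same ref listing can be done from Go without a clone via go-git (equivalent to the `git ls-remote` one-liner earlier in the thread); a sketch:

```go
package main

import (
	"fmt"
	"log"

	git "github.com/go-git/go-git/v5"
	"github.com/go-git/go-git/v5/config"
	"github.com/go-git/go-git/v5/storage/memory"
)

// Sketch: enumerate the mirror's branches without cloning, via the same
// ref advertisement the info/refs URL exposes.
func main() {
	rem := git.NewRemote(memory.NewStorage(), &config.RemoteConfig{
		Name: "aur",
		URLs: []string{"https://github.com/archlinux/aur"},
	})
	refs, err := rem.List(&git.ListOptions{})
	if err != nil {
		log.Fatalln(err)
	}
	n := 0
	for _, ref := range refs {
		if ref.Name().IsBranch() {
			n++ // each branch name is a pkgbase
		}
	}
	fmt.Println(n, "pkgbases")
}
```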
Why do we need packages.gz, again ?
Subpackages and versions.
Can't we just query https://github.com/archlinux/aur/info/refs?service=git-upload-pack ? And it's just 10Mo. (The URL is part of git's protocol when cloning, it contains all refs, so all branches)
Branches are pkgbase name only.
What if packages.gz was stored in a separate supporting repository, where the latest commit would have that file? or even used "Releases" to store the file? This can be done easily using GitHub Actions.
What if packages.gz was stored in a separate supporting repository, where the latest commit would have that file? or even used "Releases" to store the file? This can be done easily using GitHub Actions.
That's something you'd have to take up with the maintainers of https://github.com/archlinux/aur , not this repo/issue (yay), unless you're proposing yay maintain this themselves.
Which, again, requires a checkout of aur.git, switching to each branch, and parsing the PKGBUILD within each branch every time you wish to update the state of version/package info.
@johnnybubonic
packages.gz does not contain "Subpackages and versions" any more than the GitHub mirror heads do; also, it looks like the GitHub mirror is not removing old packages/branches:
diff -y \
<(git ls-remote https://github.com/archlinux/aur | grep refs/heads/ | perl -pe 's/.*\///g' | tail -n 20) \
<(curl -s "https://aur.archlinux.org/packages.gz" | gzip -d | tail -n 15)
zypak zypak
zyplayer-appimage | zyfun-appimage
zyplayer-bin <
zyplayer-git <
zypper <
zypper-dup <
zypper-git zypper-git
zyre-git zyre-git
zytrax-git zytrax-git
zyzzyva-git zyzzyva-git
zz zz
zz-git zz-git
zzuf <
zzuf-git zzuf-git
zzz zzz
zzz-mod-manager-git zzz-mod-manager-git
zzzfm-bin zzzfm-bin
zzzfm-common-bin zzzfm-common-bin
zzzfm-dpup zzzfm-dpup
zzzfm-git zzzfm-git
packages.gz does not contain "Subpackages...
Incorrect. e.g.:
$ curl -sL https://aur.archlinux.org/packages.gz | zgrep -E '^ceph-mds$'
ceph-mds
Note the package base: https://aur.archlinux.org/packages/ceph-mds https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=ceph#n14
Note the lack of a ceph-mds branch: https://github.com/archlinux/aur/tree/ceph-mds
Note also: https://lists.archlinux.org/archives/list/[email protected]/thread/D4YC6Y7L4T5VSEONUCLHOX2R4NJKNIDP/
Which states:
# ...
- packages.gz
  - Listing of all packages separated by line break.
- pkgbase.gz
  - Listing of all package bases separated by line break.
# ...
and versions"
Ah, I was thinking of packages-meta-v1.json.gz/packages-meta-ext-v1.json.gz (which DO contain versions, and the latter contains e.g. licensing info and keywords. Things that would be needed for yay search functionality/feature parity if the AUR API is down).
any more than the github mirror heads,
False, as shown above.
also it looks like the github mirror is not removing old packages / branches;
Nope. Those present in the right column and not the left of the diff are subpackages, not old packages/branches, as stated/shown above. Those present in the left column of your diff and not the right are either broken/incomplete (e.g. no PKGBUILD or .SRCINFO) AUR "packages" (unsure if it's been fixed since, but it was possible to commit an incomplete package to the AUR at one point in the past), OR packages that have been removed from the AUR. The latter seems like a bug that needs to be filed with upstream, as either they aren't pruning during mirror pulls/fetches/merges, or the true AUR git has some external mechanism "virtually" deleting packages that isn't present in the mirroring mechanism.
Filed above-mentioned issue re: soft-deleted packages: https://gitlab.archlinux.org/archlinux/aurweb/-/issues/543
And while filing it, I noticed someone already filed a feature request for them providing/committing a copy of packages-meta-ext-v1.json.gz to the GitHub mirror as well: https://gitlab.archlinux.org/archlinux/aurweb/-/issues/539
/packages-meta-ext-v1.json is 404 but /packages-meta-v1.json.gz (8.4M) looks like the correct thing to mirror (if doing that vs getting it from each PKGBUILD).
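If yay were to consume that file, something like the following could work; a sketch, with the field names assumed from the AUR RPC "info" result format (which this archive mirrors), and only a subset decoded:

```go
package main

import (
	"bufio"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

// Assumed subset of the fields in packages-meta-v1.json (same shape as
// RPC "info" results); decode only what search/version checks would need.
type AurMeta struct {
	Name        string
	PackageBase string
	Version     string
	Description string
}

func main() {
	resp, err := http.Get("https://aur.archlinux.org/packages-meta-v1.json.gz")
	if err != nil {
		log.Fatalln(err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		log.Fatalln(resp.Status)
	}

	// The file itself is gzip, but the server also sets Content-Encoding:
	// gzip (see the curl output below), so the transport may have already
	// transparently decompressed it; sniff the magic bytes rather than assume.
	br := bufio.NewReader(resp.Body)
	var r io.Reader = br
	if magic, _ := br.Peek(2); len(magic) == 2 && magic[0] == 0x1f && magic[1] == 0x8b {
		if r, err = gzip.NewReader(br); err != nil {
			log.Fatalln(err)
		}
	}

	var pkgs []AurMeta
	if err = json.NewDecoder(r).Decode(&pkgs); err != nil {
		log.Fatalln(err)
	}
	fmt.Println(len(pkgs), "packages with pkgname/version metadata")
}
```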
/packages-meta-ext-v1.json is 404 but /packages-meta-v1.json.gz (8.4M) looks like the correct thing to mirror (if doing that vs getting it from each PKGBUILD).
Correction, .json.gz:
$ curl -sIL https://aur.archlinux.org/packages-meta-ext-v1.json.gz
HTTP/2 200
server: nginx
date: Tue, 28 Oct 2025 14:43:03 GMT
content-type: application/gzip
content-length: 11865816
last-modified: Tue, 28 Oct 2025 14:41:17 GMT
etag: "6900d60d-b50ed8"
expires: Tue, 28 Oct 2025 14:48:03 GMT
cache-control: max-age=300
strict-transport-security: max-age=31536000; includeSubdomains; preload
alt-svc: h3=":443"; ma=3600
content-encoding: gzip
accept-ranges: bytes
~~Will fix~~ Have fixed above post