
Add the ability to use aur mirror from github

Open Boria138 opened this issue 4 months ago • 32 comments

Is your feature request related to a problem? Please describe.

For the last week the AUR has been very unstable and is unavailable all over the world; any attempt to clone a package ends with an error.

Describe the solution you'd like

There is a mirror of the AUR on GitHub:

https://github.com/archlinux/aur

where each branch corresponds to a package on the AUR. I would like yay to support this mirror as an alternative for when aurweb is unstable.

Additional context

Just a proxy or VPN won't help, because the AUR itself is down, not blocked.

Boria138 avatar Aug 22 '25 06:08 Boria138

@archlinux can you add packages.gz to that GitHub mirror using git-lfs?

Dmole avatar Aug 24 '25 23:08 Dmole

@Boria138 FWIW, this is due to the DDoS attacks. ([Wayback Machine mirror](https://web.archive.org/web/20250825030025/https://archlinux.org/news/recent-services-outages/))

@Dmole might not actually be necessary. lemme dig into the go-git stuff a bit, I've got an idea.

johnnybubonic avatar Aug 25 '25 20:08 johnnybubonic

kind of OT, but I guess announcing Yay's future capability to use GitHub as a mirror would be newsworthy on archlinux.org after completion, in case the DDoS is still roaming by then.

GottZ avatar Aug 25 '25 20:08 GottZ

@johnnybubonic

true; somehow I forgot about

git ls-remote https://github.com/archlinux/aur | grep refs/heads/ | perl -pe 's/.*\///g' | grep -c .
140003

Dmole avatar Aug 25 '25 20:08 Dmole

having yay be able to fall back on the AUR mirror would be good for instances where the AUR is down

OzzyHelix avatar Aug 26 '25 02:08 OzzyHelix

I am wondering if having more mirrors for AUR software that yay and other AUR helpers could pull from would actually help in contingencies such as when the AUR is getting DDoS'd

OzzyHelix avatar Aug 26 '25 02:08 OzzyHelix

I am wondering if having more mirrors for AUR software that yay and other AUR helpers could pull from would actually help in contingencies such as when the AUR is getting DDoS'd

would be hilarious if they would manage to get github down with it.

a single point of failure is almost always problematic. I remember the discussions that came up before the archive was put on archive.org ... funny enough, after I asked if it's possible to host a mirror for it.

GottZ avatar Aug 26 '25 06:08 GottZ

sorry for the delay, all; $dayjob things came up. TL;DR: jump to the bottom of the post and expand.

so, here's a POC that:

  1. will clone the AUR github repo locally if it doesn't exist at the specified path /tmp/aurgit (takes about 5-20 minutes; took me 8 but i'm on a 1+ Gbps line, YMMV)
  2. can refresh it if it does (takes about 1-10 minutes; takes me about 4)
  3. iterates through all branches, extracts the PKGBUILD and .SRCINFO from each one, and writes them out to disk at /tmp/aurgit.cache.d/<pkgbase> (takes about 22-24 seconds)

i'm not familiar enough with @Jguer or other contributors' established design practices/Yay internals for a PR (and i'm unfortunately too busy to really do any further work on this), so hopefully this gives a good head start, but i believe a good UX flow for this would be:

  1. yay --mirror-init or something to locally clone the AUR GH mirror if it isn't already locally cloned (otherwise automatically do so on first use, maybe after prompting since it takes so long; maybe a fetch on it if it does exist)
  2. yay --mirror-sync or something to do a git fetch on that (cloning it if it doesn't exist)
  3. then maybe a --mirror flag to -S and -G operations that will use the local mirror (or give the option to fallback to it if a normal yay -S against the AUR fails for some reason, etc.)

there probably also needs to be some sort of configuration option to set the AUR git mirror URL (in the case of GH targeting) with a default of https://github.com/archlinux/aur. and a config option to set the local destination path of the repo checkout.

I also didn't write a .SRCINFO or PKGBUILD parser, as I assume there's probably something available already.

go-git is pure Golang so no dep on git(1) / /usr/bin/git or anything like that (though I'd imagine it's probably already installed if they're using the AUR!).

for those wondering:

Required free disk space
# clone of repo

$ du -sb /tmp/aurgit
1963899964      /tmp/aurgit

$ du -sh /tmp/aurgit
2.4G    /tmp/aurgit

$ du -sh --apparent /tmp/aurgit
1.9G    /tmp/aurgit


# only PKGBUILDs and .SRCINFOs

$ du -sb /tmp/aurgit.cache.d
306220494       /tmp/aurgit.cache.d

$ du -sh /tmp/aurgit.cache.d
1.2G    /tmp/aurgit.cache.d

$ du -sh --apparent /tmp/aurgit.cache.d
293M    /tmp/aurgit.cache.d

Anyways, here y'are.

POC for AUR GitHub mirror fetching, branch/pkgbase walking
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"sync"
	"time"

	"github.com/go-git/go-billy/v5"
	"github.com/go-git/go-billy/v5/osfs"
	"github.com/go-git/go-git/v5"
	"github.com/go-git/go-git/v5/plumbing"
	"github.com/go-git/go-git/v5/plumbing/cache"
	"github.com/go-git/go-git/v5/plumbing/object"
	"github.com/go-git/go-git/v5/plumbing/storer"
	"github.com/go-git/go-git/v5/storage/filesystem"
	"r00t2.io/sysutils/paths"
)

const (
	gitURL     string = "https://github.com/archlinux/aur"
	remoteName string = "aur"
	// Once you've cloned, if this is true it'll skip the fetch.
	skipUpdate bool = true
)

var (
	repo      *git.Repository
	repoLock  sync.Mutex
	cacheDst  string            = "/tmp/aurgit.cache.d"
	repoDst   string            = "/tmp/aurgit"
	cloneOpts *git.CloneOptions = &git.CloneOptions{
		URL:        gitURL,
		RemoteName: remoteName,
		Mirror:     true,
		NoCheckout: true,
		Progress:   os.Stdout,
	}
	// Git worktree
	wt billy.Filesystem
	// <wt>/.git
	gitCfgPath string
	gitCfg     billy.Filesystem
	// interface to gitCfg
	cfgStore *filesystem.Storage
)

type (
	AurPkgInfo struct {
		PkgBase  string
		PKGBUILD []byte
		SRCINFO  []byte
	}
	PkgErr struct {
		PkgBase string
		Err     error
		File    *string
	}
)

func (p *PkgErr) Error() (errStr string) {

	if p.File == nil {
		errStr = fmt.Sprintf("error for package base '%s': %s", p.PkgBase, p.Err.Error())
	} else {
		errStr = fmt.Sprintf("error for package base '%s' file '%s': %s", p.PkgBase, *p.File, p.Err.Error())
	}

	return
}

func fetchPkgbuildSrcinfo(ref *plumbing.Reference, wg *sync.WaitGroup, infoChan chan *AurPkgInfo, errChan chan error) {

	var err error
	var info AurPkgInfo
	var tree *object.Tree
	var file *object.File
	var dat string
	var commit *object.Commit

	defer wg.Done()

	if !ref.Name().IsBranch() ||
		ref.Name().Short() == plumbing.Main.Short() ||
		ref.Name().Short() == plumbing.Master.Short() {
		return
	}

	info.PkgBase = ref.Name().Short()

	repoLock.Lock()
	defer repoLock.Unlock()
	if commit, err = repo.CommitObject(ref.Hash()); err != nil {
		errChan <- &PkgErr{
			PkgBase: info.PkgBase,
			Err:     err,
		}
		return
	}
	// Recommend you check out the other fields in commit as well. Some useful metadata there.
	if tree, err = commit.Tree(); err != nil {
		errChan <- &PkgErr{
			PkgBase: info.PkgBase,
			Err:     err,
		}
		return
	}

	// You can get other files from the tree, too; not just the PKGBUILD and .SRCINFO.
	for _, fnm := range []string{
		"PKGBUILD",
		".SRCINFO",
	} {
		if file, err = tree.File(fnm); err != nil {
			errChan <- &PkgErr{
				PkgBase: info.PkgBase,
				Err:     err,
				File:    &fnm,
			}
			continue
		}
		if dat, err = file.Contents(); err != nil {
			errChan <- &PkgErr{
				PkgBase: info.PkgBase,
				Err:     err,
				File:    &fnm,
			}
			return
		}
		switch fnm {
		case "PKGBUILD":
			info.PKGBUILD = []byte(dat)
		case ".SRCINFO":
			info.SRCINFO = []byte(dat)
		}
	}

	infoChan <- &info

	return
}

func main() {

	var err error
	var exists bool
	var start time.Time
	var refIter storer.ReferenceIter
	var errChan chan error
	var wg sync.WaitGroup
	var doneChan chan bool
	var readerWg sync.WaitGroup
	var infoChan chan *AurPkgInfo
	// "Keyed" on pkgbase name. This has to (should be) a sync.Map because we're writing to it inside a range,
	// and that's a lot of locking/unlocking otherwise.
	var pkgBases sync.Map

	// This is disabled because it makes this POC run longer.
	// There's no real reason to do it unless you want a guaranteed clean starting point.
	/*
		log.Println("Clearing destination for testing")
		if err = os.RemoveAll(repoDst); err != nil {
			log.Panicln(err)
		}
	*/

	if err = paths.RealPath(&cacheDst); err != nil {
		log.Panicln(err)
	}

	start = time.Now()

	wt = osfs.New(repoDst, osfs.WithBoundOS())
	if gitCfg, err = wt.Chroot(git.GitDirName); err != nil {
		log.Panicln(err)
	}
	cfgStore = filesystem.NewStorage(gitCfg, cache.NewObjectLRUDefault())

	gitCfgPath = gitCfg.Root()

	if exists, err = paths.RealPathExists(&gitCfgPath); err != nil {
		log.Panicln(err)
	}

	if exists {
		if repo, err = git.Open(cfgStore, wt); err != nil {
			log.Panicln(err)
		}
		if !skipUpdate {
			log.Println("Fetching updates (this will take about 5-10 minutes)")
			if err = repo.Fetch(
				&git.FetchOptions{
					RemoteName: remoteName,
					Progress:   os.Stdout,
					Prune:      true,
				},
			); err != nil {
				log.Panicln(err)
			}
		}
	} else {
		log.Println("Initial clone (this will take about 5-20 minutes)")
		if err = cloneOpts.Validate(); err != nil {
			log.Panicln(err)
		}

		if repo, err = git.Clone(cfgStore, wt, cloneOpts); err != nil {
			log.Panicln(err)
		}
	}

	// Now iterate over each of the pkgbases (branches) and pull the PKGBUILD and SRCINFO from their latest commit.
	// This should be concurrency-safe.

	errChan = make(chan error)
	infoChan = make(chan *AurPkgInfo)
	doneChan = make(chan bool, 1)
	readerWg.Add(2) // One for the error reader, one for the info reader.

	// Read the errors as they come in and just write them to STDERR.
	go func() {
		var aurErr error

		defer readerWg.Done()

		for aurErr = range errChan {
			if aurErr != nil {
				fmt.Fprintln(os.Stderr, aurErr)
			}
		}
	}()

	// Read the AurPkgInfo as they come in and add them to pkgBases.
	go func() {
		var pkgbase *AurPkgInfo

		defer readerWg.Done()

		for pkgbase = range infoChan {
			if pkgbase != nil {
				pkgBases.Store(pkgbase.PkgBase, pkgbase)
			}
		}
	}()

	if refIter, err = repo.References(); err != nil {
		log.Panicln(err)
	}
	defer refIter.Close()
	if err = refIter.ForEach(
		func(ref *plumbing.Reference) (err error) {
			wg.Add(1)
			go fetchPkgbuildSrcinfo(ref, &wg, infoChan, errChan)
			return
		},
	); err != nil {
		log.Panicln(err)
	}

	go func() {
		wg.Wait()
		close(errChan)
		close(infoChan)
		readerWg.Wait()
		doneChan <- true
	}()

	<-doneChan

	// Now you can iterate over the files, or... parse them, or whatever.
	// Obviously you can do the same in fetchPkgbuildSrcinfo() instead,
	// and add the data (packages, deps, author, whatever) as fields directly in an AurPkgInfo.
	// I just dump it to disk here for an example.
	pkgBases.Range(
		func(k, v any) (ok bool) {

			var dpath string
			var fpath string
			var infoErr error
			var pkgBase string
			var pkgInfo *AurPkgInfo

			if v == nil {
				return
			}
			if pkgBase, ok = k.(string); !ok {
				return
			}
			if pkgInfo, ok = v.(*AurPkgInfo); !ok {
				return
			}

			dpath = filepath.Join(cacheDst, pkgBase)
			if infoErr = os.MkdirAll(dpath, 0o0755); infoErr != nil {
				log.Panicln(infoErr)
			}

			for fi, b := range [][]byte{
				pkgInfo.PKGBUILD,
				pkgInfo.SRCINFO,
			} {
				switch fi {
				case 0:
					fpath = filepath.Join(dpath, "PKGBUILD")
				case 1:
					fpath = filepath.Join(dpath, ".SRCINFO")
				}
				if infoErr = os.WriteFile(fpath, b, 0o0644); infoErr != nil {
					log.Panicln(infoErr)
				}
			}

			return
		},
	)

	fmt.Printf("Ran for %s\n", time.Now().Sub(start))
}

johnnybubonic avatar Aug 26 '25 07:08 johnnybubonic

@johnnybubonic

Cloning the entire AUR mirror is wasteful.

Normally using the git binary, you would specify --single-branch to only fetch the data for the specific package you want to install, saving a lot of disk space and time.

git clone --branch yay --single-branch https://github.com/archlinux/aur.git yay

It also seems like yay currently defers to the git binary when interacting with repos.

FineWolf avatar Aug 26 '25 08:08 FineWolf

@johnnybubonic

Cloning the entire AUR mirror is wasteful.

Normally using the git binary, you would specify --single-branch to only fetch the data for the specific package you want to install, saving a lot of disk space and time.

The branches are only pkgbases, not packages.

You're either adding a local bare clone and doing a remote branch list, iterating that list, downloading the .SRCINFO for each branch and parsing it[0],

ORRR

you're keeping a local clone you operate on, like git is designed for, and searching that (possibly with pre-parsed cached .SRCINFO for pkgname => pkgbase map updated every sync/fetch)

every time you do a search for a package (i.e. pkgname), check for a newer version, etc.

Which is more wasteful?

git clone --branch yay --single-branch https://github.com/archlinux/aur.git yay

Again, this is only useful for installing or updating known pkgbases - not pkgnames, not searching, not for package metadata, not for unknown pkgbases, et al.

It also seems like yay currently defers to the git binary when interacting with repos.

There's no reason for it to do so currently, clearly. go-git's reached significant useful maturity.

[0] Or using the GH API endpoints, either/or

johnnybubonic avatar Aug 26 '25 14:08 johnnybubonic

The branches are only pkgbases, not packages.

@johnnybubonic searching just the pkgbase names is probably a tradeoff most would prefer over requiring a 3 GB cache. Or if we can't get the attention of @archlinux someone could make a cache of the cache with packages.gz in git-lfs for a more traditional search behavior.

Dmole avatar Aug 26 '25 15:08 Dmole

The branches are only pkgbases, not packages.

@johnnybubonic searching just the pkgbase names is probably a tradeoff most would prefer over requiring a 3 GB cache.

2.4 < 3, not > 3. The "cache" is just a literal extract+dump of PKGBUILD and .SRCINFO, not condensed down into any sort of optimized, parsed, or trimmed form, for POC/example purposes. Like the code says.

(Though it's actually closer to 1.829 GiB/1.964 GB if we're being pedantic; it's going to differ based on the blocksizes of the filesystem. Note the actual bytecount from du -sb and the du --apparent)

Or if we can't get the attention of @archlinux someone could make a cache of the cache with packages.gz in git-lfs for a more traditional search behavior.

Has anyone offered to do this/done it yet? Is there an approval/vetting process? Installing single packages from the AUR is one thing, giving a single individual trusted control over metadata of all AUR packages is a whole other can of worms.

Look, I know local clones aren't pretty. I get that. But it's more of a path than wishing and hoping someone else "just does something" so you can use that. You can criticize it and wait for someone else to implement your perfect solution, or you can take steps toward actually solving it.

Additionally, local clones are resilient against attacks on, and downtime of, GitHub itself. Git was designed to be decentralized, and I just don't think 2 GB is that big a tradeoff in 2025, given that's probably smaller than most people's browser caches.

johnnybubonic avatar Aug 26 '25 15:08 johnnybubonic

I just don't think 2 GB is that big a tradeoff in 2025, given that's probably smaller than most people's browser caches.

Don't forget about people with limited internet connections; for them, 2 GB is an unacceptable luxury, so it's not so much the size as the traffic.

Boria138 avatar Aug 26 '25 15:08 Boria138

I just don't think 2 GB is that big a tradeoff in 2025, given that's probably smaller than most people's browser caches.

Don't forget about people with limited internet connections; for them, 2 GB is an unacceptable luxury, so it's not so much the size as the traffic.

Which is exactly the purpose of a local clone and occasional sync, yes. You aren't downloading 2 GB every time you e.g. search for a package or whatnot. You do it once, with an occasional fetch when convenient.

You know, like how git works.

johnnybubonic avatar Aug 26 '25 16:08 johnnybubonic

A full clone could be made into yet another option, something like --pkgname-only for those who want to avoid the large local cache.

Aside: This reminds me of a long-standing git server issue; anyone wanting to host a large git repo needs a lot of RAM, as a remote clone triggers a repack which needlessly caches all commits in RAM.

Dmole avatar Aug 26 '25 19:08 Dmole

A full clone could be made into yet another option, something like --pkgname-only for those who want to avoid the large local cache.

This is actually a good approach as it still makes available the flexibility that only a local checkout would offer, though --pkgbase-only would be more appropriate[0]; pkgname has a specific meaning in ALPM[1] (and is not able to be searched/fetched without a local checkout and parsing of .SRCINFO on its pkgbase's branch at the least). The AUR GitHub repo only has branches based on pkgbase (not pkgname).

Aside: This reminds me of a long time git server issue; anyone wanting to host a large git repo needs a lot of RAM as a remote clone calls repack which needlessly caches all commits in RAM.

The pack.threads and pack.windowMemory (and a slew of other pack.* directives) go a long way for this.

[0] https://wiki.archlinux.org/title/PKGBUILD#pkgbase
[1] https://wiki.archlinux.org/title/PKGBUILD#pkgname

johnnybubonic avatar Aug 26 '25 21:08 johnnybubonic

In the short term: there's always hosting the packages.gz file temporarily somewhere until the maintainers decide to push it to the repo or do anything else with it; e.g. stick it on a temporary branch of the yay repo until a patch for the official mirror to function with yay goes through? Not too sure.

Perhaps this is a sign we need to expand the other arch mirrors to include AUR, but God only knows how bloated that would get, even.

ninetailedtori avatar Aug 27 '25 22:08 ninetailedtori

Perhaps this is a sign we need to expand the other arch mirrors to include AUR, but God only knows how bloated that would get, even.

It's not terrible; the entire GH repo with a mirror checkout is about 2 GiB.

I hesitate to jinx it but it looks like the AUR is back currently. I don't know for how long; it may behoove us all to have some sort of backup plan in place.

johnnybubonic avatar Aug 29 '25 01:08 johnnybubonic

I agree; I don't think it should be too difficult, and it would give us several redundancies. As proven by our current situation, no matter how much you don't want to use the AUR, there's often a package or several you end up needing from it, so it has become a lot more important, especially recently, with a ton of mainstream packages in the AUR as well due to the official repositories having more explicit guidelines. I'm surprised cachyos-PKGBUILDS isn't in standard aururl form either, but it's more a selection of cherry-picked packages than a full mirror anyway, so it doesn't really count, given the majority of packages are NOT on it. But if we had a list service on the main instance similar to regular mirrors, but for AUR mirrors, or simply added a flag to the existing mirror-check endpoint to specify AUR, then we could move yay to a similar algorithm as pacman uses for scouring mirrorlists, and then provide an AUR-reflector.service as well, but that would likely be up to the Arch maintainers themselves. :]

How is update pushing done on the main repo, though? Is it still that updates push to the main instance only and then distribute to mirrors, or can you push updates to any mirror and they all share data? If it's the latter, we'd need a similar setup for the AUR; if it's the former, then during a DDoS on the main instance, or any other issue, we might be at risk of secvuln exploits during the downtime. Not certain how likely that is, though.

ninetailedtori avatar Aug 29 '25 18:08 ninetailedtori

i found comm -23 <(pacman -Qqm | sort) <(curl https://aur.archlinux.org/packages.gz | gzip -cd | sort), a command that gets packages.gz and diffs it against your installed packages. As yay still doesn't work for me, I made a small bash file running this command, and for each line I run git clone --branch ITEM --single-branch https://github.com/archlinux/aur.git ITEM and then makepkg --install. This works, but is there some of this we can use to make yay work now? https://aur.archlinux.org/rpc still times out for me, and I think for a lot of others too?

seocamo avatar Sep 19 '25 16:09 seocamo

Did we ever design a backup system that would work in the meantime? Or does anyone have an existing AUR clone that would work in the meantime, if we can't host a packages.gz somewhere on this repo while reading from the GitHub branches? Unfortunately I'm absolutely dreadful at golang, so I feel out of my depth here 😢

ninetailedtori avatar Oct 05 '25 18:10 ninetailedtori

Why do we need packages.gz, again? Can't we just query https://github.com/archlinux/aur/info/refs?service=git-upload-pack? And it's just 10 MB. (The URL is part of git's protocol when cloning; it contains all refs, so all branches.)

I would love to submit a PR using this

iTrooz avatar Oct 09 '25 16:10 iTrooz

Why do we need packages.gz, again?

Subpackages and versions.

Can't we just query https://github.com/archlinux/aur/info/refs?service=git-upload-pack? And it's just 10 MB. (The URL is part of git's protocol when cloning; it contains all refs, so all branches.)

Branches are pkgbase name only.

johnnybubonic avatar Oct 10 '25 01:10 johnnybubonic

What if packages.gz was stored in a separate supporting repository, where the latest commit would have that file? Or even use "Releases" to store the file? This can be done easily using GitHub Actions.

hadi77ir avatar Oct 27 '25 23:10 hadi77ir

What if packages.gz was stored in a separate supporting repository, where the latest commit would have that file? Or even use "Releases" to store the file? This can be done easily using GitHub Actions.

That's something you'd have to take up with the maintainers of https://github.com/archlinux/aur , not this repo/issue (yay), unless you're proposing yay maintain this themselves.

Which, again, requires a checkout of aur.git, switching to each branch, and parsing the PKGBUILD within each branch every time you wish to update the state of version/package info.

johnnybubonic avatar Oct 28 '25 02:10 johnnybubonic

@johnnybubonic

packages.gz does not contain "Subpackages and versions" any more than the GitHub mirror heads do. Also, it looks like the GitHub mirror is not removing old packages/branches:

diff -y \
<(git ls-remote https://github.com/archlinux/aur | grep refs/heads/ | perl -pe 's/.*\///g' | tail -n 20) \
<(curl -s "https://aur.archlinux.org/packages.gz" | gzip -d | tail -n 15)

zypak								zypak
zyplayer-appimage				  |	zyfun-appimage
zyplayer-bin				      <
zyplayer-git				      <
zypper						      <
zypper-dup					      <
zypper-git							zypper-git
zyre-git							zyre-git
zytrax-git							zytrax-git
zyzzyva-git							zyzzyva-git
zz									zz
zz-git								zz-git
zzuf						      <
zzuf-git							zzuf-git
zzz									zzz
zzz-mod-manager-git					zzz-mod-manager-git
zzzfm-bin							zzzfm-bin
zzzfm-common-bin					zzzfm-common-bin
zzzfm-dpup							zzzfm-dpup
zzzfm-git							zzzfm-git

Dmole avatar Oct 28 '25 13:10 Dmole

@johnnybubonic

packages.gz does not contain "Subpackages...

Incorrect. e.g.:

$ curl -sL https://aur.archlinux.org/packages.gz | zgrep -E '^ceph-mds$'
ceph-mds

Note the package base: https://aur.archlinux.org/packages/ceph-mds https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=ceph#n14

Note the lack of a ceph-mds branch: https://github.com/archlinux/aur/tree/ceph-mds

Note also: https://lists.archlinux.org/archives/list/[email protected]/thread/D4YC6Y7L4T5VSEONUCLHOX2R4NJKNIDP/

Which states:

# ...
- packages.gz
    - Listing of all packages separated by line break.
- pkgbase.gz
    - Listing of all package bases separated by line break.
# ...

and versions"

Ah, I was thinking of packages-meta-v1.json.gz/packages-meta-ext-v1.json.gz (which DO contain versions, and the latter also contains e.g. licensing info and keywords; things that would be needed for yay search functionality/feature parity if the AUR API is down).

any more than the github mirror heads,

False, as shown above.

also it looks like the github mirror is not removing old packages / branches;

Nope. Those present in the right column and not the left of the diff are subpackages, not old packages/branches, as stated/shown above. Those present in the left column of your diff and not the right are either broken/incomplete (e.g. no PKGBUILD or .SRCINFO) AUR "packages" (unsure if it's been fixed since, but it was possible to commit an incomplete package to the AUR at one point in the past) OR packages that have been removed from the AUR. The latter seems like a bug that needs to be filed upstream, as they either aren't pruning during mirror pulls/fetches/merges, or the true AUR git has some external mechanism "virtually" deleting packages that isn't present in the mirroring mechanism.

johnnybubonic avatar Oct 28 '25 13:10 johnnybubonic

Filed above-mentioned issue re: soft-deleted packages: https://gitlab.archlinux.org/archlinux/aurweb/-/issues/543

And while filing it, I noticed someone already filed a feature request for them providing/committing a copy of packages-meta-ext-v1.json.gz to the GitHub mirror as well: https://gitlab.archlinux.org/archlinux/aurweb/-/issues/539

johnnybubonic avatar Oct 28 '25 14:10 johnnybubonic

/packages-meta-ext-v1.json is 404 but /packages-meta-v1.json.gz (8.4M) looks like the correct thing to mirror (if doing that vs getting it from each PKGBUILD).

Dmole avatar Oct 28 '25 14:10 Dmole

/packages-meta-ext-v1.json is 404 but /packages-meta-v1.json.gz (8.4M) looks like the correct thing to mirror (if doing that vs getting it from each PKGBUILD).

Correction, .json.gz:

$ curl -sIL https://aur.archlinux.org/packages-meta-ext-v1.json.gz
HTTP/2 200
server: nginx
date: Tue, 28 Oct 2025 14:43:03 GMT
content-type: application/gzip
content-length: 11865816
last-modified: Tue, 28 Oct 2025 14:41:17 GMT
etag: "6900d60d-b50ed8"
expires: Tue, 28 Oct 2025 14:48:03 GMT
cache-control: max-age=300
strict-transport-security: max-age=31536000; includeSubdomains; preload
alt-svc: h3=":443"; ma=3600
content-encoding: gzip
accept-ranges: bytes

~~Will fix~~ Have fixed above post

johnnybubonic avatar Oct 28 '25 14:10 johnnybubonic