govuk-developer-docs

Fetch docs from remote repos via `git clone` instead of individually requesting every file from GitHub.

Open • sengi opened this issue 1 year ago • 1 comment

`GitHubRepoFetcher` is super unkind to GitHub's HTTP API. On startup, we're basically crawling every remote repo that's listed in `data/repos.yml` and issuing an HTTP request for every file in its `docs/` directory (including subdirs).

This makes for a miserable developer experience when trying to preview changes to documentation. Startup takes many minutes and often fails because we hit rate limits (sometimes even when using an API token!). The worst part is that the tests take forever to run and depend on thousands of network requests all succeeding, to endpoints which are outside our control.

It'd be simpler, faster and more reliable just to clone the remote repos and read the `.md` docs from the local filesystem. We wouldn't even have to download the whole of each repo; it's possible to download just the `docs` directory for just the head of the default branch, by using `git clone --filter` together with `git sparse-checkout`.
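A rough sketch of what that could look like, for illustration only. It is written in Python rather than the project's own code, and the function names (`sparse_clone_commands`, `fetch_docs`, `markdown_files`) are hypothetical. It assumes Git 2.25 or later (for `--sparse` and `git sparse-checkout`), and the choice of `--depth 1` with `--filter=blob:none` is one plausible combination, not necessarily the one we'd settle on.

```python
import subprocess
from pathlib import Path


def sparse_clone_commands(repo_url, dest, docs_dir="docs"):
    """Build the git commands for a shallow, blob-filtered, sparse clone."""
    return [
        # Fetch only the latest commit of the default branch, defer blob
        # downloads, and start with a sparse working tree.
        ["git", "clone", "--depth", "1", "--filter=blob:none", "--sparse",
         repo_url, dest],
        # Widen the sparse checkout to the docs directory; git then fetches
        # the blobs it needs for that directory.
        ["git", "-C", dest, "sparse-checkout", "set", docs_dir],
    ]


def markdown_files(dest, docs_dir="docs"):
    """List the .md files under the checked-out docs directory (recursive)."""
    return sorted((Path(dest) / docs_dir).rglob("*.md"))


def fetch_docs(repo_url, dest, docs_dir="docs"):
    """Clone just the docs directory, then return the local Markdown paths."""
    for cmd in sparse_clone_commands(repo_url, dest, docs_dir):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return markdown_files(dest, docs_dir)
```

Calling `fetch_docs` once per entry in `data/repos.yml` would replace the per-file HTTP requests with two git commands per repo, and everything after that is a local filesystem read.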

This would also let us ditch our homegrown cache mechanism, because the files will just stick around when developing locally and we can use the built-in cache in GitHub Actions.

sengi • Apr 13 '23 21:04