Fetch files from GitHub using Contents API

Open · chris48s opened this issue 8 months ago • 1 comment

Quite a lot of our badges involve fetching the contents of a file from GitHub.

When we do this, we usually do it using https://raw.githubusercontent.com/ instead of the GitHub contents API because this means we can get a file without using a rate limit point.

However, GitHub have recently imposed stricter API rate limits on anonymous access. Relevant reading:

https://github.blog/changelog/2025-05-08-updated-rate-limits-for-unauthenticated-requests/
https://github.com/orgs/community/discussions/159123#discussioncomment-13148279

Hilariously, GitHub are getting smashed by LLM scrapers. I wonder if anyone at GitHub has ever heard the phrase "hoisted by their own petard" 😂 I guess not.

In any case, we are starting to see 429s when calling raw.githubusercontent.com. I am seeing some of these thrown from production, but not as many as I'd expect. I reckon we might actually have a slightly higher rate limit on GitHub's side, but I don't expect anyone to confirm or deny that. It is definitely affecting us both in CI and when running tests in local dev, making tests flaky.

Switching to the GitHub contents API should fix the 429s, at the expense of costing us some rate limit points. I'm not too worried about this. We have some tokens to play with. In some places we can just change our code internally. However, fixing this across the board is going to be non-trivial, as there are a lot of places where we accept a user-supplied URL, and we've encouraged users to supply a raw.githubusercontent.com URL there. I'm thinking of situations like https://shields.io/badges/oss-lifecycle. To really put this to bed once and for all, we probably need to implement something at the HTTP client layer to internally call the GitHub contents API when we see a request to https://raw.githubusercontent.com. We don't need to boil this whole ocean at once though.
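To make the HTTP-client-layer idea a bit more concrete, here is a rough sketch of the URL rewrite it would need. It is written in Python purely for illustration and is not taken from the Shields codebase; the function name `raw_to_contents_api` is hypothetical. It assumes raw URLs have the shape `https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}` (optionally with a `refs/heads/` or `refs/tags/` prefix on the ref) and maps them to the `GET /repos/{owner}/{repo}/contents/{path}?ref={ref}` endpoint.

```python
from typing import Optional
from urllib.parse import quote, unquote, urlsplit

RAW_HOST = "raw.githubusercontent.com"
API_BASE = "https://api.github.com"


def raw_to_contents_api(url: str) -> Optional[str]:
    """Map a raw.githubusercontent.com URL to a contents API URL.

    Returns None when the URL is not a raw URL we know how to rewrite,
    so the caller can fall through to a normal request.
    """
    parts = urlsplit(url)
    if parts.hostname != RAW_HOST:
        return None

    # Path segments stay percent-encoded, exactly as the user supplied them.
    segments = [s for s in parts.path.split("/") if s]

    # Newer raw URLs spell the ref out as refs/heads/<branch> or refs/tags/<tag>.
    if len(segments) >= 6 and segments[2] == "refs" and segments[3] in ("heads", "tags"):
        owner, repo, ref, path = segments[0], segments[1], segments[4], segments[5:]
    elif len(segments) >= 4:
        # Assumption: the ref is a single segment. A branch name containing
        # "/" is ambiguous in this URL shape and is not handled here.
        owner, repo, ref, path = segments[0], segments[1], segments[2], segments[3:]
    else:
        return None

    encoded_ref = quote(unquote(ref), safe="")
    return f"{API_BASE}/repos/{owner}/{repo}/contents/{'/'.join(path)}?ref={encoded_ref}"


print(raw_to_contents_api(
    "https://raw.githubusercontent.com/badges/shields/master/README.md"
))
```

The rewritten request would then be sent with one of our pooled tokens and an `Accept: application/vnd.github.raw+json` header, so the response body is the file itself rather than the base64-wrapped JSON. Anything the function returns `None` for would go out unchanged.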

Sigh

May 16 '25 20:05 chris48s

I'd not heard that expression fwiw but I will be adopting it and using it regularly moving forward 😆

I'm trying to reconcile the volume of token pool exhaustion error notifications I've seen over the past few months with the notion of changes that would result in more token quota being used.

Is there any context I'm missing that would mean those two don't need to be considered together?

Jun 11 '25 20:06 calebcartwright