Adopt the GitHub API for rendering README files
I wasn't aware that the GitHub README API could respond with different content types. Using that API is a better way to fetch README content than the approach we're currently using in #1090.
However, it's not as simple as switching. We're currently requesting page content on a per-request basis, but that won't work with the API because its calls are rate limited. There are several options for how we could adopt this API, but the one that we have agreed is best is to cache README content as part of our ingestion process.
Steps to adopt this API:
- [ ] Fetch the README API using an `Accept` header of `application/vnd.github.v3.html+json` to get a rendered HTML version of the README file.
- [ ] Cache that HTML content within our infrastructure, either on Azure BLOB storage, or in another local cache, but not in the main database. If using BLOB storage, implement after #1094.
- [ ] Change fetchReadme to fetch from the cache rather than from the GitHub page directly.
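The steps above could look roughly like the following. This is only a sketch in Python to illustrate the shape of the flow, not the project's actual code: the function names, the `User-Agent` value, and the on-disk cache layout are all hypothetical, and a local directory stands in for whichever cache (BLOB storage or otherwise) we end up choosing.

```python
import urllib.request
from pathlib import Path

GITHUB_API = "https://api.github.com"


def build_readme_request(owner, repo, token=None):
    """Build a request for the rendered HTML README of owner/repo."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/readme"
    headers = {
        # Ask GitHub for rendered HTML rather than the default JSON payload.
        "Accept": "application/vnd.github.v3.html+json",
        # Hypothetical client name; GitHub expects some User-Agent to be set.
        "User-Agent": "readme-cache-sketch",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def fetch_readme_html(owner, repo, token=None):
    """Call the API during ingestion (this is the rate-limited step)."""
    with urllib.request.urlopen(build_readme_request(owner, repo, token)) as response:
        return response.read().decode("utf-8")


def cache_readme(cache_dir, owner, repo, html):
    """Store rendered HTML in the cache (a local directory in this sketch)."""
    path = Path(cache_dir) / f"{owner}__{repo}.html"
    path.write_text(html, encoding="utf-8")
    return path


def read_cached_readme(cache_dir, owner, repo):
    """Serve page requests from the cache; return None on a cache miss."""
    path = Path(cache_dir) / f"{owner}__{repo}.html"
    if not path.exists():
        return None
    return path.read_text(encoding="utf-8")
```

The point of the split is that only ingestion ever calls `fetch_readme_html`, so page views never count against the rate limit.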
On the subject of call-rate limiting:
You might be able to work around the limit using ETags, which allow you to poll for changes and only get "charged" if something has changed.
I use this in Action Status and have a couple of libraries that might conceivably help: Octoid and JSONSession.
They are probably not directly useful - Octoid in particular is very single-purpose - but you never know, there might be a nugget in either of them somewhere that's helpful in some way for this issue!
Off the top of my head I can't recall whether the ETag mechanism even works with all of the API, or if it's just for some things. I am using it to monitor GH action workflow statuses, which is obviously something that's expected to change more frequently.
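For reference, the ETag polling idea might be sketched like this in Python. It is not taken from Octoid or JSONSession; `fetch_if_changed`, its `opener` parameter, and the `User-Agent` value are hypothetical names for illustration, and whether the README endpoint honours ETags the same way would still need checking. The sketch sends the last seen ETag in an `If-None-Match` header and treats a `304 Not Modified` response as "nothing changed".

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, etag=None, opener=urllib.request.urlopen):
    """Return (body, etag). body is None when the server replies 304.

    `opener` is injectable so the function can be exercised without
    touching the network; by default it is urllib's urlopen.
    """
    headers = {"User-Agent": "readme-cache-sketch"}  # hypothetical client name
    if etag:
        # Conditional request: only send a full response if the ETag differs.
        headers["If-None-Match"] = etag
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # urllib reports 304 as an HTTPError; keep the existing ETag.
            return None, etag
        raise
```

A caller would store the returned ETag next to the cached content and pass it back in on the next poll, refreshing the cache only when `body` is not `None`.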