guides-cms

Improve usage of conditional API requests

Open durden opened this issue 9 years ago • 1 comments

Currently we're only using conditional API calls to github for file listing. We should implement this in the remote.py layer for the other 'models' such as users and articles.

The idea is we send a conditional request to github, keep using our cache if we get back the HTTP status code indicating nothing has changed (304 Not Modified), and refresh the cache otherwise.

durden avatar Feb 22 '16 10:02 durden

We're currently using timeout values for the cache instead of checking github for changes. This isn't ideal because we could serve stale data from the cache if the content is updated on github.com directly. Places that are subject to this are:

  • FAQ page
  • file listing pages (published.md, in_review.md, drafts.md)

The practical reason we're using cache timeouts instead of conditional requests is to save a round-trip HTTP request. The current flow to read a file's contents from the github API is like this:

  • Check cache (1 HTTP request to Redis instance)
  • Return if found
  • If not found, fetch from github API
  • Save contents with given cache timeout
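
As a rough sketch of the timeout-based flow above (not the actual remote.py code): `TimeoutCache` is a hypothetical in-memory stand-in for the Redis instance, and `fetch` is a hypothetical callable that reads a file from the github API. Both names are assumptions made for illustration.

```python
import time


class TimeoutCache:
    """Hypothetical in-memory stand-in for the Redis cache with per-entry timeouts."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry has timed out; treat it as a cache miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, timeout):
        self._data[key] = (value, self._clock() + timeout)


def read_file(cache, fetch, path, branch, timeout=60):
    """Return file contents, preferring the cache over the github API."""
    key = (path, branch)
    cached = cache.get(key)                # check cache
    if cached is not None:
        return cached                      # return if found
    contents = fetch(path, branch)         # not found: fetch from github API
    cache.set(key, contents, timeout)      # save with the given cache timeout
    return contents
```

Until the timeout expires, `read_file` never asks github anything, which is exactly why a direct edit on github.com can go unnoticed.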

So on a cache hit we skip the request to the github API that would tell us whether our cached data is still fresh. The flow with conditional requests would probably work like this:

  • Check cache for etag or last-modified-date and file we're trying to read
  • Send conditional request to github API to see if data has changed
  • If contents have not changed, return the data we got from the cache in step 1
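  • If data has changed, the conditional request itself comes back as a regular 200 response with the new contents, so no separate HTTP request to the github API is needed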
  • Cache contents and etag or last-modified-date
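
A minimal sketch of how the conditional flow above might look, assuming the cache behaves like a dict holding `(etag, contents)` tuples. `conditional_get` is a hypothetical helper, not an existing function: it is assumed to send the request with an `If-None-Match` header when an etag is given and to return `(status, etag, body)`. Note that in this sketch a 200 response already carries the new contents, which is how HTTP conditional requests behave.

```python
def read_file_conditional(cache, conditional_get, path, branch):
    """Return file contents, validating the cached copy against github.

    `cache` is any dict-like object mapping (path, branch) to
    (etag, contents). `conditional_get(path, branch, etag)` is a
    hypothetical helper returning (status, etag, body).
    """
    key = (path, branch)
    cached = cache.get(key)                       # one cache lookup
    etag = cached[0] if cached is not None else None

    status, new_etag, body = conditional_get(path, branch, etag)

    if status == 304:
        # Not Modified: the cached contents are still current.
        if cached is None:
            raise RuntimeError("got 304 without a cached copy")
        return cached[1]

    if status == 200:
        # Changed (or never cached): the response carries the new data.
        cache[key] = (new_etag, body)
        return body

    raise RuntimeError("unexpected status %d" % status)
```

No timeout appears anywhere: freshness is decided by github on every read, at the cost of that one extra round trip.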

This approach would guarantee that we always have the most up-to-date data from the github API. However, it costs us an extra HTTP request. We would do a minimum of 2 HTTP requests (1 to get cache data and 1 to check for changes from github) instead of 1 (check cache for contents).

That's the downside of this change. The benefits are:

  • Code doesn't have to deal with cache timeout values
  • Data we serve is guaranteed to reflect the real contents of github.com

The majority of the code shouldn't have to change except maybe removing a timeout parameter here and there. The added complexity of conditional requests could be hidden in the remote.py and file.py layers.

Finally, we will need to change what we store under our cache keys. We need to save the etag alongside the contents so we can make the conditional request with it.

Currently we cache files with a key of (filename, branch) and value as the file's contents. We could modify this scheme to cache with the same key but the value should be a tuple like (etag, contents). This way we do 1 request to the cache to get the contents and etag. This prevents us from having to make another request to the cache (and possibly a cache miss) to get the contents after we find out the data hasn't changed on the github.com side.
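
One possible way to store the `(etag, contents)` pair under a single key, sketched here with JSON because a Redis value has to be a string or bytes rather than a Python tuple. The helper names `cache_key`, `pack_entry` and `unpack_entry` and the key layout are assumptions for illustration, not what the code base does today.

```python
import json


def cache_key(filename, branch):
    """Build one string key from the existing (filename, branch) pair."""
    return "file:%s:%s" % (branch, filename)


def pack_entry(etag, contents):
    """Serialize etag and contents together so one cache read returns both."""
    return json.dumps({"etag": etag, "contents": contents})


def unpack_entry(raw):
    """Return (etag, contents), or None when the cache had no entry."""
    if raw is None:
        return None
    entry = json.loads(raw)
    return entry["etag"], entry["contents"]
```

With this layout a single cache read gives us both the etag for the conditional request and the contents to return on a 304.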

durden avatar Apr 07 '16 07:04 durden