Options to cache HTTP responses
Hi guzba!
I've been using curly on a website where caching the response is important. I'm curious to know how you would solve this problem in curly.
What I have now is this (going off of memory, so it may not compile):
type ApiClient = object
  curl: Curly

proc listResource(client: ApiClient, key: string): seq[Resource] =
  client.curl.get(URL, ...).body.fromJson(seq[Resource])

# later, in my application
if cache.has(key):
  return cache.get(key).fromJson(seq[Resource])
else:
  let resources = client.listResource(key)
  cache.set(key, resources.toJson())
  return resources
This has a number of issues:
- I'm serializing the Nim type into my cache, rather than the raw HTTP response
- Very verbose
- I need to manage the cache based on a "key", which really corresponds to a specific GET request URL + parameters.
The other approach I've toyed around with looks like this:
type CachedCurl = object
  curl: Curly

proc get(client: CachedCurl, url: string): Response =
  if cache.has(url):
    return cache.get(url)
  result = client.curl.get(url)
  cache.set(url, result)
and then I use this object every time I write an API client. The main downside of this approach is that it is difficult to add caching after the fact. If someone else has written a client library to talk to an API that is using curly under the hood, I'll need to fork it to have it use my "CachedCurl" type instead.
One way around all of these solutions would be to introduce some sort of hooks API (like requests) to manipulate requests/responses more dynamically. That's a lot of additional complexity though.
Let me know what you think!
The first thing the caching makes me think of is validity and using headers like ETag and handling a 304 response code, or respecting other expires headers. Is the HTTP layer of caching relevant to what you're working on or are you not worried about using stale responses etc?
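For reference, HTTP-level revalidation would look roughly like the sketch below: keep the last ETag and body per URL, send If-None-Match on the next request, and reuse the cached body when the server answers 304. This is only a hedged illustration; it assumes curly's get accepts an HttpHeaders argument and that Response exposes code, headers and body (worth checking against curly's docs), and the two tables are just hypothetical caches.

```nim
import std/tables
import curly

var
  curl = newCurly()
  etagCache: Table[string, string]   # url -> last ETag we saw
  bodyCache: Table[string, string]   # url -> last body we saw

proc getWithRevalidation(url: string): string =
  ## Sends If-None-Match when we have an ETag and reuses the cached
  ## body if the server answers 304 Not Modified.
  var headers: HttpHeaders
  if url in etagCache:
    headers["If-None-Match"] = etagCache[url]
  let response = curl.get(url, headers)
  if response.code == 304 and url in bodyCache:
    return bodyCache[url]
  if response.headers["ETag"] != "":
    etagCache[url] = response.headers["ETag"]
  bodyCache[url] = response.body
  result = response.body
```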
Good question!
My use case was specifically to reduce latency (the API I call to populate this data is pretty slow), and to reduce the number of calls I make (quota and rate-limiting). I'm not worried about stale responses.
I'm also less concerned with HTTP headers around expiring, though that may be a neat optimization.
I think ignoring the HTTP-related caching stuff is totally reasonable. The next question, though, is why you are even requesting the resource more than once and need a layer to cache it. Perhaps you could have a step that gets all the necessary info and then, once it has it all, proceeds to the digesting/processing step. In that case caching isn't part of the process at all; there would instead be a request phase and then a use phase. Possible?
That is a different option I considered (and actually used for a static data source) but this data still needs to be updated every so often. My current implementation expires the cache after an hour.
The main issues are that I have a third party API that is slow and that I want to rate limit my calls to. My application is a web service that fetches data across multiple sources and aggregates it into a single view. I'd like to cache the data fetching part.
Does that help clarify at all?
In this case I might consider an entirely different approach. Perhaps you could have the web service read only from a data structure, and have a thread running whose sole job is to update the data structure that the web service is reading from.
In this way, the web service just reads whatever is in the data structure. It has no thoughts at all about HTTP requests or caching or whatever, it takes what it gets (probably lock + read).
The updater thread would maybe wake up every minute or so, and see which set of things need to be updated (or how many it can update this interval) and do that work (make requests, then lock + write to data structure).
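A minimal sketch of that shape, assuming the shared state is just a table guarded by a lock and Nim's standard threads (fetchFreshData is a placeholder for the real third-party requests; compile with threads enabled, which is the default in recent Nim):

```nim
import std/[locks, os, tables, times]

var
  dataLock: Lock
  sharedData: Table[string, string]   # whatever the web service reads from

initLock(dataLock)

proc fetchFreshData(): Table[string, string] =
  ## Placeholder: the slow third-party HTTP requests would happen here.
  result["example"] = "fetched at " & $now()

proc readEntry(key: string): string =
  ## Called from request handlers: just lock + read, no HTTP at all.
  withLock dataLock:
    result = sharedData.getOrDefault(key)

proc updaterLoop() {.thread.} =
  ## Wakes up periodically, does the requests, then locks + writes.
  while true:
    let fresh = fetchFreshData()
    {.gcsafe.}:
      withLock dataLock:
        sharedData = fresh
    sleep(60_000)                     # roughly once a minute

var updaterThread: Thread[void]
createThread(updaterThread, updaterLoop)
# The web server's own request loop would run here, calling readEntry.
```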
Having HTTP requests directly trigger RPC requests can be sub-optimal, since you can't precisely control the behavior, whereas you can if you separate reading from updating.
Even a further alternative would be inserting Redis or another KV store in-between. The web server would always read from Redis, and the update service would keep the Redis data updated. If the updater failed for a bit, it does not take down the web service. If you want to tweak the update frequency, you just work on the updater. It also solves the cold-start problem (zero data at initial start-up) and the many-web-servers problem (duplicated requests).
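If the Redis route were taken, the split might look something like the hedged sketch below. It assumes the nim-lang/redis package and its open/get/setk calls (names and signatures worth double-checking against whatever client you actually use); readView and writeView are hypothetical.

```nim
import redis  # nimble install redis; this assumes the nim-lang/redis client

# Web-service side: always read from Redis, never call the third-party API.
proc readView(r: Redis, key: string): string =
  let cached = r.get(key)
  if cached == redisNil:
    ""          # cold key: render a placeholder until the updater fills it
  else:
    cached

# Updater side (separate process or thread): keep Redis fresh.
proc writeView(r: Redis, key, value: string) =
  # A real updater would likely also set a TTL (EXPIRE) so abandoned
  # keys eventually disappear.
  r.setk(key, value)

when isMainModule:
  let r = redis.open("localhost")   # default Redis port
  writeView(r, "resources:some-key", """{"example": true}""")
  echo readView(r, "resources:some-key")
```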
Relevant?
Thanks for the well reasoned response! I also considered something very similar (using https://github.com/soasme/nim-schedules to have a worker thread update in the background). There were a couple reasons why I decided to abandon that approach:
- Ensuring that unnecessary requests aren't done. The advantage of lazily fetching and computing this data is that it only happens if there's an incoming request. While this leads to variable response times (cache miss), no work is "wasted". With a background thread, I'd need to store additional state (last_updated or last_requested) if I wanted to avoid doing requests when I have no users using the website.
- Making sure I have data on startup (cold start problem). Your solution does solve that by running the updater (thread or separate process) before the web server takes requests, but I don't think it scales well when the requests rely on some sort of user input. Say I'm building something like Google Flights - it makes sense to fetch flight information dynamically and cache it for specific routes, rather than compute all possible routes in an updater and add the data in the background, right? Naively, it seems like the updater would be doing lots and lots of requests to make sure we have EVERYTHING at start time.
- Versioning. If my cache is storing raw HTTP responses from an API this is less of an issue, but if I'm serializing Nim objects and need to add a field or something similar, I need to blow away the cache, restart my updater, and wait for it to be done before starting my web service back up. Alternatively, I write a converter that is able to handle both formats. Both options feel a bit annoying.
These are real requirements of my application unfortunately, not "what-ifs". Happy to expand on this more, especially if you think I'm running into the https://mywiki.wooledge.org/XyProblem.
If lazy fetching is required or valuable, then yeah my previous suggestion isn't a good fit.
In this case, it doesn't seem like a bad idea to have something like the CachedCurl above that handles making a request or using a cached response if it is available. It could also be presented more as a "give me this data as DataType" proc that may or may not result in an HTTP request, making it less focused on the Curl/HTTP part of things (e.g. a FlightBoss type with proc getFlights(to, from): seq[Flights], which may make a request, use a cache, or who knows what).
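A rough sketch of that shape, with hypothetical names (FlightBoss, Flight, fetchFlights) and the `from` parameter renamed since it's a Nim keyword:

```nim
import std/[tables, times]

type
  Flight = object
    number: string
    departs: DateTime

  CacheEntry = object
    flights: seq[Flight]
    fetchedAt: DateTime

  FlightBoss = object
    cache: Table[string, CacheEntry]
    maxAge: Duration

proc newFlightBoss(maxAge = initDuration(hours = 1)): FlightBoss =
  FlightBoss(maxAge: maxAge)

proc fetchFlights(origin, dest: string): seq[Flight] =
  ## Placeholder for the real API call (curly + fromJson would go here).
  discard

proc getFlights(boss: var FlightBoss, origin, dest: string): seq[Flight] =
  ## Callers just ask for data; whether this hits the cache or makes a
  ## request is an implementation detail of FlightBoss.
  let key = origin & "->" & dest
  if key in boss.cache and now() - boss.cache[key].fetchedAt < boss.maxAge:
    return boss.cache[key].flights
  result = fetchFlights(origin, dest)
  boss.cache[key] = CacheEntry(flights: result, fetchedAt: now())
```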
You say
> The main downside of this approach is that it is difficult to add caching after the fact.
I'm not sure what you mean, given the approach is designed to enable caching.
As for
> If someone else has written a client library to talk to an API that is using curly under the hood, I'll need to fork it to have it use my "CachedCurl" type instead.
We are starting to add layers of requirements on. Is this even part of the requirements / goal? Are you writing a service or an open source library?
Also, why would you need to fork Curly? Your CachedCurl could just use Curly internally for requesting but present its own API that your service uses, making Curly a simple dependency. I do not see how forking Curly becomes a requirement.
Ah, not fork curly but fork the wrapper client I'm using.
What I had in my mind was adding caching to something like https://github.com/treeform/digitalocean - obviously if I'm using a library without caching support, I need to either add caching support by wrapping it in my codebase, or fork it to have caching support. I don't think that's a problem that's intrinsic to curly at all, so please ignore it! The DigitalOcean wrapper doesn't even use curly, so that requirement doesn't make sense.
It sounds like you'd recommend wrapping the curl object in a new one that adds on caching, rather than building it into curly through a hooks API. I'm more than happy with that solution; I just wanted to make sure that there wasn't some obviously better way of doing it. Feel free to close the issue unless you have anything else you want to add!
Again, thank you for this library!
> Ah, not fork curly but fork the wrapper client I'm using.
Gotcha, yeah, it's always hard over a few quick messages to really understand when something does turn out to be a bit complex. The example makes it very clear what the issues are.
I do agree with this, at least presently:
> It sounds like you'd recommend wrapping the curl object in a new one that adds on caching, rather than building it into curly through a hooks API
I would consider just using the wrapper as a place to learn from and building just what I need, since it seems the things you want out of the project are specific and outside of what the wrapper was originally written to provide. I don't think it's unreasonable to just write your own thing (or do a fork if that feels easier, though you may find yourself fighting it more than you think, depending on how it works).
Never came back to this issue. As the API for curly is pretty small, I just wrapped each method I used with a new "CachedCurly" client.
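For anyone finding this thread later, a minimal sketch of what such a wrapper can look like, assuming curly's newCurly/get and its Response type (the CachedCurly name and the one-hour default are just illustrative; a real version would wrap each verb it needs the same way):

```nim
import std/[tables, times]
import curly

type
  CacheEntry = object
    response: Response
    fetchedAt: DateTime

  CachedCurly = object
    curl: Curly
    cache: Table[string, CacheEntry]
    maxAge: Duration

proc newCachedCurly(maxAge = initDuration(hours = 1)): CachedCurly =
  CachedCurly(curl: newCurly(), maxAge: maxAge)

proc get(client: var CachedCurly, url: string): Response =
  ## Returns the cached response while it is fresh enough, otherwise
  ## makes the request and stores the raw Response keyed by URL.
  if url in client.cache and now() - client.cache[url].fetchedAt < client.maxAge:
    return client.cache[url].response
  result = client.curl.get(url)
  client.cache[url] = CacheEntry(response: result, fetchedAt: now())
```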