async-http-client Caching support

It would be totally bad-ass if AHC supported a response cache, meaning that AHC would not fire an actual HTTP request if a valid response were already in the cache.

I think it would be great to add the following infrastructure to the AHC code:

AHC Cache interface with only a no-op implementation
Interface and default implementation that would evaluate whether a response is cacheable
Interface and default implementation that would map a Request to a cache key
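
The three pieces listed above could be sketched roughly as follows. This is only an illustration: every type name here (SimpleRequest, SimpleResponse, ResponseCache, CacheabilityPolicy, CacheKeyMapper and their implementations) is hypothetical and not part of AHC, and the "default" cacheability rule is deliberately simplistic.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical stand-in for AHC's Request; real code would use the AHC type. */
final class SimpleRequest {
    final String method;
    final String url;
    SimpleRequest(String method, String url) { this.method = method; this.url = url; }
}

/** Hypothetical stand-in for AHC's Response. */
final class SimpleResponse {
    final int statusCode;
    final String body;
    SimpleResponse(int statusCode, String body) { this.statusCode = statusCode; this.body = body; }
}

/** Storage abstraction (assumed name, not AHC API). */
interface ResponseCache {
    Optional<SimpleResponse> get(String key);
    void put(String key, SimpleResponse response);
}

/** No-op implementation: stores nothing and always misses. */
final class NoOpResponseCache implements ResponseCache {
    @Override public Optional<SimpleResponse> get(String key) { return Optional.empty(); }
    @Override public void put(String key, SimpleResponse response) { /* intentionally empty */ }
}

/** Decides whether a response may be stored at all. */
interface CacheabilityPolicy {
    boolean isCacheable(SimpleRequest request, SimpleResponse response);
}

/** Simplistic default (an assumption): only 200 responses to GET requests. */
final class DefaultCacheabilityPolicy implements CacheabilityPolicy {
    @Override public boolean isCacheable(SimpleRequest request, SimpleResponse response) {
        return "GET".equalsIgnoreCase(request.method) && response.statusCode == 200;
    }
}

/** Maps a request to the key under which its response is stored. */
interface CacheKeyMapper {
    String keyFor(SimpleRequest request);
}

/** Simplistic default (an assumption): upper-cased method plus URL. */
final class DefaultCacheKeyMapper implements CacheKeyMapper {
    @Override public String keyFor(SimpleRequest request) {
        return request.method.toUpperCase(Locale.ROOT) + " " + request.url;
    }
}
```

With the no-op cache wired in, every lookup misses, so behaviour would be identical to AHC without caching until a real implementation is plugged in.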

Later we could add extra modules that would implement Cache interface using guava/caffeine/mapdb cache implementations.

I'm willing to contribute :-)

Jun 15 '17 10:06 bfg

Here be dragons!

I'm willing to contribute :-)

Sure, go ahead! Feel free to ping here for discussing design.

Thanks!

Jun 15 '17 11:06 slandelle

Caffeine added support for variable expiration (see Expiry). This way entries can differ by their expiration times rather than relying on a uniform, fixed timeout. If you are caching the responses, then the entry would hold the expiration settings, which you would only evaluate on creation, and it might look like:

LoadingCache<Request, Response> responses = Caffeine.newBuilder()
    .expireAfter(new Expiry<Request, Response>() {
      // Evaluated once, when the entry is created.
      // isCachable() and getMaxAge() are illustrative placeholders.
      @Override
      public long expireAfterCreate(Request request, Response response, long currentTime) {
        long seconds = request.isCachable() ? request.getMaxAge() : 0L;
        return TimeUnit.SECONDS.toNanos(seconds);
      }
      // Updates and reads leave the remaining lifetime unchanged.
      @Override
      public long expireAfterUpdate(Request request, Response response,
          long currentTime, long currentDuration) {
        return currentDuration;
      }
      @Override
      public long expireAfterRead(Request request, Response response,
          long currentTime, long currentDuration) {
        return currentDuration;
      }
    })
    .build(request -> httpClient.call(request));

The feature is O(1) without relying on a maximum size, so you shouldn't observe much of a performance impact. Others use O(lg n) heaps, or hide the entry and expect a maximum size to be set (to lazily evict, e.g. redis/memcached). We do this using a timer wheel, like kernels do for timers.

Jun 15 '17 21:06 ben-manes

@ben-manes yeah, that is nice, but I'm planning to use a different design. I don't want to introduce a parallel AHC API; I want to add the cache at AHC instance creation time, just like okhttp and apache httpcomponents do.

And I want to make it cache implementation independent; and yes, the cache implementation should be async as well, in case somebody wants to plug in redis/hazelcast/whatever.
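
An async, implementation-independent cache contract along these lines might look like the sketch below. All names (AsyncResponseCache, InMemoryAsyncCache, CachingExecutor) are hypothetical and not AHC API; the in-memory store only stands in for a remote backend such as redis or hazelcast.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical async cache contract; a remote store could implement it without blocking. */
interface AsyncResponseCache<K, V> {
    /** Completes with the cached value, or with null on a miss. */
    CompletableFuture<V> get(K key);
    CompletableFuture<Void> put(K key, V value);
}

/** Trivial in-memory implementation, used here only to exercise the contract. */
final class InMemoryAsyncCache<K, V> implements AsyncResponseCache<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();

    @Override public CompletableFuture<V> get(K key) {
        return CompletableFuture.completedFuture(map.get(key));
    }

    @Override public CompletableFuture<Void> put(K key, V value) {
        map.put(key, value);
        return CompletableFuture.completedFuture(null);
    }
}

/** Consults the cache first and only runs the real fetch on a miss. */
final class CachingExecutor<K, V> {
    private final AsyncResponseCache<K, V> cache;

    CachingExecutor(AsyncResponseCache<K, V> cache) { this.cache = cache; }

    CompletableFuture<V> execute(K key, Supplier<CompletableFuture<V>> fetch) {
        return cache.get(key).thenCompose(hit -> {
            if (hit != null) {
                return CompletableFuture.completedFuture(hit);
            }
            // Miss: fetch, store, then hand the fetched value to the caller.
            return fetch.get().thenCompose(v -> cache.put(key, v).thenApply(ignored -> v));
        });
    }
}
```

Note that two concurrent misses for the same key would both run the fetch in this sketch; de-duplicating in-flight requests is left out.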

Jun 16 '17 09:06 bfg

Right, I only meant it as an example that the feature was there. You could use a manual or async cache and compute using cache.get(key, func) or (a racy) cache.put(key, value). I didn't mean to imply that your strategy shouldn't be pluggable. Just that the capability is natively supported, whereas I think in Guava / MapDB you would need to emulate it using eviction, check the timestamp, and miss if expired.
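
The "check the timestamp, and miss if expired" emulation could be sketched with plain JDK types as below. TimestampCheckedCache is a hypothetical name, and this is not how Guava, MapDB or Caffeine implement expiration; it only illustrates the lazy check on read.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.LongSupplier;

/** Per-entry expiry emulated by storing a deadline and checking it on every read. */
final class TimestampCheckedCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAtNanos;
        Entry(V value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final ConcurrentMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final LongSupplier nanoClock; // injectable so tests can control time

    TimestampCheckedCache(LongSupplier nanoClock) { this.nanoClock = nanoClock; }

    void put(K key, V value, long ttlNanos) {
        map.put(key, new Entry<>(value, nanoClock.getAsLong() + ttlNanos));
    }

    /** Returns the value if present and not yet expired, otherwise null (a miss). */
    V getIfFresh(K key) {
        Entry<V> entry = map.get(key);
        if (entry == null) {
            return null;
        }
        // Subtraction keeps the comparison correct even if the nano clock wraps.
        if (nanoClock.getAsLong() - entry.expiresAtNanos >= 0) {
            map.remove(key, entry); // drop only the entry we actually inspected
            return null;
        }
        return entry.value;
    }
}
```

Expired entries are only removed when they are read, so a size bound or a periodic sweep would still be needed to keep memory in check.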

Jun 16 '17 17:06 ben-manes

Also see https://github.com/playframework/playframework/pull/7255

Dec 12 '17 20:12 wsargent

@wsargent thanks for pinging. Where is EffectiveURIKey defined? I can't find it. I'm a bit skeptical if your key only accounts for the URI. You need to take the full request entity into account: method + URI + headers + request body.
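
For illustration, a key covering method + URI + headers + body could be sketched like this. FullRequestKey is a hypothetical class, not something from AHC or play-ws; header names are lower-cased because HTTP header names are case-insensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical cache key built from the whole request rather than the URI alone. */
final class FullRequestKey {
    private final String method;
    private final String uri;
    private final Map<String, List<String>> headers; // lower-cased names, sorted by name
    private final byte[] body;

    FullRequestKey(String method, String uri, Map<String, List<String>> headers, byte[] body) {
        this.method = method.toUpperCase(Locale.ROOT);
        this.uri = uri;
        TreeMap<String, List<String>> normalized = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            String name = e.getKey().toLowerCase(Locale.ROOT);
            normalized.computeIfAbsent(name, k -> new ArrayList<>()).addAll(e.getValue());
        }
        this.headers = normalized;
        // Treat a missing body like an empty one and copy to guard against later mutation.
        this.body = body == null ? new byte[0] : body.clone();
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FullRequestKey)) return false;
        FullRequestKey other = (FullRequestKey) o;
        return method.equals(other.method)
            && uri.equals(other.uri)
            && headers.equals(other.headers)
            && Arrays.equals(body, other.body);
    }

    @Override public int hashCode() {
        return 31 * Objects.hash(method, uri, headers) + Arrays.hashCode(body);
    }
}
```

Including every header makes hits rare in practice (think of User-Agent or Date), which is one reason a cache might restrict the key to the headers that actually influence the response.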

Dec 12 '17 20:12 slandelle

@slandelle https://github.com/playframework/play-ws/blob/master/play-ahc-ws-standalone/src/main/scala/play/api/libs/ws/ahc/cache/EffectiveURIKey.scala

You will also be interested in https://github.com/playframework/cachecontrol/blob/master/src/main/scala/com/typesafe/play/cachecontrol/CacheControl.scala

Dec 14 '17 00:12 wsargent

You need to take the full request entity into account: method + URI + headers + request body.

The effective request URI is defined in https://tools.ietf.org/html/rfc7230#section-5.5, as referenced from https://tools.ietf.org/html/rfc7234

Dec 14 '17 00:12 wsargent

I won’t be able to check the other RFCs for a few days, but this algorithm looks broken to me. It’s pretty common that a server's response depends on request headers (Accept, Accept-Language, etc).

Dec 14 '17 06:12 slandelle

Yes, a response has to be selected that matches the request header -- cached request and stored response is a huge part of the spec. That's covered in the ResponseSelectionCalculator.

Also, the caching section of play-ws https://github.com/playframework/play-ws#caching and the cachecontrol README have more.
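
As a rough illustration of the idea (not the actual ResponseSelectionCalculator), entries can be keyed by URI and then selected by comparing the request headers named in the stored response's Vary header. VaryAwareStore and StoredVariant are hypothetical names, header maps are assumed to use lower-case names, and the sketch ignores freshness, "Vary: *" and header value normalization.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

/** One stored response plus the request header values it was selected by. */
final class StoredVariant {
    final Map<String, String> selectingHeaders; // lower-cased name -> value (null if absent)
    final String body;

    StoredVariant(Map<String, String> selectingHeaders, String body) {
        this.selectingHeaders = selectingHeaders;
        this.body = body;
    }
}

/** Primary key is the URI; Vary-nominated request headers pick among stored variants. */
final class VaryAwareStore {
    private final Map<String, List<StoredVariant>> byUri = new HashMap<>();

    /** requestHeaders is assumed to use lower-case header names. */
    void store(String uri, Map<String, String> requestHeaders, List<String> varyNames, String body) {
        Map<String, String> selecting = new HashMap<>();
        for (String name : varyNames) {
            String lower = name.toLowerCase(Locale.ROOT);
            selecting.put(lower, requestHeaders.get(lower));
        }
        byUri.computeIfAbsent(uri, k -> new ArrayList<>()).add(new StoredVariant(selecting, body));
    }

    /** Returns the first stored body whose selecting headers match, or null on a miss. */
    String select(String uri, Map<String, String> requestHeaders) {
        for (StoredVariant variant : byUri.getOrDefault(uri, List.of())) {
            boolean matches = true;
            for (Map.Entry<String, String> e : variant.selectingHeaders.entrySet()) {
                if (!Objects.equals(e.getValue(), requestHeaders.get(e.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                return variant.body;
            }
        }
        return null;
    }
}
```

This way a response that varies on Accept-Language is not served to a request with a different language, even though both share the same URI key.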

Dec 14 '17 14:12 wsargent

Any updates on this? I'm currently implementing an Amazon SNS endpoint for a customer where we have to fetch certificates every time we receive a request (to verify its content), and being able to seamlessly cache this particular request (without spinning up our own caching solution) would be really nice.

Jun 11 '18 15:06 domdorn

any updates on this?

Nope. Contributions welcome.

Jun 11 '18 16:06 slandelle
