
Query frontend cache is inconsistent across replicas

Open dbczumar opened this issue 7 years ago • 13 comments

Because predictions are cached within the query frontend, horizontally scaling the frontend will result in frontend replicas having different prediction cache contents. This motivates decoupling the prediction cache from the query frontend, which is also beneficial because it ensures that query frontend replicas are not storing large volumes of prediction data directly. Exploring the use of a distributed key-value store is likely a good idea.

dbczumar avatar Jun 06 '18 21:06 dbczumar

Agreed. Using something like memcached is likely the right design choice here. As @dbczumar points out, the query frontend is pseudo-stateful right now, which is not a desirable property of the system.
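
As a rough sketch (not actual Clipper code, just to illustrate the shape of the design), every query frontend replica would consult a shared memcached instance instead of its own in-process cache; the service address and function names below are illustrative:

```python
# Illustrative only: a shared prediction cache that every query frontend
# replica talks to, instead of each replica keeping a private LRU.
import hashlib
from pymemcache.client.base import Client

cache = Client(("memcached.clipper.svc", 11211))  # hypothetical shared cache address

def cache_key(model_name, raw_input):
    # Same keying idea as the current per-replica cache: model name + input digest
    digest = hashlib.sha256(raw_input).hexdigest()
    return f"{model_name}:{digest}"

def predict_with_shared_cache(model_name, raw_input, predict_fn):
    key = cache_key(model_name, raw_input)
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = predict_fn(raw_input)        # fall through to the model container
    cache.set(key, result, expire=300)    # short TTL keeps replicas consistent
    return result
```

With a shared store like this, horizontally scaling the frontend no longer produces replicas with diverging cache contents, and the replicas stay stateless.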

dcrankshaw avatar Jun 13 '18 22:06 dcrankshaw

Agreed. Our team set the query frontend's cache size to 1 to disable caching, and the API servers in front of Clipper cache the prediction result for each input instead.
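
For reference, we disabled it roughly like this through clipper_admin (assuming the `cache_size` argument of `start_clipper` behaves the same in your version):

```python
from clipper_admin import ClipperConnection, DockerContainerManager

clipper_conn = ClipperConnection(DockerContainerManager())
# cache_size is in bytes; setting it to 1 effectively disables the prediction cache
clipper_conn.start_clipper(cache_size=1)
```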

withsmilo avatar Jun 18 '18 23:06 withsmilo

@simon-mo @withsmilo I think this is worth handling in a future release. I will mark it as a feature request.

rkooo567 avatar May 29 '19 03:05 rkooo567

@withsmilo Setting the cache size to 1 only works in the Docker environment. Did you check this in k8s too?

nopanderer avatar Sep 09 '19 07:09 nopanderer

@nopanderer I will check it ASAP.

withsmilo avatar Sep 10 '19 00:09 withsmilo

@nopanderer Our team checked your issue in detail on our clipper-k8s cluster, but couldn't find any problem with caching being disabled. When we set the query frontend's cache size to 1, all the model pods always received the user requests. Could you check it again?

withsmilo avatar Sep 10 '19 08:09 withsmilo

@withsmilo Please try this.

  1. Register model A, named "tensorflow-alpha", which gives a binary output, 0 or 1.

  2. Predict with model A and input.jpg. This gives a binary output, as expected.

  3. Then unregister model A and register model B, also named "tensorflow-alpha", which produces a float output in [0, 1]. Note that model A and model B have the same name.

  4. This is where the bad thing happens. If you predict with model B and input.jpg (the same input image used in step 2), model B is supposed to give a float in [0, 1], but it gives the same output as in step 2.

In short, ("tensorflow-alpha", "input.jpg") always gives the same output, even after "tensorflow-alpha" is changed to another model.

When I tried this case in both Docker and Kubernetes with cache_size=1, the former gives the right answer but the latter doesn't.
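
For anyone reproducing this, the predictions in steps 2 and 4 can be sent to Clipper's REST predict endpoint roughly like this (the app name and the base64 input encoding are illustrative and depend on how the app was registered):

```python
# Illustrative reproduction of steps 2 and 4: the same (app, input) pair is
# sent before and after swapping the model behind "tensorflow-alpha".
import base64
import json
import requests

APP_NAME = "tensorflow-alpha-app"   # hypothetical app linked to the model
CLIPPER_ADDR = "localhost:1337"

with open("input.jpg", "rb") as f:
    payload = {"input": base64.b64encode(f.read()).decode("utf-8")}

resp = requests.post(
    f"http://{CLIPPER_ADDR}/{APP_NAME}/predict",
    headers={"Content-Type": "application/json"},
    data=json.dumps(payload),
)
print(resp.json())  # on the buggy path, this repeats model A's cached output
```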

nopanderer avatar Sep 10 '19 09:09 nopanderer

@nopanderer

  1. What version of Clipper are you using?
  2. What Clipper APIs did you call when unregistering model A?

withsmilo avatar Sep 11 '19 02:09 withsmilo

@withsmilo

Sorry for the late reply. I tested both 0.3 and 0.4, and found that cache_size=1 works correctly in clipper:0.4. How did you fix this cache problem? Which part should I look into?

nopanderer avatar Sep 18 '19 00:09 nopanderer

@nopanderer Good news! There are many differences between the two versions, so I'm not sure which change fixed it. See https://github.com/ucbrise/clipper/releases/tag/v0.4.1.

withsmilo avatar Sep 18 '19 00:09 withsmilo

@withsmilo

Thank you a lot.

nopanderer avatar Sep 18 '19 00:09 nopanderer

@nopanderer @withsmilo

The query frontend manages its own cache inside the server, and I guess the cache entry wasn't evicted when model A was unregistered. As a result, when input.jpg was requested again, the query frontend returned the cached entry (which is the output of model A).
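
Conceptually (the real cache lives in the C++ query frontend; this is only an illustration of the observed behavior), it is as if the cache key ignores the model version and nothing evicts entries when a model is unregistered:

```python
# Toy model of the bug: the cache outlives the model swap because the key
# does not change and unregistering the model evicts nothing.
prediction_cache = {}

def predict(model_registry, model_name, raw_input):
    key = (model_name, hash(raw_input))    # note: no model version in the key
    if key in prediction_cache:
        return prediction_cache[key]       # stale hit after the model is swapped
    output = model_registry[model_name](raw_input)
    prediction_cache[key] = output
    return output
```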

rkooo567 avatar Sep 18 '19 01:09 rkooo567

If that's the cause, I'm not sure how we can fix it. If your app frequently unregisters models, I recommend having a distributed caching layer on top of the query frontend instead.
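
One way to sketch that layer (illustrative names, not part of Clipper) is to include the model version in the cache key, so re-registering a model under the same name misses the cache instead of returning the old model's output:

```python
# Illustrative app-level cache in front of the query frontend: keying on the
# model version means a re-registered "tensorflow-alpha" cannot hit stale entries.
import hashlib

app_cache = {}

def cached_predict(clipper_predict, model_name, model_version, raw_input):
    key = (model_name, model_version, hashlib.sha256(raw_input).hexdigest())
    if key not in app_cache:
        app_cache[key] = clipper_predict(raw_input)
    return app_cache[key]
```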

rkooo567 avatar Sep 18 '19 01:09 rkooo567