
Query frontend cache is inconsistent across replicas

Open dbczumar opened this issue 7 years ago • 13 comments

Because predictions are cached within the query frontend, horizontally scaling the frontend will result in frontend replicas having different prediction cache contents. This motivates decoupling the prediction cache from the query frontend, which is also beneficial because it ensures that query frontend replicas are not storing large volumes of prediction data directly. Exploring the use of a distributed key-value store is likely a good idea.

dbczumar avatar Jun 06 '18 21:06 dbczumar

Agreed. Using something like memcached is likely the right design choice here. As @dbczumar points out, the query frontend is pseudo-stateful right now, which is not a desirable property of the system.
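
As a rough sketch (not actual Clipper code, just to illustrate the shape of the design), every query frontend replica would consult a shared memcached instance instead of its own in-process cache; the service address and function names below are illustrative:

```python
# Illustrative only: a shared prediction cache that every query frontend
# replica talks to, instead of each replica keeping a private LRU.
import hashlib
from pymemcache.client.base import Client

cache = Client(("memcached.clipper.svc", 11211))  # hypothetical shared cache address

def cache_key(model_name, raw_input):
    # Same keying idea as the current per-replica cache: model name + input digest
    digest = hashlib.sha256(raw_input).hexdigest()
    return f"{model_name}:{digest}"

def predict_with_shared_cache(model_name, raw_input, predict_fn):
    key = cache_key(model_name, raw_input)
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = predict_fn(raw_input)        # fall through to the model container
    cache.set(key, result, expire=300)    # short TTL keeps replicas consistent
    return result
```

With a shared store like this, horizontally scaling the frontend no longer produces replicas with diverging cache contents, and the replicas stay stateless.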

dcrankshaw avatar Jun 13 '18 22:06 dcrankshaw

Agreed. Our team set the query frontend's cache size to 1 to disable caching, and the API servers in front of Clipper cache the prediction result for each input instead.
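
For reference, we disabled it roughly like this through clipper_admin (assuming the `cache_size` argument of `start_clipper` behaves the same in your version):

```python
from clipper_admin import ClipperConnection, DockerContainerManager

clipper_conn = ClipperConnection(DockerContainerManager())
# cache_size is in bytes; setting it to 1 effectively disables the prediction cache
clipper_conn.start_clipper(cache_size=1)
```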

withsmilo avatar Jun 18 '18 23:06 withsmilo

@simon-mo @withsmilo I think this is worth handling in a future release. I will mark it as a feature request.

rkooo567 avatar May 29 '19 03:05 rkooo567

@withsmilo Setting the cache size to 1 only works in the Docker environment. Did you check this in k8s too?

nopanderer avatar Sep 09 '19 07:09 nopanderer

@nopanderer I will check it ASAP.

withsmilo avatar Sep 10 '19 00:09 withsmilo

@nopanderer Our team checked your issue in detail on our clipper-k8s cluster, but couldn't find any problem with caching being disabled. When we set the query frontend's cache size to 1, all the model pods always received the user requests. Could you check it again?

withsmilo avatar Sep 10 '19 08:09 withsmilo

@withsmilo Please try this.

  1. Register model A, named "tensorflow-alpha", which gives a binary output, 0 or 1.

  2. Predict with model A and input.jpg. This gives a binary output, as expected.

  3. Then unregister model A and register model B, also named "tensorflow-alpha", which produces a float output in [0, 1]. Note that model A and model B have the same name.

  4. This is where the bad thing happens. If you predict with model B and input.jpg (the same input image used in step 2), model B is supposed to give a float in [0, 1], but it gives the same output as in step 2.

In short, ("tensorflow-alpha", "input.jpg") always gives the same output, even after "tensorflow-alpha" is changed to another model.

When I tried this case in both Docker and Kubernetes with cache_size=1, the former gives the right answer but the latter doesn't.
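
For anyone reproducing this, the predictions in steps 2 and 4 can be sent to Clipper's REST predict endpoint roughly like this (the app name and the base64 input encoding are illustrative and depend on how the app was registered):

```python
# Illustrative reproduction of steps 2 and 4: the same (app, input) pair is
# sent before and after swapping the model behind "tensorflow-alpha".
import base64
import json
import requests

APP_NAME = "tensorflow-alpha-app"   # hypothetical app linked to the model
CLIPPER_ADDR = "localhost:1337"

with open("input.jpg", "rb") as f:
    payload = {"input": base64.b64encode(f.read()).decode("utf-8")}

resp = requests.post(
    f"http://{CLIPPER_ADDR}/{APP_NAME}/predict",
    headers={"Content-Type": "application/json"},
    data=json.dumps(payload),
)
print(resp.json())  # on the buggy path, this repeats model A's cached output
```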

nopanderer avatar Sep 10 '19 09:09 nopanderer

@nopanderer

  1. What version of Clipper are you using?
  2. What Clipper APIs did you call when unregistering model A?

withsmilo avatar Sep 11 '19 02:09 withsmilo

@withsmilo

Sorry for the late reply. I tested both 0.3 and 0.4, and found that cache_size=1 works correctly in clipper:0.4. How did you fix this cache problem? Which part should I look into?

nopanderer avatar Sep 18 '19 00:09 nopanderer

@nopanderer Good news! There are many differences between the two versions, so I'm not sure which change fixed it. See https://github.com/ucbrise/clipper/releases/tag/v0.4.1.

withsmilo avatar Sep 18 '19 00:09 withsmilo

@withsmilo

Thank you a lot.

nopanderer avatar Sep 18 '19 00:09 nopanderer

@nopanderer @withsmilo

The query frontend manages its own cache inside the server, and I guess the cache entry wasn't evicted when model A was unregistered. As a result, when input.jpg was requested again, the query frontend returned the cached entry (which is the output of model A).
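
Conceptually (the real cache lives in the C++ query frontend; this is only an illustration of the observed behavior), it is as if the cache key ignores the model version and nothing evicts entries when a model is unregistered:

```python
# Toy model of the bug: the cache outlives the model swap because the key
# does not change and unregistering the model evicts nothing.
prediction_cache = {}

def predict(model_registry, model_name, raw_input):
    key = (model_name, hash(raw_input))    # note: no model version in the key
    if key in prediction_cache:
        return prediction_cache[key]       # stale hit after the model is swapped
    output = model_registry[model_name](raw_input)
    prediction_cache[key] = output
    return output
```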

rkooo567 avatar Sep 18 '19 01:09 rkooo567

If that's the cause, I'm not sure how we can fix it. If your app frequently unregisters models, I recommend having a distributed caching layer on top of the query frontend instead.
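
One way to sketch that layer (illustrative names, not part of Clipper) is to include the model version in the cache key, so re-registering a model under the same name misses the cache instead of returning the old model's output:

```python
# Illustrative app-level cache in front of the query frontend: keying on the
# model version means a re-registered "tensorflow-alpha" cannot hit stale entries.
import hashlib

app_cache = {}

def cached_predict(clipper_predict, model_name, model_version, raw_input):
    key = (model_name, model_version, hashlib.sha256(raw_input).hexdigest())
    if key not in app_cache:
        app_cache[key] = clipper_predict(raw_input)
    return app_cache[key]
```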

rkooo567 avatar Sep 18 '19 01:09 rkooo567