Query frontend cache is inconsistent across replicas
Because predictions are cached within the query frontend, horizontally scaling the frontend will result in frontend replicas having different prediction cache contents. This motivates decoupling the prediction cache from the query frontend, which is also beneficial because it ensures that query frontend replicas are not storing large volumes of prediction data directly. Exploring the use of a distributed key-value store is likely a good idea.
Agreed. Using something like memcached is likely the right design choice here. As @dbczumar points out, the query frontend is pseudo-stateful right now, which is not a desirable property of the system.
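For what it's worth, a minimal sketch of what that could look like: a memcached-backed cache shared by all query frontend replicas, via pymemcache. The endpoints, helper names, and versioned key scheme here are assumptions of the sketch, not Clipper's actual design.

```python
# Sketch of a shared prediction cache in memcached, sitting in front of
# the Clipper query frontend so every frontend replica sees the same
# entries. Assumes: memcached on localhost:11211, a query frontend on
# localhost:1337 serving a string-input application, and
# `pip install pymemcache requests`.
import hashlib
import json

import requests
from pymemcache.client.base import Client

CACHE = Client(("localhost", 11211))
CLIPPER_URL = "http://localhost:1337"


def cache_key(app_name, model_version, raw_input):
    # Include the model version so redeploying a model under the same
    # name can never serve stale entries.
    digest = hashlib.sha256(raw_input.encode("utf-8")).hexdigest()
    return "%s:%s:%s" % (app_name, model_version, digest)


def predict(app_name, model_version, raw_input):
    key = cache_key(app_name, model_version, raw_input)
    cached = CACHE.get(key)
    if cached is not None:
        return json.loads(cached)
    resp = requests.post(
        "%s/%s/predict" % (CLIPPER_URL, app_name),
        json={"input": raw_input},
    ).json()
    CACHE.set(key, json.dumps(resp).encode("utf-8"), expire=300)  # 5 min TTL
    return resp
```

With this layout the query frontend replicas stay stateless, and cache capacity and eviction policy can be scaled independently of the frontends.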
Agreed. Our team set the QueryFrontend's cache size to 1 to disable caching, and the API servers in front of Clipper cache the prediction result for each input instead.
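For reference, with the clipper_admin Python client that looks roughly like the snippet below (the cache size is specified in bytes when starting Clipper, so 1 effectively disables the cache):

```python
# Disable the prediction cache at startup (clipper_admin, local Docker).
from clipper_admin import ClipperConnection, DockerContainerManager

clipper_conn = ClipperConnection(DockerContainerManager())
# cache_size is in bytes; a 1-byte cache effectively disables caching.
clipper_conn.start_clipper(cache_size=1)
```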
@simon-mo @withsmilo I think this is worth handling in a future release. I will mark it as a feature request.
@withsmilo Setting the cache size to 1 only works in the Docker environment. Did you check this in k8s, too?
@nopanderer I will check it ASAP.
@nopanderer Our team checked your issue in detail on our clipper-k8s cluster, but couldn't find any problem with disabled caching. When we set the query frontend's cache size to 1, all the model pods always received user requests. Could you check it again?
@withsmilo Please try this:
1. Register model A, named "tensorflow-alpha", which gives a binary output, 0 or 1.
2. Predict with model A and input.jpg. This should give a binary output. No surprise.
3. Unregister model A and register model B, also named "tensorflow-alpha", which produces a float output in [0, 1]. Note that model A and model B have the same name.
4. This is where the bad thing happens. If you predict with model B and input.jpg (the same input image used in step 2), model B is supposed to give a float in [0, 1], but it gives the same output as in step 2.
In short, ("tensorflow-alpha", "input.jpg") always gives the same output, even after "tensorflow-alpha" is replaced with a different model.
When I tried this case in both Docker and Kubernetes with cache_size=1, the former gives the right answer but the latter doesn't. In script form:
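This is a sketch assuming clipper_admin with local Docker and string inputs; the lambdas are stand-ins for the TensorFlow models, and stop_models is just one way to do the unregister step.

```python
# Repro sketch using clipper_admin with local Docker and string inputs.
import requests
from clipper_admin import ClipperConnection, DockerContainerManager
from clipper_admin.deployers.python import deploy_python_closure

conn = ClipperConnection(DockerContainerManager())
conn.start_clipper(cache_size=1)  # cache nominally disabled
conn.register_application(name="app", input_type="strings",
                          default_output="-1", slo_micros=100000)

# Step 1: model A ("tensorflow-alpha") gives a binary output.
deploy_python_closure(conn, name="tensorflow-alpha", version=1,
                      input_type="strings",
                      func=lambda xs: ["1" for _ in xs])
conn.link_model_to_app(app_name="app", model_name="tensorflow-alpha")

# Step 2: predict with input.jpg (a string stands in for the image).
query = {"input": "input.jpg"}
print(requests.post("http://localhost:1337/app/predict", json=query).json())

# Step 3: take model A down and deploy model B under the same name.
conn.stop_models(["tensorflow-alpha"])
deploy_python_closure(conn, name="tensorflow-alpha", version=2,
                      input_type="strings",
                      func=lambda xs: ["0.42" for _ in xs])

# Step 4: the same request should now give a float, but on the k8s
# deployment it returned model A's cached output.
print(requests.post("http://localhost:1337/app/predict", json=query).json())
```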
@nopanderer
- What version of Clipper are you using?
- What Clipper APIs did you call when unregistering model A?
@withsmilo
Sorry for the late reply. I was testing both 0.3 and 0.4, and found that cache_size=1 works correctly in clipper:0.4. How did you fix this cache problem? Which part should I look into?
@nopanderer Good news! There are many differences between the two versions, so I'm not sure which part is the cause. See https://github.com/ucbrise/clipper/releases/tag/v0.4.1.
@withsmilo
Thanks a lot.
@nopanderer @withsmilo
The query frontend manages its own cache inside the server, and I guess the cache entry wasn't evicted when model A was unregistered. As a result, when input.jpg was requested again, the query frontend returned the cached entry (which is the output of model A).
If that's the cause, I'm not sure how we can fix it. If your app frequently unregisters models, I recommend adding a distributed caching layer on top of the query frontend instead.
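A toy illustration of the suspected behavior (this is not Clipper's actual C++ cache): with a key that ignores the model version, the stale entry survives a model swap, while a versioned key does not.

```python
# Toy stale-cache illustration; not Clipper's actual implementation.
cache = {}

def predict(app, input_id, model, version, use_versioned_key=False):
    key = (app, version, input_id) if use_versioned_key else (app, input_id)
    if key not in cache:
        cache[key] = model(input_id)  # only computed on a cache miss
    return cache[key]

model_a = lambda x: 1     # binary model
model_b = lambda x: 0.42  # float model, deployed under the same name

print(predict("tensorflow-alpha", "input.jpg", model_a, 1))  # 1
# Model A is swapped for model B, but the old entry is never evicted:
print(predict("tensorflow-alpha", "input.jpg", model_b, 2))  # still 1 (stale)
# With the version in the key, the new model is actually invoked:
print(predict("tensorflow-alpha", "input.jpg", model_b, 2,
              use_versioned_key=True))                       # 0.42
```

Either evicting an app's entries on unregister or including the model version in the cache key would avoid the stale result; a shared external cache (as sketched earlier in the thread) makes the second option straightforward.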