
Add latest_once version policy

Open · zhuyijie opened this issue 3 years ago · 1 comment

Hi, we are trying to open-source our training/serving framework based on TensorFlow. It includes several patches to TensorFlow and TensorFlow Serving, and we want to eliminate those patches by merging the reasonable changes into the official repositories.

This PR introduces a new version policy called latest_once. A model using this policy loads the latest version only once and skips all later polling. This is similar to the latest policy with file_system_poll_wait_seconds=0, except that it is a model-level setting rather than a process-level one. A process-level setting does not work for us because we want to serve multiple models with different policies in a single instance.
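To make the difference between the two policies concrete, here is a minimal Python sketch of the selection logic only. It does not use TensorFlow Serving APIs. The names select_latest, LatestPolicy, LatestOncePolicy and the per-model policies mapping are hypothetical. The latest policy is simplified to serving a single version. The sketch also assumes that latest_once keeps polling until at least one version exists, because the PR does not say what happens in that case.

```python
from typing import Iterable, Optional


def select_latest(available: Iterable[int]) -> Optional[int]:
    """Simplified 'latest' behaviour: pick the newest version on every poll."""
    versions = list(available)
    return max(versions) if versions else None


class LatestPolicy:
    """Hypothetical wrapper: re-evaluated on every poll, so new versions replace old ones."""

    def select(self, available: Iterable[int]) -> Optional[int]:
        return select_latest(available)


class LatestOncePolicy:
    """Hypothetical sketch of 'latest_once': pin the newest version seen on the
    first successful poll and ignore every later poll."""

    def __init__(self) -> None:
        self._pinned: Optional[int] = None

    def select(self, available: Iterable[int]) -> Optional[int]:
        if self._pinned is None:
            # Assumption: if no version exists yet, stay unpinned and try again next poll.
            self._pinned = select_latest(available)
        return self._pinned


# Hypothetical per-model configuration inside one server instance.
policies = {"ranking": LatestPolicy(), "embedding": LatestOncePolicy()}
```

If the first poll sees versions [1, 2] and the second sees [1, 2, 3], the "ranking" model moves to version 3. The "embedding" model stays on version 2, where it would keep receiving deltas instead of reloading. Because the policy object is stored per model, the two behaviours can coexist in one process. A single global poll interval cannot provide that.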

Our use case is online training of giant recommendation models (>10T), which consist mainly of large sparse embedding tables. The framework mentioned above includes a dynamic embedding table that supports insertion, deletion, and updates at serving time. When a model is published to serving, it loads the latest version once and then listens for deltas carrying new updates. The benefits are reduced memory usage (only one version is resident) and a smaller gap between serving and training (because the deltas are small and fast).
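The following is a minimal in-memory sketch of the "load once, then apply deltas" idea. It shows how a dynamic embedding table could absorb serving-time insertions, updates, and deletions without loading a second model version. The class DynamicEmbeddingTable, the Delta record layout, and the "upsert"/"delete" operation names are all hypothetical. They are not the framework's actual data structures. Returning a zero vector for unknown keys is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical delta record: (op, key, vector), where op is "upsert" or "delete".
Delta = Tuple[str, int, List[float]]


@dataclass
class DynamicEmbeddingTable:
    """Hypothetical sparse embedding table mutated in place at serving time."""

    dim: int
    rows: Dict[int, List[float]] = field(default_factory=dict)

    def lookup(self, key: int) -> List[float]:
        # Assumption: keys that are not present map to a zero vector.
        return self.rows.get(key, [0.0] * self.dim)

    def apply_delta(self, deltas: List[Delta]) -> None:
        for op, key, vector in deltas:
            if op == "upsert":
                # Covers both insertion of a new key and update of an existing one.
                if len(vector) != self.dim:
                    raise ValueError(f"expected dim {self.dim}, got {len(vector)}")
                self.rows[key] = list(vector)
            elif op == "delete":
                self.rows.pop(key, None)
            else:
                raise ValueError(f"unknown op: {op}")
```

For example, consider a table built with dim=2 and rows={1: [0.1, 0.2]}. After apply_delta([("upsert", 2, [0.3, 0.4]), ("delete", 1, [])]), key 2 returns [0.3, 0.4] and key 1 falls back to [0.0, 0.0]. Only one copy of the table is ever held in memory.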

zhuyijie · Aug 13 '22 00:08

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-cla[bot] · Aug 13 '22 00:08