Pedram Pejman
Also running into this issue. It seems instantiating a Connector.ConnectionListener has, without any code changes on the user's part, changed behavior (I only started seeing this issue today).
TF has published a fair amount of documentation on this ([ex](https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md)). Rather than providing more solutions, this issue aims to serve as a means of recording the problems TFX users...
Hey @kimjuny, the use case definitely makes sense. There's nothing on the roadmap currently addressing this need, but I'll leave this open to see if other folks develop the same...
Automatically unloading models subject to some policy (LRU as you mention, or after a set period of inactivity) is something we've thought about, but there are no concrete plans to add...
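For anyone prototyping this on their side, here is a minimal, standalone sketch of what such an unload policy could look like. This is illustrative only and not part of TensorFlow Serving; the `IdleUnloadPolicy` name, the constructor parameters, and the eviction rules are all hypothetical assumptions.

```cpp
// Sketch of an inactivity + LRU unload policy (hypothetical, not TF Serving code).
#include <algorithm>
#include <chrono>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class IdleUnloadPolicy {
 public:
  IdleUnloadPolicy(std::chrono::seconds max_idle, size_t max_models)
      : max_idle_(max_idle), max_models_(max_models) {}

  // Record that a model was just served; called on every inference request.
  void Touch(const std::string& model) { last_used_[model] = Clock::now(); }

  // Models that should be unloaded now: anything idle longer than max_idle_,
  // plus the least recently used models beyond the max_models_ cap.
  std::vector<std::string> ModelsToUnload() const {
    const auto now = Clock::now();
    std::vector<std::pair<Clock::time_point, std::string>> by_age;
    std::vector<std::string> evict;
    for (const auto& [model, last] : last_used_) {
      if (now - last > max_idle_) {
        evict.push_back(model);
      } else {
        by_age.emplace_back(last, model);
      }
    }
    if (by_age.size() > max_models_) {
      std::sort(by_age.begin(), by_age.end());  // oldest first
      for (size_t i = 0; i + max_models_ < by_age.size(); ++i) {
        evict.push_back(by_age[i].second);
      }
    }
    return evict;
  }

 private:
  using Clock = std::chrono::steady_clock;
  std::chrono::seconds max_idle_;
  size_t max_models_;
  std::unordered_map<std::string, Clock::time_point> last_used_;
};

int main() {
  IdleUnloadPolicy policy(std::chrono::seconds(600), /*max_models=*/2);
  policy.Touch("resnet");
  policy.Touch("bert");
  policy.Touch("gpt2");  // over the cap: the LRU entry becomes a candidate
  for (const auto& model : policy.ModelsToUnload()) {
    std::cout << "unload candidate: " << model << "\n";
  }
}
```

In a real server this decision would have to coordinate with in-flight requests and the loader, which is most of the actual work.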
A note on the "lazily load models": We have a prototype [caching_manager](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/core/caching_manager.h) that handles the lazy loading of the models at inference time. What this feature request tracks is the...
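To make the "lazily load models" idea concrete, here is a minimal sketch of load-on-first-request model management in the spirit of that prototype. It is not the caching_manager API; `ModelCache`, `LoadFn`, and the `Model` struct are hypothetical stand-ins.

```cpp
// Sketch of lazy, load-on-first-request model management (hypothetical names).
#include <functional>
#include <iostream>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Stand-in for a loaded servable (e.g. a SavedModel session).
struct Model {
  std::string name;
};

class ModelCache {
 public:
  using LoadFn = std::function<std::shared_ptr<Model>(const std::string&)>;

  explicit ModelCache(LoadFn load) : load_(std::move(load)) {}

  // Returns the model, loading it on the first request for that name.
  std::shared_ptr<Model> GetOrLoad(const std::string& name) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = models_.find(name);
    if (it != models_.end()) return it->second;
    auto model = load_(name);  // expensive: e.g. read from disk, warm up
    models_[name] = model;
    return model;
  }

 private:
  std::mutex mu_;
  LoadFn load_;
  std::unordered_map<std::string, std::shared_ptr<Model>> models_;
};

int main() {
  ModelCache cache([](const std::string& name) {
    std::cout << "loading " << name << " on first use\n";
    return std::make_shared<Model>(Model{name});
  });
  cache.GetOrLoad("resnet");  // triggers a load
  cache.GetOrLoad("resnet");  // served from the cache
}
```

The trade-off this feature request is about is that the first request for an unloaded model pays the full load latency, which is why an eviction/unload policy (see above) matters alongside lazy loading.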
@guillaumekln thanks for the comment. Could you share a pointer to this server? I'm curious to understand * if the delta between system and device memory limits indeed makes a...
Adding a note to this issue that several users at the 2019 O'Reilly AI conference also requested this feature.
What happens when you load up the model with TF? Do you get significantly better inference latency? Your TF runtime requires X time to do a forward pass on your...
Hi there, we can easily export metrics that tell you host memory consumption on a per-model basis, but I think you're specifically looking for GPU memory consumption/availability, correct? This...
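For context, the underlying signal for a GPU-memory metric would be something like a device memory query. Below is a minimal sketch using NVML (assuming the NVML headers/library are available and linking with -lnvidia-ml); this is not what TF Serving currently exports, just an illustration of the kind of data such a metric would surface.

```cpp
// Sketch: read total/used/free device memory for GPU 0 via NVML.
#include <cstdio>
#include <nvml.h>

int main() {
  if (nvmlInit() != NVML_SUCCESS) return 1;

  nvmlDevice_t device;
  if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS) {
    nvmlMemory_t memory;
    if (nvmlDeviceGetMemoryInfo(device, &memory) == NVML_SUCCESS) {
      std::printf("GPU0 memory: total=%llu MiB used=%llu MiB free=%llu MiB\n",
                  memory.total >> 20, memory.used >> 20, memory.free >> 20);
    }
  }
  nvmlShutdown();
  return 0;
}
```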
@betterchen we're looking into why we don't run into this problem internally and what the fix would look like. Thank you for reporting this.