tf_container.py modification for running GPU context
I would like to start a GPU session in the container, and tried the following:
```python
if (len(frozen_graph_exists) == 0):
    # SavedModel path: load the serving graph into a fresh session
    with tf.Graph().as_default() as graph:
        self.sess = tf.Session(graph=graph)
        loader.load(self.sess, [tf.saved_model.tag_constants.SERVING],
                    os.path.join(path, "tfmodel"))
else:
    # Checkpoint path: restore from the meta-graph and checkpoint files
    self.sess = tf.Session(
        '', tf.Graph(),
        config=tf.ConfigProto(
            allow_soft_placement=True,
            log_device_placement=True))
    metagraph_path = glob.glob(os.path.join(path, "tfmodel/*.meta"))[0]
    checkpoint_path = metagraph_path.split(".meta")[0]
    with tf.device("/gpu:0"):
        with self.sess.graph.as_default():
            saver = tf.train.import_meta_graph(
                metagraph_path, clear_devices=True)
            saver.restore(self.sess, checkpoint_path)
```
So around line 3, we might want to load the model onto the GPU using `with tf.device("/gpu:0"):` as well.
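For reference, here is a minimal sketch of what that could look like for the SavedModel branch (assuming the TF 1.x API; `load_savedmodel_on_gpu` and its arguments are hypothetical names for illustration, and `allow_soft_placement` is kept so ops without a GPU kernel can fall back to CPU):

```python
import os
import tensorflow as tf

def load_savedmodel_on_gpu(path):
    # Sketch only: request GPU 0 for ops created during loading, but let
    # ops without a GPU kernel fall back to CPU via allow_soft_placement.
    graph = tf.Graph()
    config = tf.ConfigProto(allow_soft_placement=True,
                            log_device_placement=True)
    sess = tf.Session(graph=graph, config=config)
    with graph.as_default(), tf.device("/gpu:0"):
        tf.saved_model.loader.load(
            sess, [tf.saved_model.tag_constants.SERVING],
            os.path.join(path, "tfmodel"))
    return sess
```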
I'm not sure whether this should be a separate issue for greater focus, but how might we control each model replica's `per_process_gpu_memory_fraction`?
For instance, if I want to stand up 3 replicas of my model on a specified GPU, how might I allocate memory dynamically so that each replica gets roughly 33% of the GPU's memory? And would this be hard to do without breaking the functionality of `set_num_replicas`?
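On the memory side, a rough sketch of what I have in mind is below (this is not tied to `set_num_replicas`; the helper name, `num_replicas` argument, and headroom value are assumptions for illustration). Each replica's session would be created with a `GPUOptions` fraction of roughly `1 / num_replicas`:

```python
import tensorflow as tf

def session_config_for_replicas(num_replicas, gpu_id=0, headroom=0.01):
    # Hypothetical helper: give each of num_replicas processes an equal
    # slice of the GPU, minus a small headroom to avoid over-allocation.
    fraction = max((1.0 / num_replicas) - headroom, 0.0)
    gpu_options = tf.GPUOptions(
        per_process_gpu_memory_fraction=fraction,
        visible_device_list=str(gpu_id))
    return tf.ConfigProto(gpu_options=gpu_options,
                          allow_soft_placement=True)

# e.g. 3 replicas on GPU 0 -> each session is limited to ~32-33% of its memory
config = session_config_for_replicas(num_replicas=3, gpu_id=0)
sess = tf.Session(config=config)
```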