tf_container.py modification for running GPU context
I would like to start a GPU session in the container, and tried the following:
```python
if (len(frozen_graph_exists) == 0):
    # SavedModel path: load the serving graph into a fresh session
    with tf.Graph().as_default() as graph:
        self.sess = tf.Session(graph=graph)
        loader.load(self.sess, [tf.saved_model.tag_constants.SERVING],
                    os.path.join(path, "tfmodel"))
else:
    # Checkpoint path: restore from the meta-graph and checkpoint files
    self.sess = tf.Session(
        '', tf.Graph(),
        config=tf.ConfigProto(
            allow_soft_placement=True,
            log_device_placement=True))
    metagraph_path = glob.glob(os.path.join(path, "tfmodel/*.meta"))[0]
    checkpoint_path = metagraph_path.split(".meta")[0]
    with tf.device("/gpu:0"):
        with self.sess.graph.as_default():
            saver = tf.train.import_meta_graph(
                metagraph_path, clear_devices=True)
            saver.restore(self.sess, checkpoint_path)
```
So around line 3, we might want to load the model onto the GPU using `with tf.device("/gpu:0"):` as well.
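For reference, here is a minimal sketch of what that could look like for the SavedModel branch (assuming the TF 1.x API; `load_savedmodel_on_gpu` and its arguments are hypothetical names for illustration, and `allow_soft_placement` is kept so ops without a GPU kernel can fall back to CPU):

```python
import os
import tensorflow as tf

def load_savedmodel_on_gpu(path):
    # Sketch only: request GPU 0 for ops created during loading, but let
    # ops without a GPU kernel fall back to CPU via allow_soft_placement.
    graph = tf.Graph()
    config = tf.ConfigProto(allow_soft_placement=True,
                            log_device_placement=True)
    sess = tf.Session(graph=graph, config=config)
    with graph.as_default(), tf.device("/gpu:0"):
        tf.saved_model.loader.load(
            sess, [tf.saved_model.tag_constants.SERVING],
            os.path.join(path, "tfmodel"))
    return sess
```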
I'm not sure whether this should be a separate issue for greater focus, but how might we control each model replica's `per_process_gpu_memory_fraction`?
For instance, if I want to stand up 3 replicas of my model on a specified GPU, how might I allocate memory dynamically so that each replica gets roughly 33% of the GPU's memory? And would this be hard to do without breaking the functionality of `set_num_replicas`?
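On the memory side, a rough sketch of what I have in mind is below (this is not tied to `set_num_replicas`; the helper name, `num_replicas` argument, and headroom value are assumptions for illustration). Each replica's session would be created with a `GPUOptions` fraction of roughly `1 / num_replicas`:

```python
import tensorflow as tf

def session_config_for_replicas(num_replicas, gpu_id=0, headroom=0.01):
    # Hypothetical helper: give each of num_replicas processes an equal
    # slice of the GPU, minus a small headroom to avoid over-allocation.
    fraction = max((1.0 / num_replicas) - headroom, 0.0)
    gpu_options = tf.GPUOptions(
        per_process_gpu_memory_fraction=fraction,
        visible_device_list=str(gpu_id))
    return tf.ConfigProto(gpu_options=gpu_options,
                          allow_soft_placement=True)

# e.g. 3 replicas on GPU 0 -> each session is limited to ~32-33% of its memory
config = session_config_for_replicas(num_replicas=3, gpu_id=0)
sess = tf.Session(config=config)
```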