Study-Notes TensorFlow使用技巧

TensorFlow使用技巧

Open ccc013 opened this issue 5 years ago • 0 comments

参考自：

有哪些相见恨晚的 TensorFlow 小技巧？

1. 使用 timeline 来优化优化性能

timeline可以分析整个模型在forward和backward的时候,每个操作消耗的时间，由此可以针对性的优化耗时的操作。我之前尝试使用 tensorflow 多卡来加速训练的时候，最后发现多卡速度还不如单卡快，改用tf.data来加速读图片还是很慢，最后使用imeline分析出了速度慢的原因，timeline的使用如下

run_metadata = tf.RunMetadata()
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
config = tf.ConfigProto(graph_options=tf.GraphOptions(
        optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))
with tf.Session(config=config) as sess:
    c_np = sess.run(c,options=run_options,run_metadata=run_metadata)
    tl = timeline.Timeline(run_metadata.step_stats)
    ctf = tl.generate_chrome_trace_format()
with open('timeline.json','w') as wd:
    wd.write(ctf)

然后到谷歌浏览器中打卡chrome://tracing 并导入 timeline.json ，最后可以看得如下图所示的每个操作消耗的时间.

这里横坐标为时间，从左到右依次为模型一次完整的forward and backward过程中，每个操作分别在cpu,gpu 0, gpu 1上消耗的时间，这些操作可以放大，非常方便观察具体每个操作在哪一个设备上消耗多少时间。这里我们 cpu 上主要有 QueueDequeue 操作，这是进行图片预期过程，这个时候 gpu 在并行计算的所以 gpu 没有空等；另外我的模型还有一个 PyFunc 在 cpu 上运行，如红框所示，此时 gpu 在等这个结果，没有任何操作运行，这个操作应该要优化的。另外就是如黑框所示，gpu上执行的时候有很大空隙，如黑框所示，这个导致gpu上的性能没有很好的利用起来，最后分析发现是我bn在多卡环境下没有使用正确，bn有一个参数updates_collections我设置为None 这时bn的参数 mean,var 是立即更新的，也是计算完当前 layer 的 mean,var 就更新，然后进行下一个 layer 的操作，这在单卡下没有问题的，但是多卡情况下就会写等读的冲突，因为可能存在gpu0更新（写）mean 但此时gpu1还没有计算到该层，所以gpu0就要等gpu1读完 mean 才能写，这样导致了如黑框所示的空隙，这时只需将参数设置成updates_collections=tf.GraphKeys.UPDATE_OPS 即可，表示所以的 bn 参数更新由用户来显示指定更新，如

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
  with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)

这样可以在每个卡forward完后，再更新bn参数，此时的写写不存在冲突。优化后，我的2卡训练获得了接近2倍的加速比。

2. 检查 NaN 方法

使用check= tf.add_check_numerics_ops,sess.run([check, ...])来检查NaN问题,该操作会报告所有出现NaN的操作，从而方便找到NaN的源头。

3. 设置环境变量`CUDA_VISIBLE_DEVICES`选择GPU

# 指定哪张卡：0,1,...
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

4. freeze 某层参数

直接通过变量名筛选实现固定某层参数：

def get_train_op(loss, lr=0.01):    
    optimizer = tf.train.AdamOptimizer(lr)    
    tvars = tf.trainable_variables()
    # 通过变量名筛选
    tvars = [v for v in tvars if 'frozen' not in v.name]    
    grads = tf.gradients(loss, tvars)    
    train_op = optimizer.apply_gradients(zip(grads, tvars), global_step=tf.train.get_global_step())    return train_op

5. 关于`map_fn`

map_fn调用了tf.while_loop，使得并行计算非常困难，这可以通过timeline 的 profile 发现。

为了避免低效的map_fn，在一些问题中，如果希望对某个维度进行map的操作，可以先对原tensor进行reshape，将需要map的维度融入到batch的维度，操作完成后再reshape回来。

6. 联合使用`pdb`与`.get_shape()`快速调试模型

对于调试如下代码：

embeddings = tf.Variable(tf.random_uniform([50000, 64], -1.0, 1.0))
import pdb; 
pdb.set_trace()
embed = tf.nn.embedding_lookup(embeddings, inputs)

可以考虑创建embedding后加入断点，这样方便在pdb中自由检测各个变量的维度。

pdb>> intputs.get_shape()
pdb>> embed.get_shape()

这样做的好处就是，可以对graph中各个变量的形状形成一个非常具体的理解，并且可以在pdb中对各个tensor进行变换，比如转置等操作。

7. 读取`.event`文件

当你有需要从.events文件里读训练过程的各种loss指标，以及evaluation中各种recall,precision,auc时可以用以下代码读取（改自tensorflow实现earlyStop功能的部分源码）

import os
import collections
import tensorflow as tf

_EVENT_FILE_GLOB_PATTERN = 'events.out.tfevents.*'

def _summaries(eval_dir):
  """Yields `tensorflow.Event` protos from event files in the eval dir.
  Args:
    eval_dir: Directory containing summary files with eval metrics.
  Yields:
    `tensorflow.Event` object read from the event files.
  """
  if tf.gfile.Exists(eval_dir):
    for event_file in tf.gfile.Glob(
        os.path.join(eval_dir, _EVENT_FILE_GLOB_PATTERN)):
      print(event_file)
      for event in tf.train.summary_iterator(event_file):
        yield event

def read_eval_metrics(eval_dir):
  """Helper to read eval metrics from eval summary files.
  Args:
    eval_dir: Directory containing summary files with eval metrics.
  Returns:
    A `dict` with global steps mapping to `dict` of metric names and values.
  """
  eval_metrics_dict = {}
  for event in _summaries(eval_dir):
    if not event.HasField('summary'):
      continue
    metrics = {}
    for value in event.summary.value:
      if value.HasField('simple_value'):
        metrics[value.tag] = value.simple_value
    if metrics:
      eval_metrics_dict[event.step] = metrics
  return collections.OrderedDict(
      sorted(eval_metrics_dict.items(), key=lambda t: t[0]))


if __name__ == '__main__':
    dir = '/tmp/eval/'
    eval_results = read_eval_metrics(dir)
    for step, metrics in eval_results.items():
        print(step, metrics)

May 26 '19 06:05 ccc013

Study-Notes Study-Notes copied to clipboard

TensorFlow使用技巧

1. 使用 timeline 来优化优化性能

2. 检查 NaN 方法

3. 设置环境变量CUDA_VISIBLE_DEVICES选择GPU

4. freeze 某层参数

5. 关于map_fn

6. 联合使用pdb与.get_shape()快速调试模型

7. 读取.event文件

Study-Notes
Study-Notes copied to clipboard

3. 设置环境变量`CUDA_VISIBLE_DEVICES`选择GPU

5. 关于`map_fn`

6. 联合使用`pdb`与`.get_shape()`快速调试模型

7. 读取`.event`文件