
Can it use multiple GPUs to train?

wangwang110 opened this issue 4 years ago · 3 comments

wangwang110 · Dec 18 '19 10:12

Yes.
My approach uses tf.contrib.distribute from tensorflow-gpu 1.13. I ran into some problems with it and spent several days before I finally managed to train PIE on multiple GPUs, so feel free to use another method if you find one more convenient. Below is part of my code in word_edit_model.py; copying it verbatim may introduce bugs, because it is not the complete set of changes I made to PIE.

import tensorflow as tf
from tensorflow.python.estimator.run_config import RunConfig
from tensorflow.python.estimator.estimator import Estimator
from tensorflow.contrib.distribute import AllReduceCrossDeviceOps
# ...

# Replicate the model on FLAGS.n_gpus GPUs; gradients are averaged
# across replicas with NCCL all-reduce.
dist_strategy = tf.contrib.distribute.MirroredStrategy(
    num_gpus=FLAGS.n_gpus,
    cross_device_ops=AllReduceCrossDeviceOps('nccl', num_packs=FLAGS.n_gpus),
    # cross_device_ops=AllReduceCrossDeviceOps('hierarchical_copy'),  # alternative if NCCL is unavailable
)

# Thread counts of 0 let TensorFlow choose; soft placement falls back to CPU
# for ops without a GPU kernel; allow_growth avoids grabbing all GPU memory upfront.
session_config = tf.ConfigProto(
    inter_op_parallelism_threads=0,
    intra_op_parallelism_threads=0,
    allow_soft_placement=True,
    gpu_options=tf.GPUOptions(allow_growth=True))

# Apply the strategy to both training and evaluation through the Estimator's RunConfig.
run_config = RunConfig(
    train_distribute=dist_strategy,
    eval_distribute=dist_strategy,
    model_dir=FLAGS.output_dir,
    session_config=session_config,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps,
    keep_checkpoint_max=15,
)
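For context, a RunConfig like this only takes effect once it is handed to the Estimator. Below is a minimal sketch of that wiring under the TF 1.13 Estimator API; model_fn, train_input_fn, train_batch_size, and num_train_steps are placeholders I am assuming, not PIE's actual code:

# Hypothetical wiring; model_fn and train_input_fn stand in for PIE's real ones.
estimator = Estimator(
    model_fn=model_fn,   # placeholder for the model_fn defined in word_edit_model.py
    config=run_config,   # the RunConfig built above, carrying the MirroredStrategy
    params={'batch_size': FLAGS.train_batch_size})  # placeholder flag

estimator.train(input_fn=train_input_fn,          # placeholder input_fn
                max_steps=FLAGS.num_train_steps)  # placeholder flag

One thing to watch: with MirroredStrategy the batch produced by the input pipeline is typically treated as the global batch and split across replicas, so each GPU would see roughly train_batch_size / n_gpus examples; this is worth verifying for your exact TF version.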

Serenade-J · Dec 18 '19 11:12

All the documents I referred to (for training PIE with multiple GPUs) can be found online.

Serenade-J · Dec 18 '19 11:12

> All the documents I referred to (for training PIE with multiple GPUs) can be found online.

Hi, could you share the complete set of changes you made in word_edit_model.py? Thanks.

binhetech · Feb 07 '20 03:02