Does TensorFlow 1.x support async training?
Dear all, does TensorFlow 1.x support async training? I tried BytePS async training with the TensorFlow MNIST example. After one batch update with the server, the weights become zeros in the worker.
@ymjiang Hi haibin and yimin, I have run into two problems with async training. The first is that the delta_w sent to the servers is all zeros, every step. It seems old_tensors changes along with vars in tensorflow/__init__.py; a minimal repro of what I suspect is below, followed by the code in question.
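Here is a tiny plain TF 1.x sketch, with no BytePS involved, of why I think the delta comes out identically zero. My understanding (which may be wrong) is that for a ref variable, tf.convert_to_tensor(var) returns the variable's cached 'read' snapshot tensor, and converting var again inside tf.subtract yields that same tensor, so the graph ends up computing x - x:

    import tensorflow as tf  # TF 1.x, graph mode, ref variables

    v = tf.Variable(1.0)
    old = tf.convert_to_tensor(v)                  # the cached 'Variable/read:0' tensor
    print(old.name, tf.convert_to_tensor(v).name)  # same name: same tensor both times

    update = tf.assign(v, 5.0)
    with tf.control_dependencies([update]):
        # 'v' is converted to the same 'read' tensor as 'old',
        # so this subtraction is literally x - x
        delta = tf.subtract(v, old)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(delta))                     # prints 0.0, not 4.0

The code in question, from apply_gradients: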
    def apply_gradients(self, *args, **kwargs):
        """Calls this same method on the underlying optimizer."""
        if self._enable_async:  # async training
            grads_and_vars = args[0]
            _, vars = zip(*grads_and_vars)
            old_tensors = []
            for var in vars:
                old_tensors.append(tf.convert_to_tensor(var))
            apply_ops = self._optimizer.apply_gradients(*args, **kwargs)
            with tf.control_dependencies([apply_ops]):
                # get the delta
                for i, var in enumerate(vars):
                    old_tensors[i] = tf.subtract(var, old_tensors[i])
                # reuse _push_pull_grads(), but it is transferring parameters here
                updated_tensors = self._push_pull_grads(old_tensors)
                # copy the updated variables back
                assign_op_list = []
                for i, tensor in enumerate(updated_tensors):
                    assign_op_list.append(tf.assign(vars[i], tensor))
                return control_flow_ops.group(*assign_op_list)
        else:
            return self._optimizer.apply_gradients(*args, **kwargs)
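If that reading is correct, one possible fix is to copy each variable into a non-trainable shadow variable before the optimizer update and order all reads with control dependencies. This is only a sketch I have not tested inside BytePS; push_pull_fn stands in for _push_pull_grads, and the shadow variables are my own addition (a real implementation would create them once, not on every call):

    import tensorflow as tf

    def apply_gradients_async_sketch(optimizer, grads_and_vars, push_pull_fn):
        _, vars = zip(*grads_and_vars)
        # non-trainable shadow variables hold the pre-update values
        shadows = [tf.Variable(tf.zeros_like(v), trainable=False) for v in vars]
        save_ops = [tf.assign(s, v) for s, v in zip(shadows, vars)]
        # snapshot first, then let the underlying optimizer update the variables
        with tf.control_dependencies(save_ops):
            apply_ops = optimizer.apply_gradients(grads_and_vars)
        with tf.control_dependencies([apply_ops]):
            # read_value() creates fresh read ops inside this context, so both
            # reads are ordered after the update (and after the snapshots)
            deltas = [tf.subtract(v.read_value(), s.read_value())
                      for v, s in zip(vars, shadows)]
            # send the true deltas to the servers, pull the updated parameters
            updated = push_pull_fn(deltas)
            assign_ops = [tf.assign(v, t) for v, t in zip(vars, updated)]
            return tf.group(*assign_ops)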
The second problem is that the tensors' declared full names differ between the broadcast section and the training section. It seems the weight and delta_weight won't be summed on the server because they are declared under different keys. Please check def _push_pull(tensor, scope='', name=None) and def broadcast(tensor, root_rank, scope='', name=None, is_variable=True) in ops.py; a sketch of the mismatch I mean follows.
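To illustrate the key mismatch (a generic TF 1.x sketch; the prefixes and the name-derivation rule here are invented for illustration, not BytePS's actual ones): a key derived from the variable's own name at broadcast time can never match a key derived from the name of the delta tensor created under the training scope.

    import tensorflow as tf

    v = tf.Variable(1.0, name='dense/kernel')

    # broadcast phase: key derived from the variable's own name
    bcast_key = 'Broadcast_' + v.name.replace(':', '_')     # 'Broadcast_dense/kernel_0'

    # training phase: the delta is a derived tensor under a training scope,
    # so a key built from *its* name looks completely different
    with tf.name_scope('train'):
        delta = tf.subtract(v, tf.zeros_like(v))
    push_key = 'PushPull_' + delta.name.replace(':', '_')   # 'PushPull_train/Sub_0'

    print(bcast_key, push_key)  # the two keys never match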
If I misunderstood something, please shed some light on it. Thanks!