bi-att-flow
Variable model_1/loss/ExponentialMovingAverage/ does not exist
Hi, I'm running the dev branch code on TensorFlow 1.2.
And I got this error: Variable model_1/loss/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
From the stack trace, it comes from basic/, in _build_ema, at ema_op = ema.apply(tensors).
I tried adding "with tf.variable_scope(tf.get_variable_scope(), reuse=False):" before ema.apply, but that still doesn't work.
Any ideas how I can fix this?
Thanks! I'm using CUDA 8.0 and cuDNN 5.1, TensorFlow v1.2, Python 3.5.
I found this problem only occurs in multi-GPU training. It's fine to use the same code without --num_gpus>1.
@demiguo I have the same issue. Have you solved the problem?
@demiguo I have the same issue. Because of this I can't train on a multiple GPU setup. Has anyone solved the problem?
I have the same problem
Anyone found anything?
The same issue occurs on a version patched to run on TF r.17. Note that the following version works on 1 GPU but not on more than 1 GPU:
Traceback (most recent call last):
File "/usr/lib/python3.5/", line 184, in _run_module_as_main
"main", mod_spec)
File "/usr/lib/python3.5/", line 85, in _run_code
exec(code, run_globals)
File "/home/levinth/bi-att-flow-zt1/basic/", line 128, in
Hi David,
My name is Tian. I'm moving our discussion from email to this issue ticket so that other developers who run into this error can see it.
The issue with the original BiDAF implementation is that it creates only one loss variable for the model. On a single GPU this is fine, because there is only one model. BiDAF implements multi-GPU training by replicating the model and placing one replica on each GPU device, which means that each replica needs its own loss variable. If the developer specifies only one loss variable, TensorFlow tries to reuse that variable for every replica, which creates a conflict.
For example, in your error, with only one device the loss variable is named model_0/loss/ExponentialMovingAverage. With two devices, TensorFlow also references a second loss variable, model_1/loss/ExponentialMovingAverage. Since that variable is never created before the whole model is generated, TensorFlow tries to reuse the variable previously created for model_0. Does that make sense?
The way to resolve this conflict is to create a loss variable for every replicated model: blob/master/tensorflow/SQuAD/basic/
Unfortunately I don't have a multi-GPU node available. Would you mind trying this patch on your node to see if it works?
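To make the naming conflict above concrete, here is a minimal sketch in plain Python. It is a toy model, not TensorFlow code: ToyVariableStore, build_replica and the "shared/weights" name are hypothetical. It assumes that the training script switches variable lookup into reuse mode after the first replica is built, and that the EMA shadow variable takes its name from the per-replica loss tensor, which is consistent with the error message in this thread.

```python
class ToyVariableStore:
    """Toy stand-in for a variable store; not TensorFlow's implementation."""

    def __init__(self):
        self.variables = {}
        self.reuse = False  # assumed to flip to True after the first replica

    def get_variable(self, name):
        if self.reuse:
            # Reuse mode: only lookups are allowed, nothing new is created.
            if name not in self.variables:
                raise ValueError("Variable %s does not exist" % name)
            return self.variables[name]
        if name in self.variables:
            raise ValueError("Variable %s already exists" % name)
        self.variables[name] = object()
        return self.variables[name]


def build_replica(store, gpu_idx):
    # Model weights carry no replica prefix, so every replica shares them.
    store.get_variable("shared/weights")
    # The EMA shadow variable is named after the loss tensor, which carries
    # the per-replica prefix (model_0, model_1, ...).
    store.get_variable("model_%d/loss/ExponentialMovingAverage" % gpu_idx)


store = ToyVariableStore()
build_replica(store, 0)   # creates both variables
store.reuse = True        # later replicas may only reuse existing variables
try:
    build_replica(store, 1)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

The shared weights are found for the second replica, but its shadow variable has a name that was never created, so the lookup fails. Either the shadow variable must be allowed to be created per replica, or the scope must permit creation at that point.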
I will modify the Andreas Klintberg fork, since that is the code base that works on top-of-tree TF. Nothing else does, due to the change in the handling of flags.
But I have to wait until the 4-GPU machine I set this up on gets freed up. I don't want to build everything again, LOL.
new distro is
I took the patch from the dawn distribution and added it to my modified (for printout and speed logging) version of the klintberg distro. It appears not to actually pay attention to the num_gpus flag and started 4 processes on the 4 V100s.
Used CUDA_VISIBLE_DEVICES=0 and num_gpus=1, default batch size:
global_step: 100, avg_loss = 8.443472, time = 336.919500
global_step: 200, avg_loss = 7.691555, time = 333.118028
global_step: 300, avg_loss = 7.293336, time = 332.794547
global_step: 400, avg_loss = 6.585279, time = 330.941432
nvidia-smi showed 1 process.
Unsetting CUDA_VISIBLE_DEVICES and rerunning, one sees 4 processes in nvidia-smi, but only 1 GPU being active :-)
global_step: 100, avg_loss = 8.414022, time = 353.267652
global_step: 200, avg_loss = 7.680358, time = 343.982275
global_step: 300, avg_loss = 7.316520, time = 346.749630
global_step: 400, avg_loss = 6.531937, time = 344.386899
Set CUDA_VISIBLE_DEVICES=0,1,2,3 and num_gpus=4:
global_step: 100, avg_loss = 8.122040, time = 669.353025
global_step: 200, avg_loss = 6.920084, time = 651.585218
global_step: 300, avg_loss = 5.956660, time = 648.792006
global_step: 400, avg_loss = 5.126814, time = 650.647822
global_step: 500, avg_loss = 4.219713, time = 648.755034
So I am a bit unsure whether things are actually going faster when fanned out. Lowering the batch size to 15 while running on 4 GPUs does not change the output much:
global_step: 100, avg_loss = 8.417717, time = 518.246054
global_step: 200, avg_loss = 7.639581, time = 492.312914
global_step: 300, avg_loss = 7.274171, time = 493.591461
global_step: 400, avg_loss = 6.699906, time = 506.484376
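One way to read these timings is to convert them into throughput. The sketch below rests on two assumptions that the logs do not confirm: that "time" is the number of seconds spent on each 100-step interval, and that with num_gpus=4 every step feeds one full batch to each GPU, so a step processes four times as many examples. The helper names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def steps_per_second(seconds_per_100_steps):
    # Assumes each logged "time" covers one 100-step interval.
    return 100.0 / mean(seconds_per_100_steps)


# Timings copied from the CUDA_VISIBLE_DEVICES=0, num_gpus=1 run.
one_gpu = [336.919500, 333.118028, 332.794547, 330.941432]
# Timings copied from the CUDA_VISIBLE_DEVICES=0,1,2,3, num_gpus=4 run.
four_gpus = [669.353025, 651.585218, 648.792006, 650.647822, 648.755034]

# Assumption: a 4-GPU step consumes 4 batches, a 1-GPU step consumes 1.
speedup = 4 * steps_per_second(four_gpus) / steps_per_second(one_gpu)
print("estimated speedup in examples/second: %.2f" % speedup)
```

Under those assumptions the 4-GPU run processes roughly twice as many examples per second as the 1-GPU run, even though each step takes about twice as long. If a step instead splits a single batch across the GPUs, the same numbers would mean the 4-GPU run is slower.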
Hi @demiguo. On tensorflow 1.12.0, I had the same problem and fixed it by adding the line:
with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
before ema.apply
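A possible explanation for why reuse=tf.AUTO_REUSE helps where reuse=False did not is sketched below in plain Python. This is a toy model, not TensorFlow code: the rule that a falsy reuse request (None or False) simply inherits the enclosing scope's flag is an assumption that matches the behaviour reported earlier in this thread, and effective_reuse, get_variable and try_create are hypothetical helpers.

```python
AUTO_REUSE = "AUTO_REUSE"  # toy stand-in for tf.AUTO_REUSE


def effective_reuse(requested, inherited):
    # Assumed rule: a falsy request (None or False) inherits the outer flag.
    return requested or inherited


def get_variable(variables, name, reuse):
    if reuse == AUTO_REUSE:
        # Create on first use, reuse afterwards.
        return variables.setdefault(name, object())
    if reuse:
        if name not in variables:
            raise ValueError("Variable %s does not exist" % name)
        return variables[name]
    if name in variables:
        raise ValueError("Variable %s already exists" % name)
    variables[name] = object()
    return variables[name]


def try_create(variables, name, requested, inherited=True):
    # inherited=True models a scope already switched to reuse mode.
    try:
        get_variable(variables, name, effective_reuse(requested, inherited))
        return "created"
    except ValueError:
        return "failed"


shadow_name = "model_1/loss/ExponentialMovingAverage"
outcome_false = try_create({}, shadow_name, requested=False)
outcome_auto = try_create({}, shadow_name, requested=AUTO_REUSE)
print(outcome_false, outcome_auto)
```

In this toy model the reuse=False request is overridden by the inherited reuse mode and the lookup fails, while the AUTO_REUSE request creates the missing shadow variable, so the script prints "failed created".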
First I would like to give some context to this issue. It applies only when a different WorkerX/GPUX/etc. tf.name_scope() is created around each instantiation of a model that uses tf.train.ExponentialMovingAverage (commonly used by Batch Normalization). If a "WorkerX" tf.variable_scope() had been used instead, there would be no possibility of reuse: tf.get_variable() ignores tf.name_scope() but honours tf.variable_scope(), so calls to tf.get_variable() inside different variable scopes can only yield different variables.
On the other hand, if there were neither a tf.name_scope() nor a tf.variable_scope() around the GPU workers, variables created with either tf.Variable or tf.get_variable() would have exactly the same scope, so both could be properly reused. The resulting graph would probably not be very pretty, though, because operations would not be grouped by worker (an aesthetic/design/maintenance issue).
But as I understand it, using with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE) before ema.apply, as @shimafoolad suggests, will drop reuse of loss/ExponentialMovingAverage/ across GPUs, along with any shadow variable created with tf.Variable by ema.apply. That would be bad for the learning of Batch Normalization layers in distributed training, so it does not seem to be a good solution.
Maybe there is a mechanism by which the main variables of BN layers would be reused anyway, but I have found no explanation of it; such an explanation might settle this issue.
I thought about it again and realized that the trainable parameters of Batch Normalization can be defined with tf.get_variable(), together with "with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE)" as @shimafoolad says, as follows:
    import tensorflow as tf  # TensorFlow 1.x API

    def batch_norm_template(inputs, is_training, scope,
                            moments_dims, bn_decay, reuse):
        # is_training must be a boolean tensor, because it is passed to tf.cond.
        with tf.variable_scope(scope, reuse=reuse) as sc:
            num_channels = inputs.get_shape()[-1].value
            # Trainable parameters come from tf.get_variable() so that they
            # can be shared across GPU replicas.
            beta = tf.get_variable(
                'beta', initializer=tf.constant(0.0, tf.float32, [num_channels]),
                trainable=True)
            gamma = tf.get_variable(
                'gamma', initializer=tf.constant(1.0, tf.float32, [num_channels]),
                trainable=True)
            batch_mean, batch_var = tf.nn.moments(inputs,
                                                  moments_dims, name='moments')
            decay = bn_decay if bn_decay is not None else 0.9
            ema = tf.train.ExponentialMovingAverage(decay=decay)
            # Operator that maintains moving averages of variables.
            # AUTO_REUSE lets ema.apply create its shadow variables in every replica.
            with tf.variable_scope(tf.get_variable_scope(),
                                   reuse=tf.AUTO_REUSE):
                ema_apply_op = tf.cond(is_training,
                                       lambda: ema.apply([batch_mean, batch_var]),
                                       lambda: tf.no_op())

            # Update moving average, return current batch's avg and var.
            def mean_var_with_update():
                with tf.control_dependencies([ema_apply_op]):
                    return tf.identity(batch_mean), tf.identity(batch_var)

            # ema.average returns the Variable holding the average of var.
            mean, var = tf.cond(is_training,
                                mean_var_with_update,
                                lambda: (ema.average(batch_mean), ema.average(batch_var)))
            normed = tf.nn.batch_normalization(inputs, mean, var,
                                               beta, gamma, 1e-3)
        return normed
For safety, reuse should be False on the first call and True on all subsequent calls, as done in
Hope it helps, especially those who are updating legacy TensorFlow code.
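For readers unfamiliar with what the ema.apply op maintains, the update it performs on each shadow variable is shadow = decay * shadow + (1 - decay) * value. Below is a minimal plain-Python sketch of that arithmetic, using the default decay of 0.9 from the function above. The ema_update helper is hypothetical, and starting the shadow value at zero is an assumption of this sketch.

```python
def ema_update(shadow, value, decay=0.9):
    # One exponential-moving-average step: keep `decay` of the old shadow
    # value and blend in (1 - decay) of the new observation.
    return decay * shadow + (1.0 - decay) * value


shadow_mean = 0.0  # assumed starting value for this sketch
history = []
for batch_mean in [1.0, 1.0, 1.0]:
    shadow_mean = ema_update(shadow_mean, batch_mean)
    history.append(shadow_mean)
print(history)
```

With a constant batch mean of 1.0 the shadow value climbs towards 1.0 step by step (about 0.1, then 0.19, then 0.271). This is why the moving statistics need many training steps before they are useful at inference time, and why it matters which replica's shadow variables are read back.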