Training code should be modified if multiple GPUs are used

Open alexwongdl opened this issue 6 years ago • 2 comments

When I use four GPUs to train the CosFace model, the following exception occurs:

ValueError: Variable conv1_/conv2d/kernel already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

  File "networks/sphere_network.py", line 49, in first_conv
    network = tf.layers.conv2d(input, num_output, kernel_size = [3, 3], strides = (2, 2), padding = 'same', kernel_initializer = xavier, bias_initializer = zero_init, kernel_regularizer = l2_regularizer, bias_regularizer = l2_regularizer)
  File "networks/sphere_network.py", line 14, in infer
    network = first_conv(input, 64, name = 'conv1')
  File "train/train_multi_gpu.py", line 197, in main
    prelogits = network.infer(batch_image_split[i], args.embedding_size)

prelogits = network.infer(batch_image_split[i],args.embedding_size) constructs the graph once for every GPU, and there is no reuse setting, so the second GPU tries to create variables that already exist. It should be modified to:

  with tf.variable_scope(name_or_scope='', reuse=tf.AUTO_REUSE):
    prelogits = network.infer(batch_image_split[i],args.embedding_size)
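
To illustrate why the error appears only with more than one GPU, here is a minimal pure-Python sketch of the variable-sharing behaviour described above. It is a toy model, not TensorFlow code: VariableStore, infer, build_towers and fails_without_reuse are hypothetical names invented for this illustration, and AUTO_REUSE is only a stand-in for tf.AUTO_REUSE.

```python
# Toy model of a TF1-style variable store. This is NOT TensorFlow code;
# every name here is a hypothetical stand-in used only for illustration.
AUTO_REUSE = "auto_reuse"  # stand-in for tf.AUTO_REUSE


class VariableStore:
    """Maps variable names to values, one entry per created variable."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, initializer, reuse=False):
        if name in self._vars:
            if reuse is False:
                # Same situation as the ValueError in the traceback above.
                raise ValueError(
                    "Variable %s already exists, disallowed." % name)
            return self._vars[name]  # share the existing weights
        if reuse is True:
            raise ValueError("Variable %s does not exist." % name)
        # reuse is False or AUTO_REUSE and the name is new: create it.
        self._vars[name] = initializer()
        return self._vars[name]


def infer(store, reuse):
    """Stand-in for network.infer: requests a single conv kernel."""
    return store.get_variable(
        "conv1_/conv2d/kernel", lambda: [0.0] * 9, reuse=reuse)


def build_towers(num_gpus, reuse):
    """Build one tower per GPU against a single shared store."""
    store = VariableStore()
    return [infer(store, reuse) for _ in range(num_gpus)]


def fails_without_reuse(num_gpus):
    """Return True if building the towers without reuse raises ValueError."""
    try:
        build_towers(num_gpus, reuse=False)
    except ValueError:
        return True
    return False
```

In this sketch a single tower builds fine without reuse, a second tower fails on the duplicate name, and with AUTO_REUSE every tower receives the same kernel object. That sharing is what the proposed tf.variable_scope wrapper is meant to achieve in the real training script.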

alexwongdl avatar Aug 06 '18 07:08 alexwongdl

If you want to train the model on multiple GPUs, you can switch NETWORK=sphere_network to NETWORK=resface in train.sh. resface is the implementation intended for multiple GPUs. I just found that the accuracy of sphere_network is better than that of resface.

yule-li avatar Aug 11 '18 03:08 yule-li

@AlexWang90 First of all, thank you and the author yule-li! Is it possible to enable multi-GPU training just by modifying this part? Looking forward to your reply.

chenyyx avatar Aug 28 '18 03:08 chenyyx