caffe-jacinto

slower training speed

Open · umeannthtome opened this issue on Nov 03 '17 · 10 comments

Hi,

I have compared the training speed of versions 0.15 and 0.16, and found that 0.16 trains significantly slower with exactly the same settings. Why is that?

Thanks.

William

umeannthtome · Nov 03 '17

Hi,

Can you please share the prototxt of the model?

Thanks,

mathmanu · Nov 03 '17

Hi Manu,

Below is my train.prototxt.

I have tried setting the "in_place" parameter of the BatchNorm layers to both true and false, but it doesn't seem to make a difference.

name: "train"
layer {
  name: "data"
  type: "ImageLabelListData"
  top: "data"
  top: "label"
  transform_param {
    mirror: true
    crop_size: 72
    mean_value: 0.0
  }
  image_label_data_param {
    image_list_path: "train_image_list.txt"
    label_list_path: "train_label_list.txt"
    batch_size: 128
    shuffle: true
    scale_prob: 0.5
    scale_min: 0.75
    scale_max: 1.25
    threads: 1
  }
}
layer {
  name: "data/bias"
  type: "Bias"
  bottom: "data"
  top: "data/bias"
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  bias_param {
    filler {
      type: "constant"
      value: -128.0
    }
  }
}
layer {
  name: "conv1a"
  type: "Convolution"
  bottom: "data/bias"
  top: "conv1a"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 32
    bias_term: true
    pad: 1
    kernel_size: 3
    group: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 1
  }
}
layer {
  name: "conv1a/bn"
  type: "BatchNorm"
  bottom: "conv1a"
  top: "conv1a/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv1a/relu"
  type: "ReLU"
  bottom: "conv1a/bn"
  top: "conv1a/bn"
}
layer {
  name: "down1"
  type: "Convolution"
  bottom: "conv1a/bn"
  top: "down1"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 32
    bias_term: true
    pad: 1
    kernel_size: 3
    group: 4
    stride: 2
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 1
  }
}
layer {
  name: "down1/bn"
  type: "BatchNorm"
  bottom: "down1"
  top: "down1/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "down1/relu"
  type: "ReLU"
  bottom: "down1/bn"
  top: "down1/bn"
}
layer {
  name: "conv2a"
  type: "Convolution"
  bottom: "down1/bn"
  top: "conv2a"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 32
    bias_term: true
    pad: 2
    kernel_size: 3
    group: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 2
  }
}
layer {
  name: "conv2a/bn"
  type: "BatchNorm"
  bottom: "conv2a"
  top: "conv2a/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv2a/relu"
  type: "ReLU"
  bottom: "conv2a/bn"
  top: "conv2a/bn"
}
layer {
  name: "conv3a"
  type: "Convolution"
  bottom: "conv2a/bn"
  top: "conv3a"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 32
    bias_term: true
    pad: 2
    kernel_size: 3
    group: 4
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 2
  }
}
layer {
  name: "conv3a/bn"
  type: "BatchNorm"
  bottom: "conv3a"
  top: "conv3a/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv3a/relu"
  type: "ReLU"
  bottom: "conv3a/bn"
  top: "conv3a/bn"
}
layer {
  name: "res1_sum"
  type: "Eltwise"
  bottom: "down1/bn"
  bottom: "conv3a/bn"
  top: "res1_sum"
}
layer {
  name: "down2"
  type: "Convolution"
  bottom: "res1_sum"
  top: "down2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 64
    bias_term: true
    pad: 1
    kernel_size: 3
    group: 4
    stride: 2
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 1
  }
}
layer {
  name: "down2/bn"
  type: "BatchNorm"
  bottom: "down2"
  top: "down2/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "down2/relu"
  type: "ReLU"
  bottom: "down2/bn"
  top: "down2/bn"
}
layer {
  name: "conv4a"
  type: "Convolution"
  bottom: "down2/bn"
  top: "conv4a"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 64
    bias_term: true
    pad: 2
    kernel_size: 3
    group: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 2
  }
}
layer {
  name: "conv4a/bn"
  type: "BatchNorm"
  bottom: "conv4a"
  top: "conv4a/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv4a/relu"
  type: "ReLU"
  bottom: "conv4a/bn"
  top: "conv4a/bn"
}
layer {
  name: "conv5a"
  type: "Convolution"
  bottom: "conv4a/bn"
  top: "conv5a"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 64
    bias_term: true
    pad: 2
    kernel_size: 3
    group: 4
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 2
  }
}
layer {
  name: "conv5a/bn"
  type: "BatchNorm"
  bottom: "conv5a"
  top: "conv5a/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv5a/relu"
  type: "ReLU"
  bottom: "conv5a/bn"
  top: "conv5a/bn"
}
layer {
  name: "res2_sum"
  type: "Eltwise"
  bottom: "down2/bn"
  bottom: "conv5a/bn"
  top: "res2_sum"
}
layer {
  name: "conv6a"
  type: "Convolution"
  bottom: "res2_sum"
  top: "conv6a"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 64
    bias_term: true
    pad: 2
    kernel_size: 3
    group: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 2
  }
}
layer {
  name: "conv6a/bn"
  type: "BatchNorm"
  bottom: "conv6a"
  top: "conv6a/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv6a/relu"
  type: "ReLU"
  bottom: "conv6a/bn"
  top: "conv6a/bn"
}
layer {
  name: "conv7a"
  type: "Convolution"
  bottom: "conv6a/bn"
  top: "conv7a"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 64
    bias_term: true
    pad: 2
    kernel_size: 3
    group: 4
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 2
  }
}
layer {
  name: "conv7a/bn"
  type: "BatchNorm"
  bottom: "conv7a"
  top: "conv7a/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv7a/relu"
  type: "ReLU"
  bottom: "conv7a/bn"
  top: "conv7a/bn"
}
layer {
  name: "res3_sum"
  type: "Eltwise"
  bottom: "res2_sum"
  bottom: "conv7a/bn"
  top: "res3_sum"
}
layer {
  name: "down3"
  type: "Convolution"
  bottom: "res2_sum"
  top: "down3"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 128
    bias_term: true
    pad: 1
    kernel_size: 3
    group: 4
    stride: 2
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 1
  }
}
layer {
  name: "down3/bn"
  type: "BatchNorm"
  bottom: "down3"
  top: "down3/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "down3/relu"
  type: "ReLU"
  bottom: "down3/bn"
  top: "down3/bn"
}
layer {
  name: "conv6b"
  type: "Convolution"
  bottom: "down3/bn"
  top: "conv6b"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 128
    bias_term: true
    pad: 2
    kernel_size: 3
    group: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 2
  }
}
layer {
  name: "conv6b/bn"
  type: "BatchNorm"
  bottom: "conv6b"
  top: "conv6b/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv6b/relu"
  type: "ReLU"
  bottom: "conv6b/bn"
  top: "conv6b/bn"
}
layer {
  name: "conv7b"
  type: "Convolution"
  bottom: "conv6b/bn"
  top: "conv7b"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 128
    bias_term: true
    pad: 2
    kernel_size: 3
    group: 4
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 2
  }
}
layer {
  name: "conv7b/bn"
  type: "BatchNorm"
  bottom: "conv7b"
  top: "conv7b/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv7b/relu"
  type: "ReLU"
  bottom: "conv7b/bn"
  top: "conv7b/bn"
}
layer {
  name: "res4_sum"
  type: "Eltwise"
  bottom: "down3/bn"
  bottom: "conv7b/bn"
  top: "res4_sum"
}
layer {
  name: "conv8b"
  type: "Convolution"
  bottom: "res4_sum"
  top: "conv8b"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 64
    bias_term: true
    pad: 1
    kernel_size: 3
    group: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 1
  }
}
layer {
  name: "conv8b/bn"
  type: "BatchNorm"
  bottom: "conv8b"
  top: "conv8b/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv8b/relu"
  type: "ReLU"
  bottom: "conv8b/bn"
  top: "conv8b/bn"
}
layer {
  name: "deconv3"
  type: "Deconvolution"
  bottom: "conv8b/bn"
  top: "deconv3"
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 64
    bias_term: true
    pad: 1
    kernel_size: 4
    group: 64
    stride: 2
    weight_filler {
      type: "bilinear"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "concat"
  type: "Concat"
  bottom: "res3_sum"
  bottom: "deconv3"
  top: "concat"
}
layer {
  name: "conv_final1a"
  type: "Convolution"
  bottom: "concat"
  top: "conv_final1a"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 64
    bias_term: true
    pad: 1
    kernel_size: 3
    group: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 1
  }
}
layer {
  name: "conv_final1a/bn"
  type: "BatchNorm"
  bottom: "conv_final1a"
  top: "conv_final1a/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv_final1a/relu"
  type: "ReLU"
  bottom: "conv_final1a/bn"
  top: "conv_final1a/bn"
}
layer {
  name: "conv_final1b"
  type: "Convolution"
  bottom: "conv_final1a/bn"
  top: "conv_final1b"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 64
    bias_term: true
    pad: 1
    kernel_size: 3
    group: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 1
  }
}
layer {
  name: "conv_final1b/bn"
  type: "BatchNorm"
  bottom: "conv_final1b"
  top: "conv_final1b/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv_final1b/relu"
  type: "ReLU"
  bottom: "conv_final1b/bn"
  top: "conv_final1b/bn"
}
layer {
  name: "conv_final1c"
  type: "Convolution"
  bottom: "conv_final1b/bn"
  top: "conv_final1c"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 7
    bias_term: true
    pad: 1
    kernel_size: 3
    group: 1
    stride: 1
    weight_filler {
      type: "msra"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
    dilation: 1
  }
}
layer {
  name: "conv_final1c/bn"
  type: "BatchNorm"
  bottom: "conv_final1c"
  top: "conv_final1c/bn"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv_final1c/relu"
  type: "ReLU"
  bottom: "conv_final1c/bn"
  top: "conv_final1c/bn"
}
layer {
  name: "deconv2"
  type: "Deconvolution"
  bottom: "conv_final1c/bn"
  top: "deconv2"
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 7
    bias_term: true
    pad: 1
    kernel_size: 4
    group: 7
    stride: 2
    weight_filler {
      type: "bilinear"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "deconv1"
  type: "Deconvolution"
  bottom: "deconv2"
  top: "deconv1"
  param {
    lr_mult: 0.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 7
    bias_term: true
    pad: 1
    kernel_size: 4
    group: 7
    stride: 2
    weight_filler {
      type: "bilinear"
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "deconv1"
  bottom: "label"
  top: "loss"
  propagate_down: true
  propagate_down: false
  loss_param {
    normalization: VALID
  }
}
layer {
  name: "accuracy/top1"
  type: "Accuracy"
  bottom: "deconv1"
  bottom: "label"
  top: "accuracy/top1"
  include {
    phase: TEST
  }
}

umeannthtome · Nov 03 '17

If you look at the latest prototxt, you will notice that I am now using ImageLabelData and not ImageLabelListData, because ImageLabelData is significantly faster. The downside is that you need to prepare LMDBs for it.

https://github.com/tidsp/caffe-jacinto-models/blob/caffe-0.16/trained/image_segmentation/cityscapes5_jsegnet21v2/initial/train.prototxt

Try training with ImageLabelData and let me know if there is a speed improvement; that will help to identify where the problem is. caffe-0.16 has some changes in the way data is loaded. A rough sketch of the change follows below.
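As an illustration only, here is a hypothetical sketch of an LMDB-backed data layer. The layer type comes from the comment above and the transform_param is carried over from the earlier prototxt, but the data_param fields below follow the convention of stock Caffe's Data layer and are only assumptions; treat the linked train.prototxt as the authoritative reference for ImageLabelData's actual parameters.

layer {
  name: "data"
  type: "ImageLabelData"
  top: "data"
  top: "label"
  transform_param {
    mirror: true
    crop_size: 72
    mean_value: 0.0
  }
  # Hypothetical parameter block: the field names are assumptions modeled
  # on stock Caffe's data_param; consult the linked train.prototxt for
  # the exact ImageLabelData parameters and LMDB source layout.
  data_param {
    source: "train_image_lmdb"
    batch_size: 128
    backend: LMDB
  }
}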

mathmanu · Nov 03 '17

Hi Manu,

I have tried training with ImageLabelData and LMDB. The training speed does improve a little, but it is still slower than in caffe-0.15.

When I monitor the GPU utilization, caffe-0.15 runs at >75% utilization while caffe-0.16 fluctuates between 35% and 65%.

William

umeannthtome · Nov 03 '17

I am not sure what is happening, but let us try to analyze it.

An improved BatchNorm layer is a key difference between caffe-0.15 and caffe-0.16. Regarding the definition of the BatchNorm layers:

What happens if you remove all those param {} blocks and also make the layers in-place by using the same blob for both input and output? Note: you need to change the input of each subsequent layer as well, since the /bn suffix is no longer used in the output blob of BatchNorm. Example:

layer {
  name: "conv1a/bn"
  type: "BatchNorm"
  bottom: "conv1a"
  top: "conv1a"
  batch_norm_param {
    moving_average_fraction: 0.990000009537
    eps: 9.99999974738e-05
    scale_filler {
      type: "constant"
      value: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}
layer {
  name: "conv1a/relu"
  type: "ReLU"
  bottom: "conv1a"
  top: "conv1a"
}
layer {
  name: "down1"
  type: "Convolution"
  bottom: "conv1a"
  top: "down1"

Note that the above in-place BatchNorm configuration will not train properly in caffe-0.15, only in caffe-0.16.

mathmanu · Nov 03 '17

I have made the BatchNorm layers in-place and removed the params, but the training speed is the same. My latest prototxt is at the link below: https://drive.google.com/file/d/1kZYGXeek1k-Z4Uf1V4rq79qk9DteaenV/view?usp=sharing

Could it be the switch from cuDNN 5.0 to cuDNN 6.0? My caffe-0.15 was compiled with cuDNN 5.0, while caffe-0.16 was compiled with cuDNN 6.0, since it is not compatible with lower versions.

umeannthtome · Nov 03 '17

It's possible. Let's try some changes, one by one, and check the speed after each (a sketch of (1) and (2) follows the list below):

(1) You have a few layers with dilations. Remove the dilations.
(2) You have a layer with a large number of groups (the Deconvolution layer). Reduce the number of groups to 1.
(3) If the above doesn't give a clue, remove the Deconvolution layer if possible. I have seen that the Deconvolution layer affects the speed under certain conditions, but I have not been able to pinpoint exactly what those conditions are. Note that a CUDNN implementation is missing for the Deconvolution layer.
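For concreteness, here is how (1) and (2) would look when applied to two layers from the prototxt above (param and filler blocks omitted for brevity). Note that with group: 1 the bilinear filler no longer gives per-channel bilinear upsampling, so that variant only makes sense as a speed experiment.

layer {
  name: "conv2a"
  type: "Convolution"
  bottom: "down1/bn"
  top: "conv2a"
  convolution_param {
    num_output: 32
    pad: 1          # was 2; reduced to keep the output size after removing the dilation
    kernel_size: 3
    group: 1
    stride: 1
    dilation: 1     # was 2
  }
}
layer {
  name: "deconv3"
  type: "Deconvolution"
  bottom: "conv8b/bn"
  top: "deconv3"
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 4
    group: 1        # was 64
    stride: 2
    weight_filler { type: "bilinear" }
  }
}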

mathmanu · Nov 03 '17

Hi Manu,

I have tried (1) and (2). In both cases training is still faster in caffe-0.15 than in caffe-0.16, as before.

And your hypothesis about (3) is right: caffe-0.16 now runs faster than caffe-0.15 if I remove the deconv layers, for both the ImageLabelListData and ImageLabelData input layer types. Having said that, I have not identified under what circumstances the training speed would stay this way while still keeping the deconv layers.

Many thanks for helping to identify the source of the problem.

William

umeannthtome · Nov 06 '17

Not sure if it is related to the fact that the CUDNN implementation is missing for Deconvolution, but then that would impact both caffe-0.15 and caffe-0.16 equally. https://github.com/NVIDIA/caffe/issues/386

Btw, how were you able to remove Deconvolution completely? In an image-to-image problem we need Deconvolution layers. It is important to root cause this issue and get it fixed.

Can you file an issue in https://github.com/NVIDIA/caffe? You may have to share your prototxt model there so that they can try to root cause the issue.

mathmanu · Nov 06 '17

Thanks for the information. I shall try the cudnn_deconv_layer commit and see how it affects caffe-0.15 and caffe-0.16.

I removed the deconvolution layers together with their corresponding downsampling/pooling layers.
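For readers trying to reproduce this experiment, below is a hypothetical sketch of one way such a removal could look for the deconv3 stage; it is an illustration, not necessarily the exact edit made here. If down3 uses stride: 1, conv8b stays at the resolution of res3_sum, so deconv3 can be dropped and concat can take conv8b/bn directly (the concat output keeps its channel count, since conv8b and deconv3 both have 64 outputs).

layer {
  name: "down3"
  type: "Convolution"
  bottom: "res2_sum"
  top: "down3"
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    group: 4
    stride: 1   # was 2; keeps the resolution so deconv3 is no longer needed
  }
}
# ... down3/bn through conv8b/relu unchanged ...
layer {
  name: "concat"
  type: "Concat"
  bottom: "res3_sum"
  bottom: "conv8b/bn"   # was "deconv3"; conv8b is already at the matching resolution
  top: "concat"
}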

umeannthtome · Nov 06 '17