Default learning rate in TF2 SSD MobileNet V2 config file is way too high. Is it a typo?

EdjeElectronics opened this issue 3 years ago · 4 comments

The learning rate set in the TF2 SSD MobileNet V2 config file is 10x higher than that of the other SSD MobileNet models. This causes loss during training to get extremely high. Is it a typo?

The default ssd_mobilenet_v2_320x320_coco17_tpu-8.config configuration has this for the learning rate:

  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .8
          total_steps: 50000
          warmup_learning_rate: 0.13333
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.9
    }
  }

Meanwhile, the FPNLite version (ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.config) has this:

  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .08
          total_steps: 50000
          warmup_learning_rate: .026666
          warmup_steps: 1000
        }
      }
      momentum_optimizer_value: 0.9
    }
  }

When I train with the default values in the ssd_mobilenet_v2_320x320_coco17_tpu-8.config file, the huge learning rate throws training way off. When I change the values from .8 to .08 and .13333 to .013333, training works much better. I think whoever wrote the config file missed a decimal point.
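
For reference, here is a sketch of the edited optimizer block; only learning_rate_base and warmup_learning_rate are changed, everything else is left as shipped:

  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .08        # was .8
          total_steps: 50000
          warmup_learning_rate: .013333  # was 0.13333
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.9
    }
  }

With these values the base rate matches the FPNLite config quoted above, which is probably why training behaves so much better.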

[image: Loss graph BEFORE changing learning rate values (the loss is much higher)]

[image: Loss graph AFTER changing learning rate values]

EdjeElectronics · Feb 19 '22

This is a wonderful question. I wonder why no one has commented on this? Any suggestions?

Annieliaquat · Nov 23 '22

Did you get an answer to this query?

Annieliaquat · Jan 09 '23

Hi @Annieliaquat, yes I did! A change was proposed to fix it, but it was rejected: https://github.com/tensorflow/models/pull/10531

The high learning rate is intended for training on TPUs, which use much larger batch sizes. You can manually change it back to a lower learning rate if you're just training with a CPU or GPU.
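
As a rough illustration only: a common rule of thumb is to scale the learning rate roughly in proportion to the batch size you actually train with. The batch sizes below are placeholders (I'm assuming the stock TPU config uses batch_size 512; check your copy), so treat the numbers as a sketch, not a recommendation:

  train_config: {
    batch_size: 16                       # placeholder: whatever fits in your GPU memory
    optimizer {
      momentum_optimizer: {
        learning_rate: {
          cosine_decay_learning_rate {
            # Rule of thumb (an assumption, not from the official docs):
            # new_base ~= .8 * (your_batch_size / tpu_batch_size)
            # e.g. .8 * 16/512 = .025, assuming the stock TPU config uses batch_size 512
            learning_rate_base: .025
            total_steps: 50000
            warmup_learning_rate: .004   # scaled down roughly in proportion
            warmup_steps: 2000
          }
        }
        momentum_optimizer_value: 0.9
      }
    }
    # ... other train_config fields unchanged ...
  }

In practice the .08 value from the FPNLite config (as in my earlier comment) is also a reasonable starting point; the main thing is to get the rate well below .8 when training with small batches.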

EdjeElectronics · Jan 09 '23

Okay, thanks a lot!

Annieliaquat · Jan 09 '23