
Understanding pipeline.config

Open santhoshnumberone opened this issue 2 years ago • 3 comments

I am trying transfer learning on a pre-trained model from the TensorFlow 2 Detection Model Zoo using my own custom data.

I was looking into the pipeline.config file of one of the models, centernet_hg104_512x512_coco17_tpu-8.

This is its content:

model {
  center_net {
    num_classes: 90
    feature_extractor {
      type: "hourglass_104"
      channel_means: 104.01361846923828
      channel_means: 114.03422546386719
      channel_means: 119.91659545898438
      channel_stds: 73.60276794433594
      channel_stds: 69.89082336425781
      channel_stds: 70.91507720947266
      bgr_ordering: true
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 512
        max_dimension: 512
        pad_to_max_dimension: true
      }
    }
    object_detection_task {
      task_loss_weight: 1.0
      offset_loss_weight: 1.0
      scale_loss_weight: 0.10000000149011612
      localization_loss {
        l1_localization_loss {
        }
      }
    }
    object_center_params {
      object_center_loss_weight: 1.0
      classification_loss {
        penalty_reduced_logistic_focal_loss {
          alpha: 2.0
          beta: 4.0
        }
      }
      min_box_overlap_iou: 0.699999988079071
      max_box_predictions: 100
    }
  }
}
train_config {
  batch_size: 128
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_aspect_ratio: 0.5
      max_aspect_ratio: 1.7000000476837158
      random_coef: 0.25
    }
  }
  data_augmentation_options {
    random_adjust_hue {
    }
  }
  data_augmentation_options {
    random_adjust_contrast {
    }
  }
  data_augmentation_options {
    random_adjust_saturation {
    }
  }
  data_augmentation_options {
    random_adjust_brightness {
    }
  }
  data_augmentation_options {
    random_absolute_pad_image {
      max_height_padding: 200
      max_width_padding: 200
      pad_color: 0.0
      pad_color: 0.0
      pad_color: 0.0
    }
  }
  optimizer {
    adam_optimizer {
      learning_rate {
        manual_step_learning_rate {
          initial_learning_rate: 0.0010000000474974513
          schedule {
            step: 90000
            learning_rate: 9.999999747378752e-05
          }
          schedule {
            step: 120000
            learning_rate: 9.999999747378752e-06
          }
        }
      }
      epsilon: 1.0000000116860974e-07
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  num_steps: 140000
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
}
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED"
  }
}

Inside the model block

I want to train only one class, so I set num_classes: 1. I hope I am correct here.

Inside the train_config block

  1. I want to know the relationship between batch_size: 128 and num_steps: 140000. Is it Total_Number_of_training_images = batch_size x num_steps? In my case I have 1510 training images in total, so if I keep batch_size: 128, does this imply num_steps = Total_Number_of_training_images / batch_size = 1510 / 128 = 11.79 (approx. 12)?
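To make the relationship concrete, here is a small sketch (the numbers are the ones from this thread). Note that num_steps in train_config is the total number of training steps, independent of the dataset size; dividing the dataset size by the batch size gives steps per epoch, not num_steps:

```python
import math

batch_size = 128
num_train_images = 1510
num_steps = 140_000  # from train_config; NOT derived from the dataset size

# One step processes one batch, so steps needed to see the whole
# dataset once (one epoch) is dataset size / batch size, rounded up:
steps_per_epoch = math.ceil(num_train_images / batch_size)  # 12

# With num_steps = 140000 the model sees batch_size * num_steps
# examples in total, i.e. the dataset repeated many times:
images_seen = batch_size * num_steps  # 17,920,000

print(steps_per_epoch, images_seen)
```

So setting num_steps = 12 would train for only a single epoch; 140000 steps means many thousands of passes over a 1510-image dataset.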

  2. How do I change the learning rate scheduler?

optimizer {
    adam_optimizer {
      learning_rate {
        manual_step_learning_rate {
          initial_learning_rate: 0.0010000000474974513
          schedule {
            step: 90000
            learning_rate: 9.999999747378752e-05
          }
          schedule {
            step: 120000
            learning_rate: 9.999999747378752e-06
          }
        }
      }
      epsilon: 1.0000000116860974e-07
    }
    use_moving_average: false
}

Module: tf.keras.optimizers.schedules

Or, instead of this, should I use Transfer learning and fine-tuning?

How can I pick a TensorFlow 2 Detection Model Zoo model as my base model?

I tried looking around for an explanation and found the Tensorflow object detection config files documentation, which led me to Configuring the Object Detection Training Pipeline.

I want to experiment with different learning rate schedules (Module: tf.keras.optimizers.schedules) for optimisation and try different models from the TensorFlow 2 Detection Model Zoo. Is there a guide I can look into on how to do this?
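For using a Model Zoo model as the base, the usual workflow is to download and extract the model's tarball, then fill in the PATH_TO_BE_CONFIGURED placeholders in its pipeline.config. A minimal sketch of patching the placeholders with plain string substitution (the checkpoint and class-count values below are examples for this thread's setup, not canonical paths; the OD API also ships object_detection.utils.config_util for doing this programmatically):

```python
import re

# Example override values -- adjust these paths to your own files.
overrides = {
    r'fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"':
        'fine_tune_checkpoint: '
        '"centernet_hg104_512x512_coco17_tpu-8/checkpoint/ckpt-0"',
    r'num_classes: 90': 'num_classes: 1',
}

def patch_config(text: str, overrides: dict) -> str:
    """Apply each pattern -> replacement pair to the config text."""
    for pattern, replacement in overrides.items():
        text = re.sub(pattern, replacement, text)
    return text
```

You would read pipeline.config into a string, call patch_config, and write the result back before launching training.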

I am confused.

santhoshnumberone avatar Apr 19 '22 19:04 santhoshnumberone

Hey, if you still need help I can try to help you. batch_size is the effective batch size used for training. For TPU (or sync SGD) jobs, the batch size per core (or GPU) is batch_size / number of cores (or batch_size / number of GPUs).

I'm not sure about num_steps, but I think it sets the maximum number of training steps. During training, it will print progress in the terminal every 100 steps (at least that's how it is for me). I've never trained all the way to that number, so I'm not sure.

Here is a very good article on learning rates: https://machinelearningmastery.com/learning-rate-for-deep-learning-neural-networks/

Do you need instructions on how to use the TensorFlow 2 Detection Model Zoo, or on how to choose the best pretrained model?

Edi2410 avatar Sep 21 '22 11:09 Edi2410

Hi @Edi2410, I need help with how to choose the best trained checkpoint. I think by default TensorFlow only keeps the latest, correct?

MoAbbasid avatar Jan 28 '24 14:01 MoAbbasid

@santhoshnumberone hi, I can't answer all your questions, but one step is a pass through a single batch and one epoch is a pass through the whole dataset, so:

num_steps = Total_Number_of_training_images / batch_size = 1510 / 128 = 11.79 (approx. 12)

your result is the number of steps it takes to complete one epoch, i.e. one pass through your whole dataset.

If you're training for 140,000 steps and each epoch is 11.79 steps, you have trained for about 11,874 epochs (140,000 / 11.79).
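As a quick check of the arithmetic with the unrounded steps-per-epoch value (11.79 is a rounded figure, so the exact epoch count comes out slightly lower, around 11,868):

```python
batch_size = 128
num_train_images = 1510
num_steps = 140_000

steps_per_epoch = num_train_images / batch_size   # 11.796875 steps per epoch
epochs = num_steps / steps_per_epoch              # about 11,868 epochs

print(f"{steps_per_epoch:.2f} steps/epoch -> ~{epochs:.0f} epochs")
```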

MoAbbasid avatar Jan 28 '24 16:01 MoAbbasid