MobileNet

Has anyone tried MobileNet on the KITTI dataset?

Open • totoroTree opened this issue 6 years ago • 10 comments

When setting up the configuration for train_mobilenetdet_on_kitti.sh, where does the model file to restore come from? Would it be OK to use the officially released versions from here (https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md)? For example:

  1. Download MobileNet_v1_1.0_224.
  2. Use the path of MobileNet_v1_1.0_224 as CHECK_POINT in train_mobilenetdet_on_kitti.sh.

totoroTree • Jun 21 '18
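If it is unclear which path a downloaded folder such as MobileNet_v1_1.0_224 actually provides, a small directory scan can list its checkpoint prefixes. This is a sketch under assumptions: it assumes the TensorFlow V2 checkpoint layout, where each checkpoint is a group of files sharing a prefix (for example `model.ckpt-906808.index` next to its `.data-...` shards), and `find_checkpoint_prefixes` is a hypothetical name. Whether train_mobilenetdet_on_kitti.sh expects a folder or a prefix for CHECK_POINT is not stated in the thread.

```python
import os


def find_checkpoint_prefixes(folder):
    """Return the sorted checkpoint prefixes (as full paths) found in `folder`.

    Hypothetical helper. Assumes the TensorFlow V2 checkpoint layout, where
    every checkpoint has a `<prefix>.index` file next to its data shards.
    """
    prefixes = []
    for name in os.listdir(folder):
        if name.endswith(".index"):
            # Strip the ".index" suffix to recover the shared prefix.
            prefixes.append(os.path.join(folder, name[: -len(".index")]))
    return sorted(prefixes)
```

Calling `find_checkpoint_prefixes("path/to/MobileNet_v1_1.0_224")` returns one entry per checkpoint in that folder; non-checkpoint files such as `.txt` or `.pb` files are ignored.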

Training has started successfully after following these steps:

  1. Download the pretrained weights linked in README.md.
  2. Use the path to the pretrained weights as CHECK_POINT in train_mobilenetdet_on_kitti.sh.
  3. Modify the file "checkpoint" inside the pretrained weights folder so that it points to the local pretrained weight file. For example:

         model_checkpoint_path: "MobileNet/data/mobilenetdet-model/model.ckpt-906808"
         all_model_checkpoint_paths: "MobileNet/data/mobilenetdet-model/model.ckpt-906808"

totoroTree • Jun 21 '18
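Step 3 can also be scripted instead of edited by hand. The sketch below writes a two-line `checkpoint` file in the same `key: "value"` form as the example in step 3. `write_checkpoint_state` is a hypothetical name, and the folder and prefix passed in are whatever your own setup uses.

```python
import os


def write_checkpoint_state(ckpt_dir, ckpt_prefix):
    """Write a minimal 'checkpoint' state file into ckpt_dir and return its text.

    Hypothetical helper. ckpt_prefix is the checkpoint prefix without the
    .index/.meta/.data suffixes, e.g. "model.ckpt-906808".
    """
    # Escape backslashes and double quotes so the value stays a valid quoted string.
    value = ckpt_prefix.replace("\\", "\\\\").replace('"', '\\"')
    contents = ('model_checkpoint_path: "{0}"\n'
                'all_model_checkpoint_paths: "{0}"\n').format(value)
    with open(os.path.join(ckpt_dir, "checkpoint"), "w") as f:
        f.write(contents)
    return contents
```

If TensorFlow 1.x is installed, `tf.train.update_checkpoint_state(save_dir, model_checkpoint_path)` should produce an equivalent file; the plain-Python version only avoids the dependency.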

Thank you. This was very valuable advice to me!

suhyung • Apr 22 '19

I downloaded the pretrained weights linked in README.md, but the loss is NaN. Do you know what the problem is?

lijunhong5457 • Jun 03 '19

It works fine with Python 2.

lijunhong5457 • Jun 03 '19
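The thread does not explain why switching to Python 2 removes the NaN loss. One well-known difference worth ruling out when Python 2 code runs under Python 3 is the `/` operator: with two integers it floors in Python 2 but returns a float in Python 3 (PEP 238). Whether this causes the NaN here is unconfirmed; the sketch only shows divisions that behave the same on both versions, and `explicit_division` is a hypothetical name. Adding `from __future__ import division` at the top of a Python 2 module is another way to get the Python 3 behaviour.

```python
def explicit_division(numerator, denominator):
    """Return (floor, true) quotients that agree on Python 2 and Python 3.

    By contrast, plain `numerator / denominator` with two ints gives the
    floor on Python 2 but a float on Python 3 (e.g. 7 / 2 -> 3 vs 3.5).
    """
    floor_q = numerator // denominator       # floor division on both versions
    true_q = numerator / float(denominator)  # float division on both versions
    return floor_q, true_q
```

For example, `explicit_division(7, 2)` returns `(3, 3.5)` under either interpreter.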

> python2 will be fine

@lijunhong5457 Do you mean that the NaN loss went away after you switched from Python 3 to Python 2?

Santara • Jun 04 '19

Yes. After switching from Python 3 to Python 2, training runs without problems. I modified the checkpoint file as follows:

    model_checkpoint_path: "model.ckpt-906808"
    all_model_checkpoint_paths: "model.ckpt-906808"

and saved the checkpoint in the CHECK_POINT folder. It works well.

lijunhon • Jun 04 '19
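To confirm that an edited checkpoint file really points at files that exist, a sketch like the one below reads each entry and resolves its path. It assumes relative paths such as `"model.ckpt-906808"` are interpreted relative to the folder that contains the checkpoint file, which is how TensorFlow 1.x `tf.train.get_checkpoint_state` treats them as far as I know (older releases may differ). `resolve_checkpoint_paths` is a hypothetical name.

```python
import os
import re

# One checkpoint entry: a known key, a colon, a space, then a double-quoted path.
_ENTRY = re.compile(r'^(model_checkpoint_path|all_model_checkpoint_paths): "(.*)"$')


def resolve_checkpoint_paths(ckpt_dir):
    """Return {resolved_path: files_exist} for each path in ckpt_dir/checkpoint.

    Hypothetical helper. Relative paths are joined onto ckpt_dir (assumed to
    match TensorFlow 1.x behaviour). A path counts as existing if a file named
    exactly like it, or like it plus a suffix such as ".index", is present.
    """
    results = {}
    with open(os.path.join(ckpt_dir, "checkpoint")) as f:
        for line in f:
            match = _ENTRY.match(line.strip())
            if match is None:
                continue  # this sketch silently skips lines it cannot parse
            path = match.group(2)
            if not os.path.isabs(path):
                path = os.path.join(ckpt_dir, path)
            folder, base = os.path.split(path)
            results[path] = os.path.isdir(folder) and any(
                name == base or name.startswith(base + ".")
                for name in os.listdir(folder)
            )
    return results
```

Any entry mapped to `False` names a prefix with no matching files, which would explain a failed or incorrect restore.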

Thanks for letting me know, @lijunhong5457. However, I am still getting an InvalidArgumentError caused by NaNs in the summary histograms. Have you ever run into that? Here is a snippet:

InvalidArgumentError (see above for traceback): Nan in summary histogram for: MobileNet/conv_ds_3/dw_batch_norm/moving_variance_1
         [[Node: MobileNet/conv_ds_3/dw_batch_norm/moving_variance_1 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MobileNet/conv_ds_3/dw_batch_norm/moving_variance_1/tag, MobileNet/conv_ds_3/dw_batch_norm/moving_variance/read)]]
         [[Node: fifo_queue_Dequeue/_1 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_56_fifo_queue_Dequeue", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Santara • Jun 04 '19
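The error above is raised by the histogram summary of a batch-norm moving variance (its input is `moving_variance/read`), so by that point the NaN is already stored in the variable. A check like the following can help narrow down where non-finite values first appear. It is a pure-Python sketch: it assumes you have already fetched the values as flat lists of floats (for example from a session run or a checkpoint reader, which is left out here), and `find_nonfinite` is a hypothetical name. Inside the graph, TensorFlow 1.x also offers `tf.check_numerics` for the same purpose.

```python
import math


def find_nonfinite(named_values):
    """Return the sorted names whose values contain NaN or +/-inf.

    Hypothetical helper. `named_values` maps a name (e.g. a variable name)
    to a flat iterable of floats.
    """
    bad = []
    for name, values in named_values.items():
        # math.isnan/math.isinf exist on Python 2 and 3 (math.isfinite is 3-only).
        if any(math.isnan(v) or math.isinf(v) for v in values):
            bad.append(name)
    return sorted(bad)
```

Running it on values captured at successive steps shows which layer turns non-finite first.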

@Santara I think your checkpoint file was modified incorrectly, so the initial parameters were not loaded properly. When editing the checkpoint file, don't add new lines or spaces by hand; just edit the existing entries in place. You do, however, need to update the path so that the program can find the model.

lijunhon • Jun 04 '19
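The advice to edit the existing entries rather than add lines or spaces can be checked mechanically, which also catches a missing quote introduced while editing by hand. This sketch is deliberately strict (likely stricter than TensorFlow's own text-format parser) and only flags lines that differ from the `key: "value"` form; `lint_checkpoint_file` is a hypothetical name.

```python
import re

# A well-formed entry: known key, colon, single space, double-quoted non-empty path.
_VALID_LINE = re.compile(
    r'^(model_checkpoint_path|all_model_checkpoint_paths): "[^"]+"$')


def lint_checkpoint_file(text):
    """Return (line_number, line) pairs that are not a well-formed entry.

    Hypothetical helper. Blank lines and stray spaces are flagged, following
    the advice not to add new lines or spaces by hand.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), 1):
        if not _VALID_LINE.match(line):
            problems.append((number, line))
    return problems
```

An empty result means every line has the expected shape; it says nothing about whether the named files exist.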

@Santara When you get a NaN loss, you should clear train_dir so that the program does not load the wrong parameters saved by the previous run.

lijunhon • Jun 04 '19
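Clearing train_dir can be done with a couple of standard-library calls. The sketch assumes, as in TF-slim style training scripts, that an existing checkpoint in train_dir takes precedence over the pretrained CHECK_POINT on the next run; that behaviour is not confirmed for this repository. `reset_train_dir` is a hypothetical name, and it permanently deletes everything in the folder.

```python
import os
import shutil


def reset_train_dir(train_dir):
    """Delete train_dir (if present) and recreate it empty.

    Hypothetical helper. Removes every checkpoint and event file in train_dir,
    so (assuming train_dir checkpoints are restored first) the next run starts
    from the pretrained weights instead of parameters saved after a NaN loss.
    """
    if os.path.isdir(train_dir):
        shutil.rmtree(train_dir)
    os.makedirs(train_dir)
```

Copy anything worth keeping out of train_dir before calling it.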

Thank you for all the help, @lijunhon, but even after doing everything you suggested, I am still getting NaNs in the summary histograms. Could it be because I am running on a CPU? It shouldn't be, right?

Santara • Jun 05 '19