MobileNet
Has anyone tried MobileNet on the KITTI dataset?
When setting up the configuration for train_mobilenetdet_on_kitti.sh, which model file should be restored? Would it be OK to use the officially released versions from here (https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md)? For example:
- download MobileNet_v1_1.0_224
- use the path of MobileNet_v1_1.0_224 as the CHECK_POINT in train_mobilenetdet_on_kitti.sh
Training has started successfully after following these steps:
- download the pretrained weights linked in README.md
- use the path to the pretrained weights as the CHECK_POINT in train_mobilenetdet_on_kitti.sh
- modify the file "checkpoint" inside the pretrained weight folder so that it points to the local pretrained weight file. For example:
  model_checkpoint_path: "MobileNet/data/mobilenetdet-model/model.ckpt-906808"
  all_model_checkpoint_paths: "MobileNet/data/mobilenetdet-model/model.ckpt-906808"
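The last step can also be scripted instead of edited by hand. Below is a minimal sketch, assuming the "checkpoint" file contains only the two entries shown above. The helper name write_checkpoint_index and its arguments are hypothetical and not part of the repository.

```python
import os


def write_checkpoint_index(ckpt_dir, ckpt_prefix):
    """Rewrite the "checkpoint" file in ckpt_dir so it points at ckpt_prefix.

    ckpt_prefix is written verbatim into both entries, for example
    "MobileNet/data/mobilenetdet-model/model.ckpt-906808".
    """
    index_path = os.path.join(ckpt_dir, "checkpoint")
    lines = [
        'model_checkpoint_path: "{}"\n'.format(ckpt_prefix),
        'all_model_checkpoint_paths: "{}"\n'.format(ckpt_prefix),
    ]
    # Overwrite the whole file so no stray blank lines or spaces remain.
    with open(index_path, "w") as f:
        f.writelines(lines)
    return index_path
```

Writing the file from scratch avoids accidentally leaving extra whitespace behind, which a manual edit can introduce.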
Thank you. This was very valuable advice to me!
I downloaded the pretrained weights from README.md but got a NaN loss. Do you know what the problem is?
Python 2 will work fine.
@lijunhong5457 do you mean that after using python2 instead of python3, the NaN loss error vanished?
Yes, after changing from python3 to python2 there is no problem training it. I modified the checkpoint file as below: model_checkpoint_path: "model.ckpt-906808" all_model_checkpoint_paths: "model.ckpt-906808" and I saved the checkpoint to the CHECK_POINT folder. It works well.
Thanks for letting me know, @lijunhong5457. But I am still getting an InvalidArgumentError due to NaNs in the histograms. Have you ever faced that? Here is a snippet.
InvalidArgumentError (see above for traceback): Nan in summary histogram for: MobileNet/conv_ds_3/dw_batch_norm/moving_variance_1
[[Node: MobileNet/conv_ds_3/dw_batch_norm/moving_variance_1 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MobileNet/conv_ds_3/dw_batch_norm/moving_variance_1/tag, MobileNet/conv_ds_3/dw_batch_norm/moving_variance/read)]]
[[Node: fifo_queue_Dequeue/_1 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_56_fifo_queue_Dequeue", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
@Santara I think your checkpoint file was modified incorrectly, so the wrong initial parameters are being loaded. When editing the checkpoint file, don't add new lines or spaces by hand; change only the existing entries. The one requirement is that you update the path so that the program can find the model.
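One way to rule out a badly edited checkpoint file is to check it before training. This is only a sketch: the helper name check_checkpoint_index is hypothetical, the strict line pattern is an assumption based on the two entries quoted in this thread, and it only confirms that some file starting with the referenced prefix exists, not that the weights themselves are valid.

```python
import os
import re

# Assumed shape of each line, based on the entries quoted in this thread.
_ENTRY = re.compile(
    r'^(model_checkpoint_path|all_model_checkpoint_paths): "([^"]+)"$'
)


def check_checkpoint_index(ckpt_dir):
    """Return a list of problems found in ckpt_dir/checkpoint (empty if none)."""
    index_path = os.path.join(ckpt_dir, "checkpoint")
    if not os.path.isfile(index_path):
        return ["missing file: " + index_path]
    problems = []
    with open(index_path) as f:
        for lineno, line in enumerate(f, 1):
            line = line.rstrip("\n")
            match = _ENTRY.match(line)
            if match is None:
                # Catches blank lines, stray spaces, and missing quotes.
                problems.append("line %d: unexpected format: %r" % (lineno, line))
                continue
            prefix = match.group(2)
            if not os.path.isabs(prefix):
                # Assumption: relative paths are resolved against ckpt_dir.
                prefix = os.path.join(ckpt_dir, prefix)
            folder, base = os.path.split(prefix)
            folder = folder or "."
            found = os.path.isdir(folder) and any(
                name.startswith(base) for name in os.listdir(folder)
            )
            if not found:
                problems.append("line %d: no files match %s" % (lineno, prefix))
    return problems
```

An empty list means the file has the expected two-entry shape and points at files that exist. It says nothing about whether the NaNs come from somewhere else.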
@Santara when you get a NaN loss, you should also clear train_dir so that the program does not load the wrong parameters.
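Clearing train_dir can be done in a couple of lines. Here is a minimal sketch, assuming train_dir holds only the output of earlier runs (checkpoints and summaries) and nothing you want to keep. The helper name reset_train_dir is hypothetical.

```python
import os
import shutil


def reset_train_dir(train_dir):
    """Delete train_dir with everything in it, then recreate it empty."""
    if os.path.isdir(train_dir):
        # Destructive: removes all earlier checkpoints and summaries.
        shutil.rmtree(train_dir)
    os.makedirs(train_dir)
```

Run it once before restarting training, and keep the pretrained weights in a separate folder so they are not deleted along with the stale checkpoints.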
Thank you for all the help, @lijunhon - but even after doing everything you suggested, I am still getting NaN in summary histograms. Is it because I am running on a CPU? It should not be, right?