tensorflow-deeplab-v3-plus
Training with a different number of classes
First of all, I would like to thank you for your great work!
I am a beginner, and I am currently trying to use the PASCAL VOC-trained DeepLab V3+ model provided in this repository to train on my own dataset, which has a different number of classes.
Please guide me through the changes required to make it happen.
Since this implementation uses the TF Estimator API, https://stackoverflow.com/questions/47867748/transfer-learning-with-tf-estimator-estimator-framework should be helpful.
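With the Estimator API, one way to do this kind of transfer learning is to warm-start from the pre-trained checkpoint while excluding the class-dependent layer. A minimal sketch only (the scope name in the regex, the checkpoint path, and model_fn are all assumptions, not code from this repo):

```python
import tensorflow as tf

# Warm-start every variable except those under the (assumed) logits scope;
# the negative-lookahead regex is matched against variable names.
ws = tf.estimator.WarmStartSettings(
    ckpt_to_initialize_from='/path/to/pascal/model.ckpt',  # placeholder path
    vars_to_warm_start='^(?!.*upsampling_logits).*$')

# model_fn stands for the DeepLab model function built with the new
# number of classes.
estimator = tf.estimator.Estimator(
    model_fn=model_fn, model_dir='./my_model_dir', warm_start_from=ws)
```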
Hi @UR11EC017, thank you for your interest in the repo.
Training with a different number of classes is very straightforward.
First, change _NUM_CLASSES in the code to the number of classes in your dataset.
Then, modify the color map defined here accordingly; a sketch of both changes follows.
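For example, assuming a two-class dataset (the color-map variable name and the colors below are placeholders; adapt them to wherever the color map is defined in your copy of the code):

```python
# train.py: match the number of classes to your dataset.
_NUM_CLASSES = 2  # e.g., background + one foreground class

# Color map: one RGB triple per class, used to colorize predictions.
# The variable name and colors here are only illustrative.
_COLOR_MAP = [
    (0, 0, 0),    # class 0: background
    (255, 0, 0),  # class 1: foreground
]
```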
Let me know if you encounter any other problems.
Dear @rishizek, I have been trying to do the same, that is, to use DeepLab v3+ to train on my own dataset with a different number of classes.
First of all, I created my .record files using create_pascal_tf_record.py. After that, I changed _NUM_CLASSES, _HEIGHT, and _WIDTH in train.py to the values for my own problem (2 classes and 720x720 images). I also changed the color map. When running train.py on the newly created records, I encountered the following problem. It seems to happen within a session, but I do not know which part I missed...
File "/home/user/Envs/deeplearning/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [2] rhs shape= [21] [[Node: save/Assign_57 = Assign[T=DT_FLOAT, _class=["loc:@decoder/upsampling_logits/conv_1x1/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoder/upsampling_logits/conv_1x1/biases/Momentum, save/RestoreV2/_1)]] [[Node: save/RestoreV2/_1842 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="
Is there something else that I should avoid restoring? From my understanding, the class information is only specified on the decoder side, right? Thanks in advance.
Hi @esterglez, thank you for your interest in the repo.
I'm not sure what the exact problem is, but it seems that the number of classes for the PASCAL dataset (=21) still remains somewhere. Either your model architecture has a last layer with 21 classes while the saved checkpoint has 2, or vice versa, and this mismatch produces the error when the checkpoint is loaded.
You can check with TensorBoard whether your architecture's last layer correctly has 2 classes.
If your model architecture correctly has 2 classes, then the problem is that you are trying to load a checkpoint with 21 classes. This sometimes happens when your model_dir is not clean: you first trained the model on PASCAL data, so a checkpoint with 21 classes was generated, and when you later tried to train the model on your dataset (2 classes), loading that checkpoint failed. You may need to clean model_dir in that case.
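If in doubt, you can inspect the checkpoint directly. A minimal TF 1.x sketch ('model_dir' is an example path; the variable name is taken from the error message above):

```python
import tensorflow as tf

# Check which class count the latest checkpoint in model_dir was saved with.
ckpt = tf.train.latest_checkpoint('model_dir')
reader = tf.train.NewCheckpointReader(ckpt)
shape_map = reader.get_variable_to_shape_map()
print(shape_map['decoder/upsampling_logits/conv_1x1/biases'])
# [21] -> the checkpoint still has PASCAL's 21 classes, so model_dir
# needs to be cleaned before training with 2 classes.
```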
I hope this helps.
Dear @rishizek,
"This sometimes happens when your model_dir is not clean. Namely, you first trained model with PASCAL data, then checkpoint is generated with # of classes = 21, and after that you tried to train model with your dataset (# of classes = 2) and failed to load the checkpoint. You may need to clean model_dir in that case."
This was exactly what was happening to me, so thank you very much for your help ;). Now I can continue.
@esterglez Are you using the pre-trained model? If so, you have to prevent the last layer from being initialized from the pre-trained model. You have two options for this: define another last layer with the same structure and initialize it manually, or use stop_restore_last_layer.
@Sam813 Could you please share the details of how to change the code to do this? Thank you very much.
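For reference, a minimal TF 1.x sketch of the second option: restore everything from the pre-trained checkpoint except the class-dependent last layer. The 'decoder/upsampling_logits' scope name is inferred from the error message earlier in this thread, and the checkpoint path is a placeholder; verify both against your own graph.

```python
import tensorflow as tf
slim = tf.contrib.slim

# Build the model with the new class count first (e.g., inside model_fn),
# then collect every variable except the class-dependent last layer.
exclude = ['decoder/upsampling_logits', 'global_step']
variables_to_restore = slim.get_variables_to_restore(exclude=exclude)

# Initialize the collected variables from the PASCAL checkpoint; the
# excluded logits layer keeps its fresh (randomly initialized) weights.
tf.train.init_from_checkpoint(
    '/path/to/pascal/model.ckpt',  # placeholder path
    {v.name.split(':')[0]: v for v in variables_to_restore})
```

With the Estimator API, the init_from_checkpoint call goes inside model_fn before the EstimatorSpec is returned, so it takes effect each time the graph is built.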