
Finetune on custom dataset locally using TF2


Hello, thanks for sharing the code. I wanted to try the pretrained model on a fine-tuning task with a custom dataset. After some trial and error I was able to write the dataset as per TFDS, and after some more trial and error around TensorFlow version changes I am able to make progress on getting it to run. The command I use to fine-tune is from the README:

```
python ./code/simclr/tf2/run.py --mode=train_then_eval --train_mode=finetune \
  --fine_tune_after_block=4 --zero_init_logits_layer=True --global_bn=False \
  --optimizer=momentum --learning_rate=0.1 --weight_decay=0.0 --train_epochs=10 \
  --train_batch_size=64 --warmup_epochs=0 --dataset=tf_guidance --image_size=128 \
  --eval_split=val --resnet_depth=50 --checkpoint=simclr_v1/1x/saved_model/ \
  --model_dir=/tmp/simclr_test_ft --use_tpu=False
```

The number of classes for my problem is 6, and I use the SimCLR v1 checkpoint with the 1x ResNet, as can be seen above. On running the above, the warning I get during checkpoint loading is:

```
I0121 17:28:25.435804 139671615018752 api.py:447] head_supervised/linear_layer/dense_3/bias:0
WARNING:tensorflow:Gradients do not exist for variables ['projection_head/nl_0/batch_norm_relu_53/batch_normalization_53/gamma:0', 'projection_head/nl_0/batch_norm_relu_53/batch_normalization_53/beta:0', 'projection_head/nl_1/batch_norm_relu_54/batch_normalization_54/gamma:0', 'projection_head/nl_1/batch_norm_relu_54/batch_normalization_54/beta:0', 'projection_head/nl_2/batch_norm_relu_55/batch_normalization_55/gamma:0'] when minimizing the loss. If you're using model.compile(), did you forget to provide a loss argument?
W0121 17:28:25.458693 139671615018752 utils.py:76] Gradients do not exist for variables ['projection_head/nl_0/batch_norm_relu_53/batch_normalization_53/gamma:0', 'projection_head/nl_0/batch_norm_relu_53/batch_normalization_53/beta:0', 'projection_head/nl_1/batch_norm_relu_54/batch_normalization_54/gamma:0', 'projection_head/nl_1/batch_norm_relu_54/batch_normalization_54/beta:0', 'projection_head/nl_2/batch_norm_relu_55/batch_normalization_55/gamma:0'] when minimizing the loss. If you're using model.compile(), did you forget to provide a loss argument?
```

After a while, training fails with:

```
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) INVALID_ARGUMENT: logits and labels must be broadcastable: logits_size=[64,1000] labels_size=[64,6]
	 [[node categorical_crossentropy/softmax_cross_entropy_with_logits (defined at /home/krishnan/pyenv/tf2/lib/python3.8/site-packages/keras/backend.py:5009) ]]
	 [[Func/while/body/_1/image/write_summary/summary_cond/then/_1242/input/_1253/_30]]
  (1) INVALID_ARGUMENT: logits and labels must be broadcastable: logits_size=[64,1000] labels_size=[64,6]
	 [[node categorical_crossentropy/softmax_cross_entropy_with_logits (defined at /home/krishnan/pyenv/tf2/lib/python3.8/site-packages/keras/backend.py:5009) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_multiple_steps_11689]
```

I can guess that this is due to the mismatch in the number of classes. As far as I can see, the number of classes is set in run.py from the dataset, via `num_classes = builder.info.features['label'].num_classes` on line 475. Unfortunately, I am unable to understand or debug this TensorFlow error any further, so any suggestions are welcome. The TensorFlow version is 2.7.4, and I have checked with 2.5.3 as well. Unfortunately, the code does not work with 2.4.1, citing some CUDA version mismatch (though it works with later TF versions). Any help/hints welcome!
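For reference, by "as per TFDS" I mean a builder roughly like the minimal sketch below; the class names and paths here are placeholders rather than my actual dataset. The point is that the `ClassLabel` feature is what `builder.info.features['label'].num_classes` reads from:

```python
import os

import tensorflow_datasets as tfds


class TfGuidance(tfds.core.GeneratorBasedBuilder):
  """Sketch of a 6-class image dataset registered with TFDS."""

  VERSION = tfds.core.Version('1.0.0')

  def _info(self):
    return tfds.core.DatasetInfo(
        builder=self,
        features=tfds.features.FeaturesDict({
            'image': tfds.features.Image(shape=(None, None, 3)),
            # run.py reads builder.info.features['label'].num_classes,
            # which is the length of this names list (6 here).
            'label': tfds.features.ClassLabel(
                names=['c0', 'c1', 'c2', 'c3', 'c4', 'c5']),
        }),
        supervised_keys=('image', 'label'),
    )

  def _split_generators(self, dl_manager):
    # Placeholder paths; point these at the actual image folders.
    return {
        'train': self._generate_examples('/data/tf_guidance/train'),
        'val': self._generate_examples('/data/tf_guidance/val'),
    }

  def _generate_examples(self, path):
    # Expects one subdirectory per class, each containing image files.
    for class_name in ['c0', 'c1', 'c2', 'c3', 'c4', 'c5']:
      class_dir = os.path.join(path, class_name)
      for fname in os.listdir(class_dir):
        yield f'{class_name}/{fname}', {
            'image': os.path.join(class_dir, fname),
            'label': class_name,
        }
```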
Sorry if this seems like a basic question. Thank you.
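PS: one sanity check worth doing (a sketch, assuming the builder above is importable so TFDS can resolve it by the name `tf_guidance`) is to print what run.py will see for the label count:

```python
import tensorflow_datasets as tfds

builder = tfds.builder('tf_guidance')
print(builder.info.features['label'].num_classes)  # should print 6
```

If this prints 6 while the logits are still [64, 1000] (which happens to match ImageNet's class count), the 1000-way shape may be coming from the restored checkpoint rather than from the dataset.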

vsk-phi · Jan 21 '23

Can you share the directory structure? I am running the same code but I am quite unclear about the paths to give for --model_dir and --checkpoint.

deepankarvarma · Jan 18 '24