
Issue in Finetune V2 simclr on Cifar10

Open AlexMaOLS opened this issue 3 years ago • 2 comments

I have pretrained SimCLR v2 on CIFAR-10 as below:

! python run.py --train_mode=pretrain --train_batch_size=256 --train_epochs=400 \
  --learning_rate=0.2 --learning_rate_scaling=sqrt --proj_out_dim=64 --num_proj_layers=2 \
  --weight_decay=1e-4 --temperature=0.2 \
  --dataset=cifar10 --data_dir=/tmp/dataset \
  --image_size=32 --eval_split=test --resnet_depth=18 --use_blur=False --color_jitter_strength=0.5 \
  --model_dir=../../save_result/cifar10_checkpoint \
  --use_tpu=False --cache_dataset=True

It runs well. Then I run the fine-tuning step:

! python run.py --mode=train_then_eval --train_mode=finetune --fine_tune_after_block=4 \
  --zero_init_logits_layer=True \
  --global_bn=False --optimizer=momentum --learning_rate=0.1 --weight_decay=0.0 \
  --train_epochs=100 --train_batch_size=512 --warmup_epochs=0 --dataset=cifar10 \
  --data_dir=/tmp/dataset --image_size=32 --eval_split=test --resnet_depth=18 \
  --checkpoint=../../save_result/cifar10_checkpoint/ckpt-78195 \
  --model_dir=../../save_result/cifar10_finetune_model/ --use_tpu=False

It fails with the error below:

I0101 03:37:59.526749 139677453731712 run.py:323] Restoring from given checkpoint: ../../save_result/cifar10_checkpoint/ckpt-78195
I0101 03:37:59.543215 139677453731712 run.py:333] Initializing output layer parameters [] to zero
2022-01-01 03:37:59.554517: W tensorflow/core/grappler/optimizers/data/slack.cc:103] Could not find a final prefetch in the input pipeline to which to introduce slack.

WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/deprecation.py:620: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
W0101 03:38:00.162250 139672445486848 deprecation.py:551] From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/deprecation.py:620: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
I0101 03:38:03.507634 139672445486848 api.py:447] Trainable variables:
I0101 03:38:04.769429 139672445486848 api.py:447] projection_head/nl_0/batch_norm_relu_21/batch_normalization_21/gamma:0
I0101 03:38:04.791169 139672445486848 api.py:447] projection_head/nl_0/batch_norm_relu_21/batch_normalization_21/beta:0
I0101 03:38:04.813601 139672445486848 api.py:447] projection_head/nl_0/dense/kernel:0
I0101 03:38:04.837257 139672445486848 api.py:447] projection_head/nl_1/batch_norm_relu_22/batch_normalization_22/gamma:0
I0101 03:38:04.858916 139672445486848 api.py:447] projection_head/nl_1/batch_norm_relu_22/batch_normalization_22/beta:0
I0101 03:38:04.880863 139672445486848 api.py:447] projection_head/nl_1/dense_1/kernel:0
I0101 03:38:04.902315 139672445486848 api.py:447] projection_head/nl_2/batch_norm_relu_23/batch_normalization_23/gamma:0
I0101 03:38:04.924816 139672445486848 api.py:447] projection_head/nl_2/dense_2/kernel:0
I0101 03:38:04.954355 139672445486848 api.py:447] head_supervised/linear_layer/dense_3/kernel:0
I0101 03:38:04.980358 139672445486848 api.py:447] head_supervised/linear_layer/dense_3/bias:0
WARNING:tensorflow:Gradients do not exist for variables ['projection_head/nl_0/batch_norm_relu_21/batch_normalization_21/gamma:0', 'projection_head/nl_0/batch_norm_relu_21/batch_normalization_21/beta:0', 'projection_head/nl_1/batch_norm_relu_22/batch_normalization_22/gamma:0', 'projection_head/nl_1/batch_norm_relu_22/batch_normalization_22/beta:0', 'projection_head/nl_2/batch_norm_relu_23/batch_normalization_23/gamma:0'] when minimizing the loss. If you're using model.compile(), did you forget to provide a loss argument?
W0101 03:38:05.732908 139672445486848 utils.py:80] Gradients do not exist for variables ['projection_head/nl_0/batch_norm_relu_21/batch_normalization_21/gamma:0', 'projection_head/nl_0/batch_norm_relu_21/batch_normalization_21/beta:0', 'projection_head/nl_1/batch_norm_relu_22/batch_normalization_22/gamma:0', 'projection_head/nl_1/batch_norm_relu_22/batch_normalization_22/beta:0', 'projection_head/nl_2/batch_norm_relu_23/batch_normalization_23/gamma:0'] when minimizing the loss. If you're using model.compile(), did you forget to provide a loss argument?

Traceback (most recent call last):
  File "run.py", line 677, in <module>
    app.run(main)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "run.py", line 653, in main
    train_multiple_steps(iterator)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) INVALID_ARGUMENT: var and accum do not have the same shape[512,64] [512,512]
    [[node SGD/SGD/update_1/update_0/ResourceApplyKerasMomentum (defined at /usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/gradient_descent.py:140) ]]
    [[while/body/_1/mod/y/_18]]
  (1) INVALID_ARGUMENT: var and accum do not have the same shape[512,64] [512,512]
    [[node SGD/SGD/update_1/update_0/ResourceApplyKerasMomentum (defined at /usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/gradient_descent.py:140) ]]
0 successful operations. 0 derived errors ignored. [Op:__inference_train_multiple_steps_8990]

Errors may have originated from an input operation.
Input Source operations connected to node SGD/SGD/update_1/update_0/ResourceApplyKerasMomentum:
 In[0] projection_head/nl_1/dense_1/MatMul/ReadVariableOp/resource (defined at /usr/local/lib/python3.7/dist-packages/keras/layers/core/dense.py:199)
 In[1] SGD/SGD/update_1/update_0/ResourceApplyKerasMomentum/accum:
 In[2] SGD/Identity (defined at /usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/optimizer_v2.py:955)
 In[3] gradient_tape/mul_1 (defined at run.py:623)
 In[4] SGD/Identity_1 (defined at /usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/gradient_descent.py:124)
 In[5] L2Loss_1/ReadVariableOp (defined at /content/drive/My Drive/simCLR_github/simclr/tf2/model.py:63)
 In[6] projection_head/nl_1/dense_1/MatMul/ReadVariableOp:

Operation defined at: (most recent call last)
  File "run.py", line 677, in <module>
    app.run(main)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "run.py", line 653, in main
    train_multiple_steps(iterator)
  File "run.py", line 632, in train_multiple_steps
    for _ in tf.range(steps_per_loop):
  File "run.py", line 644, in train_multiple_steps
    strategy.run(single_step, (features, labels))
  File "/usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/optimizer_v2.py", line 722, in _distributed_apply
    var, apply_grad_to_update_var, args=(grad,), group=False)
  File "/usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/optimizer_v2.py", line 704, in apply_grad_to_update_var
    update_op = self._resource_apply_dense(grad, var, **apply_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/gradient_descent.py", line 140, in _resource_apply_dense
    use_nesterov=self.nesterov)

Function call stack:
train_multiple_steps -> while_body_5082 -> train_multiple_steps -> while_body_5082

AlexMaOLS · Jan 01 '22
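For context on the failing op: SGD/.../ResourceApplyKerasMomentum is the momentum-SGD update, and it requires each variable ("var") and its momentum slot ("accum") to have identical shapes. The snippet below is a standalone illustration (assuming TensorFlow 2.x; it is not part of the SimCLR code) that reproduces the same InvalidArgumentError by pairing a [512, 64] variable with a [512, 512] accumulator, which is what the error above is reporting.

```python
import tensorflow as tf

# Hypothetical shapes taken from the error message above: the variable
# (e.g. a projection-head kernel) is [512, 64], but the momentum
# accumulator slot has shape [512, 512].
var = tf.Variable(tf.zeros([512, 64]))
accum = tf.Variable(tf.zeros([512, 512]))
grad = tf.zeros([512, 64])

# Raises InvalidArgumentError:
#   var and accum do not have the same shape[512,64] [512,512]
tf.raw_ops.ResourceApplyKerasMomentum(
    var=var.handle,
    accum=accum.handle,
    lr=tf.constant(0.1),
    grad=grad,
    momentum=tf.constant(0.9),
)
```

In other words, the optimizer's momentum slot for this variable was built for a [512, 512] tensor, while the variable being updated is [512, 64].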

I think the issue is related either to the gradients of the projection head or to the dimension mismatch [512,64] vs. [512,512] (see the checkpoint-inspection sketch below). But I really do not know what causes this, since I directly use my pretrained checkpoint. Thank you so much for the help!

AlexMaOLS · Jan 01 '22
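One way to check where the 64 and 512 dimensions come from is to list the variable shapes stored in the pretraining checkpoint and compare them with the trainable variables printed by the fine-tuning run. A minimal sketch, assuming TensorFlow 2 and the checkpoint path used above:

```python
import tensorflow as tf

# Path of the pretrained checkpoint passed to the fine-tuning command.
ckpt_path = "../../save_result/cifar10_checkpoint/ckpt-78195"

reader = tf.train.load_checkpoint(ckpt_path)
shape_map = reader.get_variable_to_shape_map()

# Print the projection-head variables and their shapes; with
# --proj_out_dim=64 the final projection kernel should end in 64.
for name in sorted(shape_map):
    if "projection_head" in name:
        print(name, shape_map[name])
```

If the layer count or output dimension stored here differs from the projection head that the fine-tuning run builds, a [512,64] vs. [512,512] mismatch like the one in the error would be a plausible outcome.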

From the message I'm also not sure what went wrong here (it seems the 64 dim is the projection head output dim, and 512 is the ResNet output dim), but it may be worth trying to set --proj_out_dim=64 --num_proj_layers=2 --ft_proj_selector=0 for the fine-tuning run (see the sketch below).

chentingpc · Jan 04 '22
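For reference, a sketch of the fine-tuning invocation with the suggested flags added; all other flags are copied from the command earlier in this issue, and whether this resolves the shape mismatch is not confirmed in the thread:

```
! python run.py --mode=train_then_eval --train_mode=finetune --fine_tune_after_block=4 \
  --zero_init_logits_layer=True \
  --global_bn=False --optimizer=momentum --learning_rate=0.1 --weight_decay=0.0 \
  --train_epochs=100 --train_batch_size=512 --warmup_epochs=0 --dataset=cifar10 \
  --data_dir=/tmp/dataset --image_size=32 --eval_split=test --resnet_depth=18 \
  --proj_out_dim=64 --num_proj_layers=2 --ft_proj_selector=0 \
  --checkpoint=../../save_result/cifar10_checkpoint/ckpt-78195 \
  --model_dir=../../save_result/cifar10_finetune_model/ --use_tpu=False
```

The idea behind passing --proj_out_dim=64 and --num_proj_layers=2 again is to make the projection head built at fine-tuning time match the one saved during pretraining, so that restored variables and freshly created optimizer slots agree in shape.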