Cannot finetune the full network from the TF2 SavedModel

Open wangkua1 opened this issue 3 years ago • 3 comments

Dear Ting,

Thanks for adding the TF2 SavedModels. It looks like `trainable_variables_list` is empty for the SavedModels, so I cannot fine-tune the full network from them. Is there a workaround, or will the trainable checkpoints be released soon?
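For anyone wanting to reproduce the observation, here is a minimal sketch of how the variable lists on a loaded object could be inspected. The helper names `variable_report` and `load_and_report` are hypothetical, and which attributes a given SavedModel actually exposes is an assumption; a missing attribute is reported as `None` rather than treated as an error.

```python
def variable_report(obj, attr_names=("variables",
                                     "trainable_variables",
                                     "trainable_variables_list")):
    """Count the entries of each variable-list attribute found on `obj`.

    Returns a dict mapping attribute name to its length, or None when the
    attribute does not exist on the object.
    """
    report = {}
    for name in attr_names:
        value = getattr(obj, name, None)
        report[name] = None if value is None else len(list(value))
    return report


def load_and_report(saved_model_dir):
    """Load a SavedModel directory and report its variable lists."""
    # Imported lazily so that variable_report stays usable without TensorFlow.
    import tensorflow as tf

    loaded = tf.saved_model.load(saved_model_dir)
    return variable_report(loaded)
```

If `trainable_variables_list` comes back as `0` while `variables` is non-zero, the weights are present but not exposed as trainable, which would match the behaviour described above.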

Thanks, Jackson

wangkua1 avatar Mar 13 '21 02:03 wangkua1

@saxenasaurabh, who created the TF2 version and converted the SavedModels from TF1, probably knows more about this.

chentingpc avatar Mar 14 '21 02:03 chentingpc

[EDIT]: I found a closely related problem and a solution among the closed issues: #108. However, my problem is not solved yet, so I am moving the updated version of my question to a comment under that issue.

Hello,

I have a different problem related to fine-tuning models saved in the TF2 SavedModel format. I want to use the self-supervised model as the backbone for another task. For that, I tried to use r50_1x_sk0 (which I downloaded from gs://simclr-checkpoints-tf2/simclrv2/pretrained). When the model is loaded with tf.saved_model.load (as illustrated in the notebooks), I get a series of warnings like the following:

WARNING:absl:Importing a function (__inference_sync_batch_normalization_42_layer_call_and_return_conditional_losses_34851) with ops with custom gradients. Will likely fail if a gradient is requested.
...

The model does load despite these warnings, but when I call it with the flag trainable=True, I get the following error, as the warnings above predicted:

LookupError: No gradient defined for operation 'resnet/block_group4/bottleneck_block_15/batch_norm_relu_52/sync_batch_normalization_52/moments/IdentityN_1' (op type: IdentityN)

Do you know a solution to this problem?

I also tried re-instantiating the model and loading only the weights (using the checkpoint in the variables folder inside the saved_model folder, with tf.train.Checkpoint). In that case I get an error about a mismatch between the checkpointed variables and the variables of the instantiated model:

AssertionError: Nothing except the root object matched a checkpointed value. Typically this means that the checkpoint does not match the Python program. The following objects have no matching checkpointed value: ...
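One way to narrow down this kind of mismatch is to compare the keys stored in the checkpoint with the keys the re-instantiated model would produce. The sketch below is not a fix, only a diagnostic; the helper names `compare_checkpoint_keys` and `checkpoint_keys_from_file` are hypothetical. It relies on `tf.train.list_variables`, which returns `(name, shape)` pairs for a checkpoint.

```python
def compare_checkpoint_keys(checkpoint_keys, model_keys):
    """Split two collections of variable keys into matched and unmatched sets."""
    ckpt = set(checkpoint_keys)
    model = set(model_keys)
    return {
        "matched": sorted(ckpt & model),
        "only_in_checkpoint": sorted(ckpt - model),
        "only_in_model": sorted(model - ckpt),
    }


def checkpoint_keys_from_file(ckpt_prefix):
    """List the variable keys stored in a checkpoint (prefix or directory)."""
    # Imported lazily so that compare_checkpoint_keys works without TensorFlow.
    import tensorflow as tf

    return [name for name, _shape in tf.train.list_variables(ckpt_prefix)]
```

For the model side, one option (an assumption, not something confirmed in this thread) is to write a fresh checkpoint from the re-instantiated model with `tf.train.Checkpoint(model=model).write(...)` and list its keys the same way. If `matched` is empty, the two object graphs are probably laid out differently, which would be consistent with the AssertionError above.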

How can I use checkpoints or the SavedModel to insert your models as a backbone to a network for a different task?

Thank you for your time.

free-bit avatar Sep 26 '21 07:09 free-bit

Guys, can someone please answer this? I am facing the same issue. Please share a code snippet for fine-tuning the entire network.

rishabhm12 avatar Dec 20 '22 07:12 rishabhm12