doctr
PERMANENT DRAFT: TF grouped convolutions check
This PR:
- is only a test PR at the moment
- fixes the TF MobileNet grouped convolutions issue:
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__Conv2DBackpropInput_device_/job:localhost/replica:0/task:0/device:CPU:0}} Gradients for grouped convolutions are not supported on CPU. Please file a feature request if you run into this issue. Computed input depth 576 doesn't match filter input depth 1 [Op:Conv2DBackpropInput]
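The last part of the error follows from how grouped kernels are laid out. With `groups` equal to the channel count (the depthwise case MobileNet relies on), each filter sees a single input channel, so the filter input depth is 1 while the input depth is 576. The CPU gradient kernel apparently only handles the ungrouped case, where the two match. Below is a minimal sketch of the kernel shapes, following the layouts used by Keras `Conv2D` (with `groups`) and `DepthwiseConv2D`. The helper names and the 3x3 kernel size are illustrative assumptions.

```python
def conv2d_kernel_shape(kh, kw, in_channels, filters, groups=1):
    """Kernel shape of a Keras-style Conv2D with `groups` (channels-last)."""
    if in_channels % groups or filters % groups:
        raise ValueError("in_channels and filters must both be divisible by groups")
    # each filter only sees in_channels // groups input channels
    return (kh, kw, in_channels // groups, filters)


def depthwise_kernel_shape(kh, kw, in_channels, depth_multiplier=1):
    """Kernel shape of a Keras-style DepthwiseConv2D (channels-last)."""
    return (kh, kw, in_channels, depth_multiplier)


# grouped conv over 576 channels with groups=576: filter input depth is 1
print(conv2d_kernel_shape(3, 3, 576, 576, groups=576))  # (3, 3, 1, 576)
# the equivalent depthwise layer stores its weights in a different layout
print(depthwise_kernel_shape(3, 3, 576))                 # (3, 3, 576, 1)
```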
The big disadvantage is that we would need to retrain all models that use MobileNet as a backbone, as well as the MobileNet classification models themselves.
The TF MobileNet export from #1182 still works with the changes in this PR.
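This section assumes the fix swaps the grouped `Conv2D(groups=C)` layers for `DepthwiseConv2D`, as the stride discussion further down suggests. Under that assumption, the two layers compute the same function. What changes is the kernel layout: (kh, kw, 1, C) versus (kh, kw, C, 1). Keras also names the depthwise weight `depthwise_kernel` rather than `kernel`, which may explain the "could not be found ... .kernel" checkpoint warnings below.

The NumPy sketch below checks the equivalence on random data. It uses stride 1, 'valid' padding, and cross-correlation as in Keras. The function names are hypothetical, and the code does not verify whether the doctr checkpoints could be reused this way.

```python
import numpy as np


def grouped_conv2d(x, w, groups):
    """Grouped 2D convolution: NHWC input, 'valid' padding, stride 1.

    w uses the Conv2D layout (kh, kw, C_in // groups, C_out).
    """
    _, h, wd, c_in = x.shape
    kh, kw, cg_in, c_out = w.shape
    assert c_in == cg_in * groups and c_out % groups == 0
    cg_out = c_out // groups
    oh, ow = h - kh + 1, wd - kw + 1
    out = np.zeros((x.shape[0], oh, ow, c_out))
    for g in range(groups):
        xg = x[..., g * cg_in:(g + 1) * cg_in]
        wg = w[..., g * cg_out:(g + 1) * cg_out]
        for i in range(kh):
            for j in range(kw):
                # contract this group's input channels against one kernel tap
                out[..., g * cg_out:(g + 1) * cg_out] += xg[:, i:i + oh, j:j + ow, :] @ wg[i, j]
    return out


def depthwise_conv2d(x, dw):
    """Depthwise 2D convolution: NHWC, 'valid', stride 1; dw is (kh, kw, C, 1)."""
    kh, kw, c, _ = dw.shape
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((x.shape[0], oh, ow, c))
    for i in range(kh):
        for j in range(kw):
            # each channel is scaled by its own tap; no mixing across channels
            out += x[:, i:i + oh, j:j + ow, :] * dw[i, j, :, 0]
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8, 8, 4))
w = rng.standard_normal((3, 3, 1, 4))   # Conv2D(filters=4, groups=4) layout
dw = np.transpose(w, (0, 1, 3, 2))      # DepthwiseConv2D layout (3, 3, 4, 1)
print(np.allclose(grouped_conv2d(x, w, groups=4), depthwise_conv2d(x, dw)))  # True
```

Only the last two kernel axes are swapped, because with one input channel per group, "filter g" and "channel g" refer to the same slice.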
python3 /home/felix/Desktop/doctr/references/classification/train_tensorflow.py mobilenet_v3_small --pretrained
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restorefor details about the status object returned by the restore function.
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-1.block.layer_with_weights-0.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-2.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-3.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-4.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-5.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-6.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-7.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-8.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-9.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-10.block.layer_with_weights-2.kernel
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).layer_with_weights-11.block.layer_with_weights-2.kernel
@frgfm What do you think: does it make sense in this case to retrain everything to fix this issue?
And a more directly related question for @olivmindee @charlesmindee: could you retrain the models? The rotation classification model and the detection/recognition models in particular depend on your datasets (compute would not be a problem on my side, but the data is).
OK, SeparableConv2D and DepthwiseConv2D currently do not support different row/column stride values, which would break the rectangular pooling implementations 😩
An integer or tuple/list of 2 integers, specifying the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions. Current implementation only supports equal length strides in the row and column dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.
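Unequal strides matter here because the rectangular variants presumably downsample height while keeping width, using strides such as (2, 1). This preserves horizontal resolution along a text line. Forcing equal strides would halve the width as well.

Below is a small shape sketch following TF's 'same'/'valid' output-size rules. The helper name and the 32x128 feature-map size are hypothetical examples and are not taken from doctr.

```python
import math


def conv_output_hw(h, w, kernel, strides, padding="same"):
    """Spatial output size of a 2D conv/pool, following TF's padding rules."""
    (kh, kw), (sh, sw) = kernel, strides
    if padding == "same":
        # 'same': output size depends only on the stride
        return math.ceil(h / sh), math.ceil(w / sw)
    if padding == "valid":
        # 'valid': only positions where the whole kernel fits
        return (h - kh) // sh + 1, (w - kw) // sw + 1
    raise ValueError(f"unknown padding: {padding!r}")


print(conv_output_hw(32, 128, (3, 3), (2, 1)))  # (16, 128): width preserved
print(conv_output_hw(32, 128, (3, 3), (2, 2)))  # (16, 64): width halved too
```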
Maybe we can keep this as a draft and check whether the next TF release brings a fix.
Thanks for the suggestion Felix! Yes, as you mentioned, I think this is a risky move :sweat_smile: I used to ping the TF team regularly to fix this, but it hasn't helped so far. It's a shame: you can always expect differences in support between frameworks, but grouped convolutions... The TF team seems to have too much on its plate at the moment, so it's hard to get a time estimate :/
Yeah, let's keep this as a draft; I will check after each TF release whether it has been fixed.
Codecov Report
Merging #1183 (62976f1) into main (3deac68) will decrease coverage by 0.02%. The diff coverage is n/a.
@@            Coverage Diff             @@
##             main    #1183      +/-   ##
==========================================
- Coverage   95.78%   95.76%   -0.02%
==========================================
  Files         154      154
  Lines        6903     6903
==========================================
- Hits         6612     6611       -1
- Misses        291      292       +1
Flag | Coverage Δ | |
---|---|---|
unittests | 95.76% <ø> (-0.02%) | :arrow_down: |
Flags with carried forward coverage won't be shown.
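The -0.02% comes from a single line switching from hit to miss: 6612 → 6611 of 6903 lines.

The quick check below assumes Codecov's default of 2-decimal precision with rounding down. With ordinary rounding, the PR branch would instead show 95.77%. The helper name is illustrative.

```python
from decimal import ROUND_DOWN, Decimal


def coverage_pct(hits, lines):
    """Line coverage in percent, truncated to 2 decimals (assumed Codecov default)."""
    return (Decimal(hits) * 100 / Decimal(lines)).quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_pct(6612, 6903)  # main
head = coverage_pct(6611, 6903)  # #1183
print(base, head, head - base)   # 95.78 95.76 -0.02
```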