QAT model saving bug: KeyError: '__inference_depthwise_conv2d_layer_call_fn_126'
Describe the bug
Please download the scripts to reproduce from: https://drive.google.com/drive/folders/15cajAZ9sAZ2Uyix8sDVSYku6QCqDCec7?usp=sharing
Command to run: python sample_qat.py
I have a simple model with an input layer and a DepthwiseConv2D layer. I quantize this model by adding quantize_and_dequantize nodes at the input of the DepthwiseConv2D layer (commented in the code). When I save the model and load it back, I see the following error:
File "/home/dperi/Downloads/py3/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py", line 544, in <lambda>
"function": lambda: self._recreate_function(proto.function),
File "/home/dperi/Downloads/py3/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py", line 586, in _recreate_function
proto, self._concrete_functions), setattr
File "/home/dperi/Downloads/py3/lib/python3.6/site-packages/tensorflow/python/saved_model/function_deserialization.py", line 295, in recreate_function
concrete_function_objects.append(concrete_functions[concrete_function_name])
KeyError: '__inference_depthwise_conv2d_layer_call_and_return_conditional_losses_117'
System information
TensorFlow version (installed from source or binary): 2.5 (Tried with 2.6 as well)
TensorFlow Model Optimization version (installed from source or binary):
SavedModel loading fails specifically for depthwise convolution; it works fine for regular convolution.
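For readers without the linked scripts handy, a minimal sketch of this kind of setup, assuming the standard TFMOT quantize_model flow (the actual repro script may differ):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Simple model: input layer followed by a depthwise convolution.
inputs = tf.keras.Input(shape=(32, 32, 3))
outputs = tf.keras.layers.DepthwiseConv2D(kernel_size=3)(inputs)
model = tf.keras.Model(inputs, outputs)

# quantize_model wraps the layers and inserts quantize/dequantize nodes.
q_model = tfmot.quantization.keras.quantize_model(model)

q_model.save('export_dir')  # SavedModel format
# Loading back is where the KeyError above is raised:
with tfmot.quantization.keras.quantize_scope():
    loaded = tf.keras.models.load_model('export_dir')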
Hi @Xhark, I also hit the same bug when quantizing MobileNet v2.
System information
TensorFlow version (installed from binary): 2.5.0 => TensorFlow Model Optimization version (installed from binary): 0.6.0
TensorFlow version (installed from binary): 2.5.1 => TensorFlow Model Optimization version (installed from binary): 0.7.0
TensorFlow version (installed from binary): 2.4.0 => TensorFlow Model Optimization version (installed from binary): 0.7.0
Python version: 3.8.12
Hi @Xhark and @peri044,
I used the following environment to solve my problem.
System information
TensorFlow version (installed from binary): tf-nightly-gpu 2.5.0.dev20201202 (https://www.cnpython.com/pypi/tf-nightly-gpu/download)
TensorFlow Model Optimization version (installed from binary): 0.6.0
Python version: 3.8.12
Hi @peri044 and @Jia-HongHenryLee,
I'm looking into it now, but there are a couple of workarounds. First, it seems to save correctly if you use:
model.save('export_dir', save_format='h5')
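For completeness, a hypothetical round trip with that workaround, assuming model is the quantized Keras model (file name illustrative; TFMOT's quantize wrappers need quantize_scope to deserialize):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

model.save('model.h5', save_format='h5')
with tfmot.quantization.keras.quantize_scope():
    loaded = tf.keras.models.load_model('model.h5')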
I think this is caused by incorrect shape handling for the depthwise kernel quantization parameters, which results in functions not being traced/merged correctly.
Thanks for reporting this.
Thank you @daverim for addressing this. Can you let me know when this will be resolved, or whether there's an active PR for it? I haven't tried the h5 format, since I'm using the SavedModel format to pass the model through TF2ONNX (with custom utilities) for processing.
Hello @daverim, can you please suggest some pointers on how to fix this locally (using the SavedModel format)? Which files/functions should I look at? Thanks!
Hey @peri044. If your ultimate goal is to convert the model to TFLite format, you can pass a ConcreteFunction around. from_concrete_functions of TFLiteConverter works just fine for me.
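A minimal sketch of that route, assuming model is the quantized Keras model and an illustrative input shape:

import tensorflow as tf

@tf.function(input_signature=[tf.TensorSpec([1, 224, 224, 3], tf.float32)])
def infer(x):
    return model(x)

# Convert directly from the concrete function, bypassing the SavedModel reload.
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [infer.get_concrete_function()])
tflite_bytes = converter.convert()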
Hello @ChanZou, my ultimate goal is to use the SavedModel format (if it works) and pass it through TF2ONNX to convert it into an ONNX graph. TF2ONNX currently accepts the SavedModel format for graphs.
Hello @daverim, any suggestions on how to resolve this would be appreciated. Thanks!
Hi, sorry for the delay.
I just tested your sample code and it seems to be resolved now. There are some warnings about untraced functions.
Using: tf==2.8.0-dev20210930, tensorflow-model-optimization==0.7.0
Please try and see if it works for you. Thanks, David
Thanks @daverim. That works now.
@daverim I encountered the same error log for SeparableConv2D using TF 2.8.0 (no error with DepthwiseConv2D in that TF version):
...
Traceback (most recent call last):
File "/home/PycharmProjects/tensorrt_qat/examples/mobilenet/run_qat_workflow.py", line 156, in <module>
main(verbose=True)
File "/home/PycharmProjects/tensorrt_qat/examples/mobilenet/run_qat_workflow.py", line 142, in main
tf.keras.models.save_model(q_model, os.path.join(qat_save_finetuned_weights, "saved_model"))
File "/home/PycharmProjects/tensorrt_qat/venv38_tf2.8_newPR/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/PycharmProjects/tensorrt_qat/venv38_tf2.8_newPR/lib/python3.8/site-packages/tensorflow/python/saved_model/save.py", line 403, in map_resources
raise ValueError(
ValueError: Unable to save function b'__inference_block2_sepconv1_layer_call_fn_670910' because it captures graph tensor Tensor("xception/quant_block2_sepconv1/LastValueQuant_1/QuantizeAndDequantizeV4:0", shape=(3, 3, 64, 1), dtype=float32) from a parent function which cannot be converted to a constant with `tf.get_static_value`.
Do you have any idea what caused the error in DepthwiseConv2D and if the same fix would work for SeparableConv2D? Thank you!
The best way to avoid this issue is to disable the layer tracing when creating the SavedModel, but you'll have to manually define the serving_default function (this is the default name that is used in TF2ONNX).
@tf.function
def predict(*args, **kwargs):
    return model(*args, **kwargs)

# save_spec() returns the (args, kwargs) input signature recorded for the model.
arg_spec, kwarg_spec = model.save_spec()
model.save(path, save_traces=False, signatures={
    "serving_default": predict.get_concrete_function(*arg_spec, **kwarg_spec)
})
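Loading the resulting SavedModel and looking up that signature would then go roughly like this (path as above):

import tensorflow as tf

loaded = tf.saved_model.load(path)
serving_fn = loaded.signatures["serving_default"]
print(serving_fn.structured_input_signature)  # the input names/specs TF2ONNX will see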
Hi @k-w-w thank you for your feedback! This specific issue (for DepthwiseConv) has been solved, as mentioned in a comment on Jan 26th above, but the same issue persists for SeparableConv here.
I tried your suggestion, but it did not solve my issue, since the problem is not with tf2onnx, but with saving the TF model. Do you have any additional suggestions please?
Thank you!
@gcunhase Are you getting the same error even with save_traces=False?
@k-w-w yes
@gcunhase can you paste the error trace?
@k-w-w:
...
Traceback (most recent call last):
File "/home/nvidia/PycharmProjects/nvbugs/internal_filed/tf_key_inference_bug/TF_bug_separableconv2d/sample.py", line 24, in <module>
model.save(model_save_path)
File "/home/nvidia/PycharmProjects/nvbugs/venv38_trt_regression/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/nvidia/PycharmProjects/nvbugs/venv38_trt_regression/lib/python3.8/site-packages/tensorflow/python/saved_model/save.py", line 403, in map_resources
raise ValueError(
ValueError: Unable to save function b'__inference_separable_conv2d_layer_call_fn_961' because it captures graph tensor Tensor("model/quant_separable_conv2d/LastValueQuant_1/QuantizeAndDequantizeV4:0", shape=(3, 3, 3, 1), dtype=float32) from a parent function which cannot be converted to a constant with `tf.get_static_value`.
That bug also has the reproducible code, so we can move our discussion there if you agree.
This bug can be closed for DepthwiseConv2D.
For Conv2DTranspose and SeparableConv2D, please move the discussion here.
Thank you!