keras
Failure running a SavedModel exported from a tf.Module with a Keras model as an instance variable
I have been advised by the TensorFlow team to post this issue here. I will restate the issue below.
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
No, because the sample code produces a core dump.
Source
binary
TensorFlow version
2.17.0
Custom code
No
OS platform and distribution
Linux Ubuntu 22.04.4 LTS
Mobile device
No response
Python version
3.10.12
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
Saving a tf.Module with tf.saved_model.save when the module holds a Keras model (or layer) in an instance variable produces a SavedModel that fails with FAILED_PRECONDITION when it is run using saved_model_cli or libtensorflow.
In TensorFlow v2.15.0 the behavior is as expected: the graph executes without any errors and produces the expected results.
Standalone code to reproduce the issue
import tensorflow as tf
import numpy as np

SHAPE = (1, 5)

class TestModel(tf.Module):
    def __init__(self):
        super().__init__()
        self.dense_layer = tf.keras.layers.Dense(10)

    @tf.function(input_signature=[tf.TensorSpec(shape=SHAPE, dtype=tf.float32)])
    def run(self, x):
        return self.dense_layer(x)

module = TestModel()
sample_input = tf.random.normal(SHAPE, dtype=tf.float32)
module.run(sample_input)
np.save('sample_input.npy', sample_input.numpy())
tf.saved_model.save(module, "test_model")

# To reproduce, run the following:
# python test.py && saved_model_cli run --dir test_model --tag_set serve --signature_def serving_default --inputs 'x=sample_input.npy'
Relevant log output
2024-08-01 15:25:35.204057: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-01 15:25:35.261217: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-01 15:25:35.278801: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-01 15:25:35.313892: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-01 15:25:37.270110: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-08-01 15:25:38.898105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4281 MB memory: -> device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:3b:00.0, compute capability: 7.0
2024-08-01 15:25:38.898868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2021] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 944 MB memory: -> device: 1, name: Tesla V100-PCIE-16GB, pci bus id: 0000:d8:00.0, compute capability: 7.0
WARNING:tensorflow:From /home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py:716: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.saved_model.load` instead.
W0801 15:25:38.903731 139977869341120 deprecation.py:50] From /home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py:716: load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.saved_model.load` instead.
INFO:tensorflow:Restoring parameters from test_model/variables/variables
I0801 15:25:38.936800 139977869341120 saver.py:1417] Restoring parameters from test_model/variables/variables
2024-08-01 15:25:38.941206: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2024-08-01 15:25:39.144248: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: FAILED_PRECONDITION: Could not find variable dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/dense/bias/N10tensorflow3VarE does not exist.
[[{{function_node __inference_run_106}}{{node dense_1/Add/ReadVariableOp}}]]
2024-08-01 15:25:39.144340: I tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: FAILED_PRECONDITION: Could not find variable dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/dense/bias/N10tensorflow3VarE does not exist.
[[{{function_node __inference_run_106}}{{node dense_1/Add/ReadVariableOp}}]]
[[StatefulPartitionedCall/_21]]
2024-08-01 15:25:39.144423: I tensorflow/core/framework/local_rendezvous.cc:423] Local rendezvous recv item cancelled. Key hash: 12615348601576968325
Traceback (most recent call last):
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1401, in _do_call
    return fn(*args)
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1384, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1477, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
(0) FAILED_PRECONDITION: Could not find variable dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/dense/bias/N10tensorflow3VarE does not exist.
[[{{function_node __inference_run_106}}{{node dense_1/Add/ReadVariableOp}}]]
[[StatefulPartitionedCall/_21]]
(1) FAILED_PRECONDITION: Could not find variable dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/dense/bias/N10tensorflow3VarE does not exist.
[[{{function_node __inference_run_106}}{{node dense_1/Add/ReadVariableOp}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/iantolic-soban/tf_bug/.venv/bin/saved_model_cli", line 8, in <module>
    sys.exit(main())
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py", line 1340, in main
    app.run(smcli_main)
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py", line 1338, in smcli_main
    args.func()
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py", line 1036, in run
    run_saved_model_with_feed_dict(
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/tools/saved_model_cli.py", line 721, in run_saved_model_with_feed_dict
    outputs = sess.run(output_tensor_names_sorted, feed_dict=inputs_feed_dict)
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 971, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1214, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1394, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/home/iantolic-soban/tf_bug/.venv/lib/python3.10/site-packages/tensorflow/python/client/session.py", line 1420, in _do_call
    raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.FailedPreconditionError: Graph execution error:
2 root error(s) found.
(0) FAILED_PRECONDITION: Could not find variable dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/dense/bias/N10tensorflow3VarE does not exist.
[[{{node dense_1/Add/ReadVariableOp}}]]
[[StatefulPartitionedCall/_21]]
(1) FAILED_PRECONDITION: Could not find variable dense/bias. This could mean that the variable has been deleted. In TF1, it can also mean the variable is uninitialized. Debug info: container=localhost, status error message=Resource localhost/dense/bias/N10tensorflow3VarE does not exist.
[[{{node dense_1/Add/ReadVariableOp}}]]
0 successful operations.
0 derived errors ignored.
@ivansoban,
This is a consequence of TensorFlow moving to Keras 3. TensorFlow 2.15 and lower use Keras 2 by default; TensorFlow 2.16 and higher use Keras 3 by default. Some background here.
Keras 3 is multi-backend, so the Model and Layer classes no longer inherit from tf.Module, which keeps them compatible with JAX and PyTorch. One consequence is that variables are no longer automatically tracked recursively through models and layers.
To solve this, you have several options:
- revert to tf_keras (Keras 2) by setting TF_USE_LEGACY_KERAS=1 on the command line, or by running this in Python before importing tensorflow (the tf_keras package must be installed):
    import os
    os.environ["TF_USE_LEGACY_KERAS"] = "1"
- use the Keras export API: either model.export() or ExportArchive.
- manually track the variables of the layers in your module. This basically reimplements some of the logic in ExportArchive, and I do not recommend this approach.
Thank you @hertschuh for the information and solutions.
If I understand the implications of the move from Keras 2 to 3 correctly, storing a Keras 3 layer or model as an instance variable of a tf.Module is no longer really feasible if we want to export the tf.Module as a SavedModel, unless we pursue option 3, which does seem like a poor choice.
@ivansoban,
Correct. Any reason why you cannot use option 2?
This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.
This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.