PReLU Op Builtin Kernel gives NaN output
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
source
TensorFlow version
tf 2.14
Custom code
Yes
OS platform and distribution
Linux Ubuntu 20.04.6 LTS
Mobile device
No response
Python version
3.11
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
Some output values in the PReLU output tensor are NaN when using the TFLite Interpreter with BUILTIN kernels. No NaNs are seen when using the BUILTIN_REF (reference) kernels, so this appears to be an issue with the builtin kernels only. I would expect to see similar values with both the builtin and reference kernels, and no NaNs in either case.
Standalone code to reproduce the issue
import tensorflow as tf
import numpy as np


def make_prelu_tflite():
    model = tf.keras.Sequential(
        [
            tf.keras.Input((540, 960, 16), dtype=tf.float32),
            tf.keras.layers.PReLU(shared_axes=(1, 2, 3)),
        ]
    )

    # Imitate effect of training prelu weight
    a = np.ndarray(shape=(1, 1, 1, 1))
    a[0][0][0][0] = 0.00040957872988656163
    model.layers[0].set_weights(a)

    # Convert and save the model
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()
    with open(TFLITE_FILE, 'wb') as f:
        f.write(tflite_model)


def run_tflite_inference(tflite_path, input_npy_path, out_npy_path):
    # Using AUTO/BUILTIN resolver
    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    interpreter.allocate_tensors()

    input_npy = np.load(input_npy_path)
    interpreter.set_tensor(input_details[0]['index'], input_npy)
    interpreter.invoke()
    output = interpreter.get_tensor(output_details[0]['index'])

    print(f"Output has nan: {np.any(np.isnan(output))}")
    print(f"Writing output to {out_npy_path}")
    np.save(f"{out_npy_path}", output)


if __name__ == "__main__":
    TFLITE_FILE = "simple_prelu.tflite"
    NPY_INPUT_FILE = "faulty_input.npy"
    NPY_OUTPUT_FILE = "faulty_output.npy"

    make_prelu_tflite()
    run_tflite_inference(TFLITE_FILE, NPY_INPUT_FILE, NPY_OUTPUT_FILE)
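For reference, here is a minimal sketch (not part of the report above) of comparing the two resolver types against a plain NumPy PReLU, f(x) = x for x >= 0 and alpha * x otherwise, assuming the simple_prelu.tflite and faulty_input.npy files from the script above:

# Sketch: compare BUILTIN vs BUILTIN_REF kernels against a NumPy PReLU reference.
import numpy as np
import tensorflow as tf

def run_with_resolver(model_path, x, resolver_type):
    interpreter = tf.lite.Interpreter(
        model_path=model_path,
        experimental_op_resolver_type=resolver_type)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp['index'], x)
    interpreter.invoke()
    return interpreter.get_tensor(out['index'])

x = np.load("faulty_input.npy")
alpha = 0.00040957872988656163
# Reference PReLU: f(x) = x if x >= 0 else alpha * x
expected = np.where(x >= 0, x, alpha * x).astype(np.float32)

builtin = run_with_resolver("simple_prelu.tflite", x,
                            tf.lite.experimental.OpResolverType.BUILTIN)
reference = run_with_resolver("simple_prelu.tflite", x,
                              tf.lite.experimental.OpResolverType.BUILTIN_REF)

print("BUILTIN has NaN:    ", np.any(np.isnan(builtin)))
print("BUILTIN_REF has NaN:", np.any(np.isnan(reference)))
print("BUILTIN_REF matches NumPy:",
      np.allclose(reference, expected, rtol=1e-5, atol=1e-6))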
Relevant log output
2024-02-02 17:50:30.399478: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:378] Ignored output_format.
2024-02-02 17:50:30.399540: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:381] Ignored drop_control_dependency.
2024-02-02 17:50:30.400359: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /tmp/tmp13mra4pw
2024-02-02 17:50:30.400605: I tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2024-02-02 17:50:30.400619: I tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: /tmp/tmp13mra4pw
2024-02-02 17:50:30.401264: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
2024-02-02 17:50:30.401435: I tensorflow/cc/saved_model/loader.cc:233] Restoring SavedModel bundle.
2024-02-02 17:50:30.419847: I tensorflow/cc/saved_model/loader.cc:217] Running initialization op on SavedModel bundle at path: /tmp/tmp13mra4pw
2024-02-02 17:50:30.422643: I tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 22284 microseconds.
2024-02-02 17:50:30.451273: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-02-02 17:50:30.507802: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2245] Estimated count of arithmetic ops: 0 ops, equivalently 0 MACs
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Output has nan: True
Writing output to faulty_output.npy
@jamwar01 The simplest workaround is to use the BUILTIN_REF kernels instead of the BUILTIN kernels. The BUILTIN_REF kernels are reference implementations and are often slower, but they shouldn't produce NaN outputs in this case. You can switch by selecting the reference op resolver when creating the interpreter:

interpreter = tf.lite.Interpreter(
    model_path=tflite_path,
    experimental_op_resolver_type=tf.lite.experimental.OpResolverType.BUILTIN_REF)
Thank you!
Thank you for your reply! 😄 Yes, the workaround for now is to use the reference kernels as you say. My aim was simply to report that the builtin kernels appear to be broken so that this is highlighted to the relevant team. The performance decrease from using the reference kernels is likely a deterrent in many cases, however, so I believe it would be useful to have this addressed.
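For anyone weighing that trade-off, a rough timing sketch of the same model under both resolver types (numbers are machine-dependent and this model is tiny, so treat them as indicative only):

# Rough sketch: time BUILTIN vs BUILTIN_REF on the same model to gauge the slowdown.
import time
import numpy as np
import tensorflow as tf

def time_invoke(model_path, x, resolver_type, runs=20):
    interpreter = tf.lite.Interpreter(
        model_path=model_path,
        experimental_op_resolver_type=resolver_type)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp['index'], x)
    interpreter.invoke()  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs

x = np.random.rand(1, 540, 960, 16).astype(np.float32)
for name, rt in [("BUILTIN", tf.lite.experimental.OpResolverType.BUILTIN),
                 ("BUILTIN_REF", tf.lite.experimental.OpResolverType.BUILTIN_REF)]:
    print(f"{name}: {time_invoke('simple_prelu.tflite', x, rt) * 1e3:.2f} ms/invoke")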
Hi @jamwar01,
I have tested the given code with the BUILTIN kernels on TF 2.15. It works fine and the output tensor does not contain any NaN values. Here is the screenshot, and the output tensor values are:
array([[[[0.18350317, 0.8055407 , 0.08095651, ..., 0.09189863,
0.64712274, 0.42581546],
[0.06950494, 0.19689496, 0.945694 , ..., 0.96190053,
0.8043054 , 0.6203221 ],
[0.50733095, 0.00871299, 0.7729663 , ..., 0.3727163 ,
0.2478801 , 0.4909967 ],
...,
[0.7556198 , 0.86681217, 0.07057429, ..., 0.4914943 ,
0.46564332, 0.7217616 ],
[0.4533622 , 0.08109082, 0.6991882 , ..., 0.2784072 ,
0.73928165, 0.6248881 ],
[0.06713927, 0.37988612, 0.6965632 , ..., 0.66882867,
0.22982682, 0.7331834 ]],
[[0.6969852 , 0.3979096 , 0.30966353, ..., 0.8206956 ,
0.07177956, 0.0412529 ],
[0.87058693, 0.46980223, 0.7791571 , ..., 0.08392384,
0.44429946, 0.41385922],
[0.12787104, 0.06190566, 0.9563843 , ..., 0.66872364,
0.5529266 , 0.69724584],
...,
[0.24671873, 0.8656299 , 0.64001596, ..., 0.5273241 ,
0.46549922, 0.01413841],
[0.8001449 , 0.303727 , 0.41121402, ..., 0.42395937,
0.68907714, 0.9973794 ],
[0.5249677 , 0.69011617, 0.32280397, ..., 0.29401043,
0.8321104 , 0.8224229 ]],
[[0.46167508, 0.13801032, 0.41837 , ..., 0.76498574,
0.53632194, 0.6082858 ],
[0.9040914 , 0.9073978 , 0.5598819 , ..., 0.77390254,
0.5010137 , 0.7959867 ],
[0.9356298 , 0.838803 , 0.2510756 , ..., 0.27377617,
0.03432407, 0.8112841 ],
...,
[0.19019738, 0.15415408, 0.15916935, ..., 0.36066476,
0.02571733, 0.88389844],
[0.05659891, 0.00807601, 0.35056975, ..., 0.99356574,
0.0229959 , 0.17586842],
[0.16265824, 0.9375197 , 0.04004565, ..., 0.90708274,
0.4906749 , 0.01150649]],
...,
[[0.9874541 , 0.13711593, 0.03413203, ..., 0.27944687,
0.5725812 , 0.2872343 ],
[0.93618304, 0.05400326, 0.80379486, ..., 0.6891535 ,
0.85990685, 0.09732993],
[0.6015796 , 0.6119976 , 0.17900743, ..., 0.64661974,
0.47710946, 0.5185745 ],
...,
[0.3314257 , 0.976641 , 0.50370747, ..., 0.18451059,
0.8898673 , 0.06551789],
[0.7574596 , 0.6803014 , 0.5806643 , ..., 0.02810532,
0.21359259, 0.13841787],
[0.360362 , 0.8378374 , 0.17994598, ..., 0.52578354,
0.8449946 , 0.00566057]],
[[0.90867203, 0.96147287, 0.00522611, ..., 0.49788418,
0.51192576, 0.87039846],
[0.8130206 , 0.3965184 , 0.5445026 , ..., 0.7833688 ,
0.3920826 , 0.5033432 ],
[0.58092123, 0.22957331, 0.06166744, ..., 0.04113004,
0.3806144 , 0.66953444],
...,
[0.2541557 , 0.7876428 , 0.74799436, ..., 0.8414788 ,
0.32410142, 0.25649405],
[0.41616407, 0.41103885, 0.3102394 , ..., 0.3179237 ,
0.41209835, 0.86601245],
[0.13197434, 0.9770973 , 0.576634 , ..., 0.8140475 ,
0.3756017 , 0.648409 ]],
[[0.46594724, 0.38555008, 0.9656739 , ..., 0.3989894 ,
0.73881274, 0.696691 ],
[0.42470434, 0.03731331, 0.5988427 , ..., 0.26365036,
0.183001 , 0.6578406 ],
[0.4221254 , 0.62892705, 0.8580361 , ..., 0.4409532 ,
0.55401707, 0.39752722],
...,
[0.84856015, 0.12720175, 0.12806697, ..., 0.4363036 ,
0.7615763 , 0.5988579 ],
[0.20318006, 0.40418512, 0.9333598 , ..., 0.17719397,
0.97456586, 0.42055926],
[0.2521532 , 0.32505414, 0.40645653, ..., 0.863737 ,
0.8764026 , 0.04436916]]]], dtype=float32)
Please refer to the gist.
Thank You
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.
I still get NaN output when using my specific NumPy input. Could you suggest a way for me to transfer the faulty_input.npy file to you? Its compressed size is 28MB, which exceeds the 25MB limit on this page. Thank you.
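As a possible alternative to sharing the full file, here is a rough sketch of extracting just the NaN locations and the corresponding input values. This may not be enough to reproduce the exact kernel path, but it shows which input values trigger the NaN and is small enough to attach. It assumes the simple_prelu.tflite and faulty_input.npy files from the script above, plus a hypothetical output file faulty_input_slice.npz:

# Sketch: find NaN positions with the default (AUTO/BUILTIN) kernels and save a
# compact file containing the offending indices and the matching input values.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="simple_prelu.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.load("faulty_input.npy")
interpreter.set_tensor(inp['index'], x)
interpreter.invoke()
y = interpreter.get_tensor(out['index'])

nan_idx = np.argwhere(np.isnan(y))
print(f"{len(nan_idx)} NaN values, first at {nan_idx[0] if len(nan_idx) else None}")

# Save the indices of the NaN outputs and the input values at those positions.
np.savez_compressed("faulty_input_slice.npz",
                    indices=nan_idx,
                    values=x[tuple(nan_idx.T)])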
Hi @jamwar01 ,
Please share your faulty_input.npy through a Google Drive link.
Thank You
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.
This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.