PReLU Op Builtin Kernel gives NaN output
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
source
TensorFlow version
tf 2.14
Custom code
Yes
OS platform and distribution
Linux Ubuntu 20.04.6 LTS
Mobile device
No response
Python version
3.11
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
Some output values in the PReLU output tensor are NaN when using the TFLite Interpreter with BUILTIN kernels. No NaNs are seen when using the BUILTIN_REF (reference) kernels, so this appears to be an issue with the builtin kernels only. I would expect to see similar values with both the builtin and reference kernels, and no NaNs in either case.
Standalone code to reproduce the issue
import tensorflow as tf
import numpy as np


def make_prelu_tflite():
    model = tf.keras.Sequential(
        [
            tf.keras.Input((540, 960, 16), dtype=tf.float32),
            tf.keras.layers.PReLU(shared_axes=(1, 2, 3)),
        ]
    )

    # Imitate effect of training prelu weight
    a = np.ndarray(shape=(1, 1, 1, 1))
    a[0][0][0][0] = 0.00040957872988656163
    model.layers[0].set_weights(a)

    # Convert and save the model
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()
    with open(TFLITE_FILE, 'wb') as f:
        f.write(tflite_model)


def run_tflite_inference(tflite_path, input_npy_path, out_npy_path):
    # Using AUTO/BUILTIN resolver
    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    interpreter.allocate_tensors()

    input_npy = np.load(input_npy_path)
    interpreter.set_tensor(input_details[0]['index'], input_npy)
    interpreter.invoke()
    output = interpreter.get_tensor(output_details[0]['index'])

    print(f"Output has nan: {np.any(np.isnan(output))}")
    print(f"Writing output to {out_npy_path}")
    np.save(f"{out_npy_path}", output)


if __name__ == "__main__":
    TFLITE_FILE = "simple_prelu.tflite"
    NPY_INPUT_FILE = "faulty_input.npy"
    NPY_OUTPUT_FILE = "faulty_output.npy"

    make_prelu_tflite()
    run_tflite_inference(TFLITE_FILE, NPY_INPUT_FILE, NPY_OUTPUT_FILE)
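For reference, here is a minimal sketch (not part of the report above) of comparing the two resolver types against a plain NumPy PReLU, f(x) = x for x >= 0 and alpha * x otherwise, assuming the simple_prelu.tflite and faulty_input.npy files from the script above:

# Sketch: compare BUILTIN vs BUILTIN_REF kernels against a NumPy PReLU reference.
import numpy as np
import tensorflow as tf

def run_with_resolver(model_path, x, resolver_type):
    interpreter = tf.lite.Interpreter(
        model_path=model_path,
        experimental_op_resolver_type=resolver_type)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp['index'], x)
    interpreter.invoke()
    return interpreter.get_tensor(out['index'])

x = np.load("faulty_input.npy")
alpha = 0.00040957872988656163
# Reference PReLU: f(x) = x if x >= 0 else alpha * x
expected = np.where(x >= 0, x, alpha * x).astype(np.float32)

builtin = run_with_resolver("simple_prelu.tflite", x,
                            tf.lite.experimental.OpResolverType.BUILTIN)
reference = run_with_resolver("simple_prelu.tflite", x,
                              tf.lite.experimental.OpResolverType.BUILTIN_REF)

print("BUILTIN has NaN:    ", np.any(np.isnan(builtin)))
print("BUILTIN_REF has NaN:", np.any(np.isnan(reference)))
print("BUILTIN_REF matches NumPy:",
      np.allclose(reference, expected, rtol=1e-5, atol=1e-6))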
Relevant log output
2024-02-02 17:50:30.399478: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:378] Ignored output_format.
2024-02-02 17:50:30.399540: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:381] Ignored drop_control_dependency.
2024-02-02 17:50:30.400359: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /tmp/tmp13mra4pw
2024-02-02 17:50:30.400605: I tensorflow/cc/saved_model/reader.cc:51] Reading meta graph with tags { serve }
2024-02-02 17:50:30.400619: I tensorflow/cc/saved_model/reader.cc:146] Reading SavedModel debug info (if present) from: /tmp/tmp13mra4pw
2024-02-02 17:50:30.401264: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
2024-02-02 17:50:30.401435: I tensorflow/cc/saved_model/loader.cc:233] Restoring SavedModel bundle.
2024-02-02 17:50:30.419847: I tensorflow/cc/saved_model/loader.cc:217] Running initialization op on SavedModel bundle at path: /tmp/tmp13mra4pw
2024-02-02 17:50:30.422643: I tensorflow/cc/saved_model/loader.cc:316] SavedModel load for tags { serve }; Status: success: OK. Took 22284 microseconds.
2024-02-02 17:50:30.451273: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-02-02 17:50:30.507802: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:2245] Estimated count of arithmetic ops: 0 ops, equivalently 0 MACs
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Output has nan: True
Writing output to faulty_output.npy
@jamwar01 The simplest workaround is to use the BUILTIN_REF kernels instead of the BUILTIN kernels. The BUILTIN_REF kernels are reference implementations and are often slower, but they shouldn't produce NaN outputs in this case. You can switch by selecting the reference op resolver when creating the interpreter:

interpreter = tf.lite.Interpreter(
    model_path=tflite_path,
    experimental_op_resolver_type=tf.lite.experimental.OpResolverType.BUILTIN_REF)
Thank you!
Thank you for your reply! 😄 Yes, the workaround for now is to use the reference kernels as you say. My aim was simply to report that the builtin kernels appear to be broken so that this is highlighted to the relevant team. The performance decrease from using the reference kernels is likely a deterrent in many cases, however, so I believe it would be useful to have this addressed.
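For anyone weighing that trade-off, a rough timing sketch of the same model under both resolver types (numbers are machine-dependent and this model is tiny, so treat them as indicative only):

# Rough sketch: time BUILTIN vs BUILTIN_REF on the same model to gauge the slowdown.
import time
import numpy as np
import tensorflow as tf

def time_invoke(model_path, x, resolver_type, runs=20):
    interpreter = tf.lite.Interpreter(
        model_path=model_path,
        experimental_op_resolver_type=resolver_type)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp['index'], x)
    interpreter.invoke()  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs

x = np.random.rand(1, 540, 960, 16).astype(np.float32)
for name, rt in [("BUILTIN", tf.lite.experimental.OpResolverType.BUILTIN),
                 ("BUILTIN_REF", tf.lite.experimental.OpResolverType.BUILTIN_REF)]:
    print(f"{name}: {time_invoke('simple_prelu.tflite', x, rt) * 1e3:.2f} ms/invoke")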
Hi @jamwar01,
I have tested the given code with the BUILTIN kernels on TF 2.15. It works fine and the output tensor does not contain any NaN values. Here is the screenshot, and the output tensor values are:
array([[[[0.18350317, 0.8055407 , 0.08095651, ..., 0.09189863,
0.64712274, 0.42581546],
[0.06950494, 0.19689496, 0.945694 , ..., 0.96190053,
0.8043054 , 0.6203221 ],
[0.50733095, 0.00871299, 0.7729663 , ..., 0.3727163 ,
0.2478801 , 0.4909967 ],
...,
[0.7556198 , 0.86681217, 0.07057429, ..., 0.4914943 ,
0.46564332, 0.7217616 ],
[0.4533622 , 0.08109082, 0.6991882 , ..., 0.2784072 ,
0.73928165, 0.6248881 ],
[0.06713927, 0.37988612, 0.6965632 , ..., 0.66882867,
0.22982682, 0.7331834 ]],
[[0.6969852 , 0.3979096 , 0.30966353, ..., 0.8206956 ,
0.07177956, 0.0412529 ],
[0.87058693, 0.46980223, 0.7791571 , ..., 0.08392384,
0.44429946, 0.41385922],
[0.12787104, 0.06190566, 0.9563843 , ..., 0.66872364,
0.5529266 , 0.69724584],
...,
[0.24671873, 0.8656299 , 0.64001596, ..., 0.5273241 ,
0.46549922, 0.01413841],
[0.8001449 , 0.303727 , 0.41121402, ..., 0.42395937,
0.68907714, 0.9973794 ],
[0.5249677 , 0.69011617, 0.32280397, ..., 0.29401043,
0.8321104 , 0.8224229 ]],
[[0.46167508, 0.13801032, 0.41837 , ..., 0.76498574,
0.53632194, 0.6082858 ],
[0.9040914 , 0.9073978 , 0.5598819 , ..., 0.77390254,
0.5010137 , 0.7959867 ],
[0.9356298 , 0.838803 , 0.2510756 , ..., 0.27377617,
0.03432407, 0.8112841 ],
...,
[0.19019738, 0.15415408, 0.15916935, ..., 0.36066476,
0.02571733, 0.88389844],
[0.05659891, 0.00807601, 0.35056975, ..., 0.99356574,
0.0229959 , 0.17586842],
[0.16265824, 0.9375197 , 0.04004565, ..., 0.90708274,
0.4906749 , 0.01150649]],
...,
[[0.9874541 , 0.13711593, 0.03413203, ..., 0.27944687,
0.5725812 , 0.2872343 ],
[0.93618304, 0.05400326, 0.80379486, ..., 0.6891535 ,
0.85990685, 0.09732993],
[0.6015796 , 0.6119976 , 0.17900743, ..., 0.64661974,
0.47710946, 0.5185745 ],
...,
[0.3314257 , 0.976641 , 0.50370747, ..., 0.18451059,
0.8898673 , 0.06551789],
[0.7574596 , 0.6803014 , 0.5806643 , ..., 0.02810532,
0.21359259, 0.13841787],
[0.360362 , 0.8378374 , 0.17994598, ..., 0.52578354,
0.8449946 , 0.00566057]],
[[0.90867203, 0.96147287, 0.00522611, ..., 0.49788418,
0.51192576, 0.87039846],
[0.8130206 , 0.3965184 , 0.5445026 , ..., 0.7833688 ,
0.3920826 , 0.5033432 ],
[0.58092123, 0.22957331, 0.06166744, ..., 0.04113004,
0.3806144 , 0.66953444],
...,
[0.2541557 , 0.7876428 , 0.74799436, ..., 0.8414788 ,
0.32410142, 0.25649405],
[0.41616407, 0.41103885, 0.3102394 , ..., 0.3179237 ,
0.41209835, 0.86601245],
[0.13197434, 0.9770973 , 0.576634 , ..., 0.8140475 ,
0.3756017 , 0.648409 ]],
[[0.46594724, 0.38555008, 0.9656739 , ..., 0.3989894 ,
0.73881274, 0.696691 ],
[0.42470434, 0.03731331, 0.5988427 , ..., 0.26365036,
0.183001 , 0.6578406 ],
[0.4221254 , 0.62892705, 0.8580361 , ..., 0.4409532 ,
0.55401707, 0.39752722],
...,
[0.84856015, 0.12720175, 0.12806697, ..., 0.4363036 ,
0.7615763 , 0.5988579 ],
[0.20318006, 0.40418512, 0.9333598 , ..., 0.17719397,
0.97456586, 0.42055926],
[0.2521532 , 0.32505414, 0.40645653, ..., 0.863737 ,
0.8764026 , 0.04436916]]]], dtype=float32)
Please refer to the gist.
Thank You
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.
I still get NaN output when using my specific NumPy input. Could you suggest a way for me to transfer the faulty_input.npy file to you? Its compressed size is 28MB, which exceeds the 25MB limit on this page. Thank you.
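As a possible alternative to sharing the full file, here is a rough sketch of extracting just the NaN locations and the corresponding input values. This may not be enough to reproduce the exact kernel path, but it shows which input values trigger the NaN and is small enough to attach. It assumes the simple_prelu.tflite and faulty_input.npy files from the script above, plus a hypothetical output file faulty_input_slice.npz:

# Sketch: find NaN positions with the default (AUTO/BUILTIN) kernels and save a
# compact file containing the offending indices and the matching input values.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="simple_prelu.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.load("faulty_input.npy")
interpreter.set_tensor(inp['index'], x)
interpreter.invoke()
y = interpreter.get_tensor(out['index'])

nan_idx = np.argwhere(np.isnan(y))
print(f"{len(nan_idx)} NaN values, first at {nan_idx[0] if len(nan_idx) else None}")

# Save the indices of the NaN outputs and the input values at those positions.
np.savez_compressed("faulty_input_slice.npz",
                    indices=nan_idx,
                    values=x[tuple(nan_idx.T)])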
Hi @jamwar01 ,
Please share your faulty_input.npy through a Google Drive link.
Thank You
This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.
This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.