TensorFI
py_func Crashes
Environment info
Operating System:
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Installed version of CUDA and cuDNN: None
(please attach the output of ls -l /path/to/cuda/lib/libcud*):
(base) ali@simon:/tmp/mozilla_ali0$ ls -l /path/to/cuda/lib/libcud*
ls: cannot access '/path/to/cuda/lib/libcud*': No such file or directory
If installed from binary pip package, provide:
- Which pip package you installed.
- The output from python -c "import tensorflow; print(tensorflow.version)".
If installed from sources, provide the commit hash: 11b328425e5c4a0c2852aea9db5a61fbc7aa290c
Steps to reproduce
- Instantiate ResNet50 with keras.
- Load TensorFI on it.
- Run prediction with fault injections enabled.
The code is below:
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np
import TensorFI as fi
from tensorflow.keras.backend import get_session
model = ResNet50(weights='imagenet')
img_path = 'val_5.JPEG'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
session = get_session()
tf = fi.TensorFI(session, disableInjections=False, logLevel=50)
preds = session.run(model.outputs[0], feed_dict={model.inputs[0]: x})
# preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
# Predicted: [(u'n02504013', u'Indian_elephant', 0.82658225), (u'n01871265', u'tusker', 0.1122357), (u'n02504458', u'African_elephant', 0.061040461)]
Here is the input image used:
When I turn off the injections, I get the expected output:
('Predicted:', [(u'n04399382', u'teddy', 0.81401235), (u'n02105641', u'Old_English_sheepdog', 0.032959767), (u'n04008634', u'projectile', 0.020169798)])
What have you tried?
- Tracing the code: execution ends up in native (C/C++) code and aborts on a check in py_func.cc.
Logs or other output that would be helpful
(If logs are large, please upload as attachment).
/home/ali/anaconda/envs/tensorfi/bin/python /home/ali/Desktop/Code/TensorFI/resnet50/model.py
WARNING:tensorflow:From /home/ali/Desktop/Code/TensorFI/resnet50/model.py:6: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.
WARNING:tensorflow:From /home/ali/anaconda/envs/tensorfi/lib/python2.7/site-packages/tensorflow/python/ops/init_ops.py:1251: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2021-02-05 18:53:34.853793: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-05 18:53:34.881342: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2394305000 Hz
2021-02-05 18:53:34.881859: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5582bd2a2eb0 executing computations on platform Host. Devices:
2021-02-05 18:53:34.881909: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1
OMP: Info #250: KMP_AFFINITY: pid 90837 tid 90837 thread 0 bound to OS proc set 0
2021-02-05 18:53:34.882399: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2021-02-05 18:53:35.374807: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
/home/ali/Desktop/Code/TensorFI/TensorFI/fiConfig.py:270: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
params = yaml.load(pStream)
Unable to open log file faultLogs/NoName-log
Starting log at 2021-02-05 18:53:40.952907
---------------------------------------
2021-02-05 18:53:43.067374: F tensorflow/python/lib/core/py_func.cc:466] Check failed: DataTypeCanUseMemcpy(t.dtype())
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
I'm able to reproduce this failure on my side. I suspect it is related to py_func being deprecated in TF now, but I'm not sure. It would help to determine which operator causes the crash by enabling the print statements in the modifyGraph.py file.
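One way to narrow this down without editing modifyGraph.py, as a rough sketch: the failed check, DataTypeCanUseMemcpy, rejects tensors whose contents cannot be copied byte for byte, which in TensorFlow covers at least the string, resource and variant dtypes. Assuming TensorFI routes op inputs through py_func, any op that touches such a tensor is a candidate. The helper below scans a list of graph operations for those dtypes. The names find_suspect_ops and NON_MEMCPY_DTYPES are hypothetical and are not part of TensorFI or TensorFlow. The code relies only on the name, type, inputs, outputs and dtype.name attributes of TF 1.x Operation and Tensor objects.

```python
# Dtype names assumed (not confirmed for this crash) to fail the
# DataTypeCanUseMemcpy check in py_func.cc.
NON_MEMCPY_DTYPES = frozenset(["string", "resource", "variant"])


def find_suspect_ops(operations, bad_dtypes=NON_MEMCPY_DTYPES):
    """List ops that consume or produce tensors with a dtype in bad_dtypes.

    `operations` is any iterable of objects shaped like tf.Operation,
    e.g. the result of session.graph.get_operations().
    Returns tuples of (op name, op type, "input" or "output",
    tensor name, dtype name).
    """
    suspects = []
    for op in operations:
        for role, tensors in (("input", op.inputs), ("output", op.outputs)):
            for tensor in tensors:
                dtype_name = tensor.dtype.name
                # TF 1.x reference dtypes are named like "float32_ref";
                # strip the suffix to compare against the base dtype name.
                if dtype_name.endswith("_ref"):
                    dtype_name = dtype_name[:-4]
                if dtype_name in bad_dtypes:
                    suspects.append(
                        (op.name, op.type, role, tensor.name, dtype_name))
    return suspects
```

In the script above, a loop such as `for row in find_suspect_ops(session.graph.get_operations()): print(row)`, placed before the fi.TensorFI(...) call, should list the candidates. Whether any of them is the actual culprit (for example, the resource handles read by Keras variables) is untested here.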