TensorFI py_func Crashes

py_func Crashes

Open altostratous opened this issue 4 years ago • 1 comments

Environment info

Operating System:

NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Installed version of CUDA and cuDNN: None

(please attach the output of ls -l /path/to/cuda/lib/libcud*):

(base) ali@simon:/tmp/mozilla_ali0$ ls -l /path/to/cuda/lib/libcud*
ls: cannot access '/path/to/cuda/lib/libcud*': No such file or directory

If installed from binary pip package, provide:

Which pip package you installed.
The output from python -c "import tensorflow; print(tensorflow.__version__)".

If installed from sources, provide the commit hash: 11b328425e5c4a0c2852aea9db5a61fbc7aa290c

Steps to reproduce

Instantiate ResNet50 with keras.
Load TensorFI on it.
Run prediction with fault injections enabled.

The code is below:

from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np
import TensorFI as fi
from tensorflow.keras.backend import get_session

model = ResNet50(weights='imagenet')

img_path = 'val_5.JPEG'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

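# Reuse the Keras-managed session so TensorFI works on the same graph.
session = get_session()

# Wrap the session with TensorFI, with fault injections enabled.
# Note: `tf` here is the TensorFI handle, not the TensorFlow module.
tf = fi.TensorFI(session, disableInjections=False, logLevel=50)

# Run the graph through the session directly instead of model.predict().
preds = session.run(model.outputs[0], feed_dict={model.inputs[0]: x})

# preds = model.predict(x)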

# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
# Sample output from the Keras documentation (a different image, shown only for the format):
# Predicted: [(u'n02504013', u'Indian_elephant', 0.82658225), (u'n01871265', u'tusker', 0.1122357), (u'n02504458', u'African_elephant', 0.061040461)]

Here is the input image used: val_5

When I turn off the injections, I get the expected output:

('Predicted:', [(u'n04399382', u'teddy', 0.81401235), (u'n02105641', u'Old_English_sheepdog', 0.032959767), (u'n04008634', u'projectile', 0.020169798)])

What have you tried?

I traced the code: execution ends up in native (C++) code and the process aborts on a failed check in py_func.cc.

Logs or other output that would be helpful

(If logs are large, please upload as attachment).

/home/ali/anaconda/envs/tensorfi/bin/python /home/ali/Desktop/Code/TensorFI/resnet50/model.py
WARNING:tensorflow:From /home/ali/Desktop/Code/TensorFI/resnet50/model.py:6: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.

WARNING:tensorflow:From /home/ali/anaconda/envs/tensorfi/lib/python2.7/site-packages/tensorflow/python/ops/init_ops.py:1251: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2021-02-05 18:53:34.853793: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-05 18:53:34.881342: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2394305000 Hz
2021-02-05 18:53:34.881859: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5582bd2a2eb0 executing computations on platform Host. Devices:
2021-02-05 18:53:34.881909: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 90837 tid 90837 thread 0 bound to OS proc set 0
2021-02-05 18:53:34.882399: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2021-02-05 18:53:35.374807: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
/home/ali/Desktop/Code/TensorFI/TensorFI/fiConfig.py:270: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  params = yaml.load(pStream)
Unable to open log file faultLogs/NoName-log
Starting log at 2021-02-05 18:53:40.952907


---------------------------------------
2021-02-05 18:53:43.067374: F tensorflow/python/lib/core/py_func.cc:466] Check failed: DataTypeCanUseMemcpy(t.dtype()) 

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

Feb 07 '21 01:02 altostratous

I'm able to reproduce this failure on my side. I suspect it may be related to py_func being deprecated in TF now, but I'm not sure. It would help to determine which operator is causing this, for example by enabling the print statements in the modifyGraph.py file.

Feb 08 '21 04:02 karthikp-ubc
