libiomp5.so already initialized

Open tonyreina opened this issue 6 years ago • 9 comments

I am getting this error when I try to run ODL with a TensorFlow model:

OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
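For reference, the workaround named in the hint can be sketched in Python as follows. This is only an illustration of the environment variable quoted in the error message, which the message itself calls unsafe and unsupported; the helper name `allow_duplicate_openmp_runtime` is made up for this sketch, and setting the variable before the imports is an assumption about when the runtime reads it.

```python
import os


def allow_duplicate_openmp_runtime():
    """Set KMP_DUPLICATE_LIB_OK=TRUE for the current process.

    Hypothetical helper. Per the OMP hint, this is an unsafe workaround
    that may cause crashes or silently produce incorrect results.
    """
    os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"


# Assumption: the variable should be in place before any library that
# bundles an OpenMP runtime is imported, so call the helper first.
allow_duplicate_openmp_runtime()

# The imports from the failing script would follow here, for example:
# import tensorflow as tf
# import odl
```

This only hides the symptom. The hint's actual recommendation is to make sure a single OpenMP runtime is linked into the process.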

This occurs when I am using TensorFlow with Intel MKL-DNN (which is the default on the Anaconda repository).

I've spoken to the Intel TensorFlow team and they think that ODL might be trying to access a second libiomp5.so in the environment that isn't being used by TensorFlow MKL-DNN or is somehow blocking it.
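One way to check the "second libiomp5.so" hypothesis on Linux is to list which OpenMP runtimes the Python process has actually mapped. The sketch below is not part of ODL or TensorFlow; the function names are hypothetical, it relies on the Linux-only `/proc/self/maps` file, and it matches only the usual Intel, GNU and LLVM runtime names.

```python
import os
import re

# Basenames of common OpenMP runtimes: Intel (libiomp5), GNU (libgomp),
# LLVM (libomp). Other runtimes would need to be added here.
_OPENMP_PATTERN = re.compile(r"^lib(iomp5|gomp|omp)\b.*\.so")


def openmp_libs_in_maps(maps_text):
    """Return sorted, de-duplicated paths of OpenMP runtimes found in the
    text of a /proc/<pid>/maps dump."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a path
        path = fields[5].strip()
        if _OPENMP_PATTERN.match(os.path.basename(path)):
            found.add(path)
    return sorted(found)


def loaded_openmp_libs():
    """Return the OpenMP runtimes mapped into the current process (Linux)."""
    with open("/proc/self/maps") as maps_file:
        return openmp_libs_in_maps(maps_file.read())
```

Since the OMP error aborts the process, `loaded_openmp_libs()` would have to be called before the failing line, for instance right after the imports. More than one path in the result would point to several runtimes being loaded at the same time.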

Would there be anyone willing to work with the Intel team to resolve the conflict? I can help with the introduction.

Thanks! Best, -Tony

tonyreina avatar Jan 19 '19 19:01 tonyreina

Hello!

In case you are using ASTRA, the conflict is likely technically not with ODL but with ASTRA, which is calling OMP under the hood. Would there be some way for you to try calling ASTRA without ODL?

Otherwise we'd need a more extensive example. Of course we want to solve this.

adler-j avatar Jan 19 '19 19:01 adler-j

Thanks so much. Yes, I tried calling it without ASTRA as well and got the same result.

I've got a very simple example:

import tensorflow as tf
import odl
import odl.contrib.tensorflow

sess = tf.Session()

size_x = 1024  # It will work for smaller numbers like 128
size_y = 1024  # It will work for smaller numbers like 128
upsampling = [1, 1]
space = odl.uniform_discr([-int(size_x/2), -int(size_y/2)], [int(size_x/2), int(size_y/2)], [size_x, size_y], dtype='float32')
angle_partition = odl.uniform_partition(0, 3.1415, 90)
detector_partition = odl.uniform_partition(-int(size_x/2), int(size_y/2), size_x)
geometry = odl.tomo.Parallel2dGeometry(angle_partition, detector_partition)

print("Ok here.")
operator = odl.tomo.RayTransform(space, geometry)
print("Fails on next line")
pseudoinverse = odl.tomo.fbp_op(operator)
print("Won't get to this line.")

For the conda environment:

conda create -n bug -y -c anaconda pip python=3.6 tensorflow scikit-image
conda activate bug

conda install -c odlgroup odl

tonyreina avatar Jan 20 '19 17:01 tonyreina

Hi @tonyreina! I ran your minimal example and unfortunately I can't reproduce your error. On my machine (Linux), everything runs without problems.

On which platform are you working?

Regarding packages that use OpenMP, there are likely a bunch in our (optional) dependencies, but we don't explicitly load it ourselves, so we can't really do anything about this issue.

The error that you observe, is it raised by TensorFlow? The error message says that "multiple copies of the OpenMP runtime have been linked into the program". To me that looks more like an issue with the compiled package (TensorFlow or whoever raised the error).

kohr-h avatar Jan 29 '19 19:01 kohr-h

I can't reproduce this, either. My output:

2019-01-29 20:38:15.153319: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-29 20:38:15.162335: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Ok here.
/home/banert/miniconda3/envs/bug/lib/python3.6/site-packages/odl/tomo/operators/ray_trafo.py:144: RuntimeWarning: The best available backend ('skimage') may be too slow for volumes of this size. Consider using ASTRA. This warning can be disabled by explicitly setting `impl='skimage'`.
  RuntimeWarning)
Fails on next line
Won't get to this line.

sbanert avatar Jan 29 '19 19:01 sbanert

Thanks. Yes. I'm using the pre-compiled TensorFlow package from Anaconda. So to install TensorFlow I am doing:

conda install -c anaconda tensorflow

I think the Intel MKL-DNN library used by that version of TensorFlow also links to libiomp5.so.

Best. -Tony

tonyreina avatar Jan 29 '19 21:01 tonyreina

If I just use pip install tensorflow, I get the non-MKL-DNN version of TensorFlow, and it works OK. However, the MKL-DNN one is significantly faster than the non-MKL-DNN one (for regular TensorFlow models).

tonyreina avatar Jan 29 '19 21:01 tonyreina

conda install -c anaconda tensorflow gives the result

Solving environment: done

# All requested packages already installed.

I still can't reproduce the error, working with the following packages:

# packages in environment at /home/banert/miniconda3/envs/bug:
#
# Name                    Version                   Build  Channel
_tflow_select             2.3.0                       mkl    anaconda
absl-py                   0.7.0                    py36_0    anaconda
astor                     0.7.1                    py36_0    anaconda
blas                      1.0                         mkl    anaconda
c-ares                    1.15.0               h7b6447c_1    anaconda
ca-certificates           2018.12.5                     0    anaconda
certifi                   2018.11.29               py36_0    anaconda
cloudpickle               0.6.1                    py36_0    anaconda
cycler                    0.10.0                   py36_0    anaconda
cytoolz                   0.9.0.1          py36h14c3975_1    anaconda
dask-core                 1.0.0                    py36_0    anaconda
dbus                      1.13.6               h746ee38_0    anaconda
decorator                 4.3.0                    py36_0    anaconda
expat                     2.2.6                he6710b0_0    anaconda
fontconfig                2.13.0               h9420a91_0    anaconda
freetype                  2.9.1                h8a8886c_1    anaconda
future                    0.17.1                   py36_0  
gast                      0.2.2                    py36_0    anaconda
glib                      2.56.2               hd408876_0    anaconda
grpcio                    1.16.1           py36hf8bcb03_1    anaconda
gst-plugins-base          1.14.0               hbbd80ab_1    anaconda
gstreamer                 1.14.0               hb453b48_1    anaconda
h5py                      2.9.0            py36h7918eee_0    anaconda
hdf5                      1.10.4               hb1b8bf9_0    anaconda
icu                       58.2                 h211956c_0    anaconda
imageio                   2.4.1                    py36_0    anaconda
intel-openmp              2019.1                      144    anaconda
jpeg                      9b                   habf39ab_1    anaconda
keras-applications        1.0.6                    py36_0    anaconda
keras-preprocessing       1.0.5                    py36_0    anaconda
kiwisolver                1.0.1            py36hf484d3e_0    anaconda
libedit                   3.1.20181209         hc058e9b_0    anaconda
libffi                    3.2.1                h4deb6c0_3    anaconda
libgcc-ng                 8.2.0                hdf63c60_1    anaconda
libgfortran-ng            7.3.0                hdf63c60_0    anaconda
libpng                    1.6.36               hbc83047_0    anaconda
libprotobuf               3.6.1                hd408876_0    anaconda
libstdcxx-ng              8.2.0                hdf63c60_1    anaconda
libtiff                   4.0.10            h2733197_1001    anaconda
libuuid                   1.0.3                h1bed415_2    anaconda
libxcb                    1.13                 h1bed415_1    anaconda
libxml2                   2.9.9                he19cac6_0    anaconda
markdown                  3.0.1                    py36_0    anaconda
matplotlib                3.0.2            py36h5429711_0    anaconda
mkl                       2019.1                      144    anaconda
mkl_fft                   1.0.10           py36ha843d7b_0    anaconda
mkl_random                1.0.2            py36hd81dba3_0    anaconda
ncurses                   6.1                  he6710b0_1    anaconda
networkx                  2.2                      py36_1    anaconda
numpy                     1.15.4           py36h7e9f1db_0    anaconda
numpy-base                1.15.4           py36hde5b4d6_0    anaconda
odl                       0.7.0                    py36_0    odlgroup
olefile                   0.46                     py36_0    anaconda
openssl                   1.1.1                h7b6447c_0    anaconda
packaging                 18.0                     py36_0  
pcre                      8.42                 h439df22_0    anaconda
pillow                    5.4.1            py36h34e0f95_0    anaconda
pip                       18.1                     py36_0    anaconda
protobuf                  3.6.1            py36he6710b0_0    anaconda
pyparsing                 2.3.1                    py36_0    anaconda
pyqt                      5.9.2            py36h22d08a2_1    anaconda
python                    3.6.8                h0371630_0    anaconda
python-dateutil           2.7.5                    py36_0    anaconda
pytz                      2018.9                   py36_0    anaconda
pywavelets                1.0.1            py36hdd07704_0    anaconda
qt                        5.9.7                h5867ecd_1    anaconda
readline                  7.0                  h7b6447c_5    anaconda
scikit-image              0.14.1           py36he6710b0_0    anaconda
scipy                     1.2.0            py36h7c811a0_0    anaconda
setuptools                40.6.3                   py36_0    anaconda
sip                       4.19.13          py36he6710b0_0    anaconda
six                       1.12.0                   py36_0    anaconda
sqlite                    3.26.0               h7b6447c_0    anaconda
tensorboard               1.12.2           py36he6710b0_0    anaconda
tensorflow                1.12.0          mkl_py36h69b6ba0_0    anaconda
tensorflow-base           1.12.0          mkl_py36h3c3e929_0    anaconda
termcolor                 1.1.0                    py36_1    anaconda
tk                        8.6.8                hbc83047_0    anaconda
toolz                     0.9.0                    py36_0    anaconda
tornado                   5.1.1            py36h7b6447c_0    anaconda
werkzeug                  0.14.1                   py36_0    anaconda
wheel                     0.32.3                   py36_0    anaconda
xz                        5.2.4                h14c3975_4    anaconda
zlib                      1.2.11               h7b6447c_3    anaconda
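To compare a listing like the one above with the output of `conda list` from an environment where the error does occur, a small parser and diff can help spot differing builds (for example of `intel-openmp`, `mkl` or `tensorflow`). This is a minimal sketch; the function names are hypothetical, and it assumes the whitespace-separated name, version, build and optional channel columns shown above.

```python
def parse_conda_list(text):
    """Parse `conda list` output into {name: (version, build, channel)}.

    Comment lines (starting with '#') and blank lines are skipped. The
    channel is an empty string when the column is missing.
    """
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a package row
        name, version, build = fields[:3]
        channel = fields[3] if len(fields) > 3 else ""
        packages[name] = (version, build, channel)
    return packages


def diff_environments(env_a, env_b):
    """Return {name: (entry_in_a, entry_in_b)} for packages that differ.

    An entry is None when the package is absent from that environment.
    """
    diffs = {}
    for name in sorted(set(env_a) | set(env_b)):
        if env_a.get(name) != env_b.get(name):
            diffs[name] = (env_a.get(name), env_b.get(name))
    return diffs
```

Saving each environment's `conda list` output to a file and passing the file contents through `parse_conda_list` gives the two dictionaries that `diff_environments` expects.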

sbanert avatar Jan 30 '19 08:01 sbanert

@tonyreina Thanks for clarifying. I used exactly the instructions you posted to reproduce the error, and I couldn't observe it on my system.

Maybe you have a stray OpenMP library on your LD_LIBRARY_PATH that gets discovered first? If you run ldconfig -v | grep -B 5 libiomp5.so, do you get multiple hits?
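As a complement to the `ldconfig` check, the directories on `LD_LIBRARY_PATH` and the active Python prefix can be searched for copies of the library from within Python. The following is a rough sketch with hypothetical function names; including `sys.prefix` is an assumption meant to cover the conda environment's own `lib` directory.

```python
import os
import sys


def find_library_copies(search_dirs, name="libiomp5.so"):
    """Walk each directory and return the sorted real paths of all files
    called `name`. Symlinks to the same file are reported only once."""
    hits = set()
    for root_dir in search_dirs:
        if not root_dir or not os.path.isdir(root_dir):
            continue
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            if name in filenames:
                hits.add(os.path.realpath(os.path.join(dirpath, name)))
    return sorted(hits)


def default_search_dirs():
    """Directories from LD_LIBRARY_PATH plus the active Python prefix."""
    dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    dirs.append(sys.prefix)
    return [d for d in dirs if d]
```

Calling `find_library_copies(default_search_dirs())` in the affected environment and getting more than one path back would support the idea of a stray copy. The walk is recursive, so it can take a while on large directory trees.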

kohr-h avatar Jan 30 '19 16:01 kohr-h

Anything left to do here?

kohr-h avatar Apr 06 '20 08:04 kohr-h