
Unable to extract native library into a temporary file for non-pip install

Open yizerozhuang opened this issue 2 years ago • 10 comments

When running the Jenkins job for the dien train example, the job fails when I use the build method to use the Friesian API. The same job passes when I use the pip install method, so there shouldn't be any problem with the code itself. This is the error message:

Traceback (most recent call last):
  File "../../example/dien/dien_train.py", line 154, in <module>
    feature_cols=feature_cols, label_cols=['label'], validation_data=test_data.df)
  File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/orca/src/bigdl/orca/learn/tf/estimator.py", line 584, in fit
    model_dir=self.model_dir)
  File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/orca/src/bigdl/orca/tfpark/tf_optimizer.py", line 496, in from_train_op
    model_dir=model_dir, train_op=train_op)
  File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/orca/src/bigdl/orca/tfpark/tf_optimizer.py", line 509, in _from_grads
    updates, model_dir=None, train_op=train_op)
  File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/orca/src/bigdl/orca/tfpark/tf_optimizer.py", line 343, in create
    session_config, saver, meta, sess)
  File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/orca/src/bigdl/orca/tfpark/tf_optimizer.py", line 88, in __init__
    super(TFTrainingHelper, self).__init__(None, "float", path, byte_arr)
  File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/dllib/src/bigdl/dllib/nn/layer.py", line 130, in __init__
    bigdl_type, self.jvm_class_constructor(), *args)
  File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/dllib/src/bigdl/dllib/utils/common.py", line 607, in callBigDlFunc
    raise e
  File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/dllib/src/bigdl/dllib/utils/common.py", line 603, in callBigDlFunc
    result = callJavaFunc(api, *args)
  File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/dllib/src/bigdl/dllib/utils/common.py", line 656, in callJavaFunc
    result = func(*args)
  File "/opt/work/spark-2.4.6/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    
  File "/opt/work/spark-2.4.6/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o41.createTFTrainingHelper.
: java.lang.UnsatisfiedLinkError: Unable to extract native library into a temporary file (java.io.IOException: Can not find resource linux-x86_64/libiomp5.so)
	at com.intel.analytics.bigdl.orca.tf.TFNetNative.<clinit>(TFNetNative.java:98)
	at com.intel.analytics.bigdl.orca.tfpark.TFTrainingHelper$.<init>(TFTrainingHelper.scala:313)
	at com.intel.analytics.bigdl.orca.tfpark.TFTrainingHelper$.<clinit>(TFTrainingHelper.scala)
	at com.intel.analytics.bigdl.orca.tfpark.python.PythonTFPark.createTFTrainingHelper(PythonTFPark.scala:71)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

Stopping orca context
Build step 'Execute shell' marked build as failure
Finished: FAILURE

This is the job configuration

#!/bin/bash
set -e

ulimit -a

export _JAVA_OPTIONS="-XX:MaxPermSize=3G -Xmx10G"
export MAVEN_OPTS="-XX:ReservedCodeCacheSize=512m -XX:MaxPermSize=3G -Xmx10G"
export http_proxy="http://child-prc.intel.com:913/"
export https_proxy="http://child-prc.intel.com:913/"
export no_proxy="10.239.45.10:8081,10.112.231.51"

export JAVA_HOME=/opt/work/jdk8
export CLASSPATH=.:${JAVA_HOME}/lib:${JAVA_HOME}/jre/lib:${JAVA_HOME}/lib/tools.jar:${JAVA_HOME}/lib/dt.jar
export PATH=${JAVA_HOME}/bin/:${JAVA_HOME}/jre/bin:${PATH}
export PATH=/opt/work/apache-maven-3.6.3/bin:$PATH 
mvn --version
export SPARK_HOME=/opt/work/spark-2.4.6   


export FTP_URI=ftp://zoo:[email protected]

cd scala
./make-dist.sh -P spark_2.x -P linux -Dspark.version=2.4.6
cd -
source python/friesian/dev/prepare_env.sh
source activate py36
echo "Running py36 tests"
### example test script
### put here
pip uninstall -y bigdl
cd python/friesian/dev/test
#bash run-feature-example-tests.sh 
bash run-example-tests-tf1.15.0.sh
cd ../../../../

source deactivate

source activate py37
echo "Running py37 tests"
### example test script
### put here
pip uninstall -y bigdl
cd python/friesian/dev/test
#bash run-feature-example-tests.sh
bash run-example-tests-tf1.15.0.sh
cd ../../../../
source deactivate

yizerozhuang · Oct 25 '21 06:10

Should we use pip install to test examples, as this is the way users use it?

yangw1234 · Oct 25 '21 07:10

Should we use pip install to test examples, as this is the way users use it?

Yes, the job that uses pip install has already passed, and users are supposed to install with pip, but I think we should also test the build method.

yizerozhuang · Oct 25 '21 07:10

Should we use pip install to test examples, as this is the way users use it?

Yes, the job that uses pip install has already passed, and users are supposed to install with pip, but I think we should also test the build method.

This is a known issue; I think you can remove -P linux to work around it.
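
For reference, a minimal sketch of what the adjusted build step in the job configuration above might look like (nothing else in the script is assumed to change):

cd scala
# same invocation as before, with the -P linux profile dropped per the workaround above
./make-dist.sh -P spark_2.x -Dspark.version=2.4.6
cd -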

yangw1234 · Oct 26 '21 02:10

Should we use pip install to test examples, as this is the way users use it?

Yes, the job that uses pip install has already passed, and users are supposed to install with pip, but I think we should also test the build method.

This is a known issue; I think you can remove -P linux to work around it.

Thank you! This solves the problem.

yizerozhuang · Oct 26 '21 07:10

But eventually we need to resolve this issue. For developers, we need to run Orca in non-pip mode; otherwise it is not convenient.

hkvision · Oct 26 '21 09:10

But eventually we need to resolve this issue. For developers, we need to run Orca in non-pip mode; otherwise it is not convenient.

What's the root cause?

jason-dai · Oct 26 '21 09:10

But eventually we need to resolve this issue. For developers, we need to run Orca in non-pip mode; otherwise it is not convenient.

What's the root cause?

Currently the .so files are downloaded from our internal storage before building the wheels, so if we build by ourselves, these .so files are missing unless we download them manually beforehand (and external developers may have nowhere to download these .so files?).
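
As a quick sanity check, listing the contents of the locally built jar should show whether the resource was packaged at all; the jar path below is only a guess and depends on where make-dist.sh puts its output:

# look for the missing resource inside the locally built jar (the jar path is only an example)
unzip -l dist/lib/bigdl-*.jar | grep 'linux-x86_64/libiomp5.so' || echo "native lib not packaged"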

hkvision · Oct 26 '21 10:10

But eventually we need to resolve this issue. For developers, we need to run Orca in non-pip mode; otherwise it is not convenient.

I agree. That's why I call it an "issue" and the solution a "workaround".

The easiest way to download the .so files is to run pip install bigdl-tf bigdl-math. However, we are currently using the path relative to the jar file to find the .so files, so we need to add logic (e.g. relative to $PYSPARK_PYTHON) to search for the libs when the jar is not installed.
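
For example, something along these lines could fetch the libs into the active environment and then look them up relative to the Python interpreter (where exactly the .so files land after installation is an assumption):

# the two packages mentioned above ship the native .so files
pip install bigdl-tf bigdl-math
# search the environment's lib tree for the missing library
# (where the .so files land after installation is an assumption here)
find "$(dirname "$(which python)")/.." -name 'libiomp5.so'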

Does the issue block anything? We can prioritize it if it does.

yangw1234 · Oct 26 '21 13:10

Hi @dding3, can you merge the flexible path change into the main branch? I also encounter the same issue since I use a conda environment.

pohanchi · Jul 16 '22 12:07

Hi @dding3, can you help merge this PR? I find that conda users cannot use it as originally written.

Hi, I think we have already met your need in #3664. As a developer you can follow the steps in the developer_guide; after you use

source dev/prepare_env.sh

to set the 'TF_LIBS_PATH' environment variable, so that your libs can be located.
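
A rough sketch of how that check might look in practice (assuming the libs end up directly under TF_LIBS_PATH):

source dev/prepare_env.sh
# verify the variable is set and the native lib is visible
# (assumes the libs sit directly under TF_LIBS_PATH)
echo "TF_LIBS_PATH=$TF_LIBS_PATH"
ls "$TF_LIBS_PATH" | grep libiomp5.so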

leonardozcm · Jul 19 '22 09:07