ipex-llm
Unable to extract native library into a temporary file for non-pip install
When running the Jenkins job for DIEN training, the job fails when I use the build method to use the Friesian API. The job passes when I use the pip install method, so there shouldn't be any problem with the code. This is the error message:
Traceback (most recent call last):
File "../../example/dien/dien_train.py", line 154, in <module>
feature_cols=feature_cols, label_cols=['label'], validation_data=test_data.df)
File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/orca/src/bigdl/orca/learn/tf/estimator.py", line 584, in fit
model_dir=self.model_dir)
File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/orca/src/bigdl/orca/tfpark/tf_optimizer.py", line 496, in from_train_op
model_dir=model_dir, train_op=train_op)
File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/orca/src/bigdl/orca/tfpark/tf_optimizer.py", line 509, in _from_grads
updates, model_dir=None, train_op=train_op)
File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/orca/src/bigdl/orca/tfpark/tf_optimizer.py", line 343, in create
session_config, saver, meta, sess)
File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/orca/src/bigdl/orca/tfpark/tf_optimizer.py", line 88, in __init__
super(TFTrainingHelper, self).__init__(None, "float", path, byte_arr)
File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/dllib/src/bigdl/dllib/nn/layer.py", line 130, in __init__
bigdl_type, self.jvm_class_constructor(), *args)
File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/dllib/src/bigdl/dllib/utils/common.py", line 607, in callBigDlFunc
raise e
File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/dllib/src/bigdl/dllib/utils/common.py", line 603, in callBigDlFunc
result = callJavaFunc(api, *args)
File "/opt/work/jenkins/workspace/BigDL-PRVN-Python-Friesian-ExampleTest-Spark-2.4-feature+tf1/python/dllib/src/bigdl/dllib/utils/common.py", line 656, in callJavaFunc
result = func(*args)
File "/opt/work/spark-2.4.6/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/work/spark-2.4.6/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/opt/work/spark-2.4.6/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o41.createTFTrainingHelper.
: java.lang.UnsatisfiedLinkError: Unable to extract native library into a temporary file (java.io.IOException: Can not find resource linux-x86_64/libiomp5.so)
at com.intel.analytics.bigdl.orca.tf.TFNetNative.<clinit>(TFNetNative.java:98)
at com.intel.analytics.bigdl.orca.tfpark.TFTrainingHelper$.<init>(TFTrainingHelper.scala:313)
at com.intel.analytics.bigdl.orca.tfpark.TFTrainingHelper$.<clinit>(TFTrainingHelper.scala)
at com.intel.analytics.bigdl.orca.tfpark.python.PythonTFPark.createTFTrainingHelper(PythonTFPark.scala:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Stopping orca context
Build step 'Execute shell' marked build as failure
Finished: FAILURE
This is the job configuration:
#!/bin/bash
set -e
ulimit -a
export _JAVA_OPTIONS="-XX:MaxPermSize=3G -Xmx10G"
export MAVEN_OPTS="-XX:ReservedCodeCacheSize=512m -XX:MaxPermSize=3G -Xmx10G"
export http_proxy="http://child-prc.intel.com:913/"
export https_proxy="http://child-prc.intel.com:913/"
export no_proxy="10.239.45.10:8081,10.112.231.51"
export JAVA_HOME=/opt/work/jdk8
export CLASSPATH=.:${JAVA_HOME}/lib:${JAVA_HOME}/jre/lib:${JAVA_HOME}/lib/tools.jar:${JAVA_HOME}/lib/dt.jar
export PATH=${JAVA_HOME}/bin/:${JAVA_HOME}/jre/bin:${PATH}
export PATH=/opt/work/apache-maven-3.6.3/bin:$PATH
mvn --version
export SPARK_HOME=/opt/work/spark-2.4.6
export FTP_URI=ftp://zoo:[email protected]
cd scala
./make-dist.sh -P spark_2.x -P linux -Dspark.version=2.4.6
cd -
source python/friesian/dev/prepare_env.sh
source activate py36
echo "Running py36 tests"
### example test script
### put here
pip uninstall -y bigdl
cd python/friesian/dev/test
#bash run-feature-example-tests.sh
bash run-example-tests-tf1.15.0.sh
cd ../../../../
source deactivate
source activate py37
echo "Running py37 tests"
### example test script
### put here
pip uninstall -y bigdl
cd python/friesian/dev/test
#bash run-feature-example-tests.sh
bash run-example-tests-tf1.15.0.sh
cd ../../../../
source deactivate
Should we use pip install to test the examples, since this is how users use it?
Yes, the job that uses pip install already passed, and users are supposed to install via pip, but I think we should also test the build method.
This is a known issue; I think you can remove `-P linux` to work around it (i.e., build with `./make-dist.sh -P spark_2.x -Dspark.version=2.4.6`).
Thank you! This solves the problem.
But eventually we need to resolve this issue. For developers, we need to run Orca in non-pip mode; otherwise it is not convenient.
What's the root cause?
Currently the .so files are downloaded from our internal storage before the wheels are built. So if we build by ourselves, these .so files are missing unless we download them manually beforehand (and external developers may have nowhere to download these .so files from?).
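A quick way to confirm this root cause is to check whether the native resource is actually packaged inside the jar you built. This is a minimal sketch: a jar is just a zip archive, and the resource name comes straight from the error message above (the jar path is whatever `make-dist.sh` produced for you).

```python
import zipfile

def has_native_lib(jar_path, resource="linux-x86_64/libiomp5.so"):
    """Return True if the given resource is packaged inside the jar.

    A jar is a zip archive, so zipfile can list its entries directly.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return resource in jar.namelist()
```

If this returns False for the dist jar, the extraction in `TFNetNative` will fail exactly as shown in the traceback.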
I agree. That's why I call it an "issue" and the solution a "workaround".
The easiest way to download the .so files is to run `pip install bigdl-tf bigdl-math`. However, we currently use the path relative to the jar file to find the .so files, so we need to add logic (e.g., resolving relative to `$PYSPARK_PYTHON`) to search for the libs when the jar is not installed.
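The fallback lookup suggested here could be sketched as below. This is only an illustration of the idea, not the actual implementation: the `<env>/lib` layout and the `BIGDL_NATIVE_LIB_DIR` override variable are assumptions made up for this sketch.

```python
import os

def find_native_lib(libname="libiomp5.so"):
    """Fallback search for a native library when the jar-relative lookup fails.

    Candidate directories are derived from the Python environment, e.g. the
    env that $PYSPARK_PYTHON points into. The <env>/lib layout and the
    BIGDL_NATIVE_LIB_DIR override are illustrative assumptions.
    """
    candidates = []
    pyspark_python = os.environ.get("PYSPARK_PYTHON")
    if pyspark_python:
        # <env>/bin/python -> search under <env>/lib
        env_root = os.path.dirname(os.path.dirname(pyspark_python))
        candidates.append(os.path.join(env_root, "lib"))
    # hypothetical explicit override for developers
    override = os.environ.get("BIGDL_NATIVE_LIB_DIR")
    if override:
        candidates.append(override)
    for d in candidates:
        path = os.path.join(d, libname)
        if os.path.isfile(path):
            return path
    return None
```

The point of the design is that a pip install and a from-source build end up with the .so files in different places, so the loader needs more than one candidate location.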
Does this issue block anything? We can prioritize it if it does.
Hi @dding3, can you merge the flexible path logic into the main branch? I also encounter the same issue since I use a conda environment.
Hi @dding3, can you help merge this PR? We find that conda users cannot use the original approach.
Hi, I think we have already met your need in #3664. As a developer, you can follow the steps in the developer guide; after you run `source dev/prepare_env.sh` to set the `TF_LIBS_PATH` environment variable, we can locate the libs you compiled.
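For reference, an environment-variable lookup like the one described above can be sketched as follows. Only `TF_LIBS_PATH` and `dev/prepare_env.sh` come from the thread; the error message and fail-fast behavior are illustrative choices.

```python
import os

def tf_libs_path():
    """Resolve the native-libs directory from TF_LIBS_PATH.

    dev/prepare_env.sh is expected to export TF_LIBS_PATH (per the developer
    guide referenced above); fail fast with a hint if it is missing.
    """
    path = os.environ.get("TF_LIBS_PATH")
    if not path or not os.path.isdir(path):
        raise RuntimeError(
            "TF_LIBS_PATH is not set or invalid; "
            "run `source dev/prepare_env.sh` first"
        )
    return path
```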