Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
Hi, I'm trying to use elephas for my deep learning models on Spark, but so far I haven't been able to get anything to work on three different machines and across multiple notebooks:
- "ml_pipeline_otto.py" crashes in the `load_data_frame` function, more specifically on `return sqlContext.createDataFrame(data, ['features', 'category'])` (a rough sketch of that function follows this list), with the error: `Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.`
- "mnist_mlp_spark.py" crashes in the `spark_model.fit` method with the error: `TypeError: can't pickle _thread.RLock objects`
- My own pipeline crashes right after fitting the model (it actually trains) with this error: `Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.`
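For context, here is a minimal sketch of what `load_data_frame` presumably does in the otto example, assuming the standard PySpark API; the function body and variable names are illustrative, not copied from the repo:

```python
# Hedged sketch, not the actual code from ml_pipeline_otto.py: it only illustrates
# how a (features, category) DataFrame is typically built before the failing
# createDataFrame call. Names and shapes here are assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("otto-sketch").getOrCreate()

def load_data_frame(features, labels):
    # Pair each dense feature vector with its integer class label
    data = [(Vectors.dense(x), int(y)) for x, y in zip(features, labels)]
    # This is the call that raises the Py4JJavaError reported above
    return spark.createDataFrame(data, ['features', 'category'])
```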
I'm running tensorflow 2.1.0, pyspark 3.0.2, jdk-8u281, python 3.7, and elephas 1.4.2.
I unfortunately can't replicate this with all the versions mentioned :( Could this be related to your notebook environment?
Hi @danielenricocahall, thanks for your time. I tried both conda and PyCharm with a venv. Just to be absolutely clear that I'm doing things right:
- I create a new environment for Python 3.7 and run `pip install elephas`, which automatically installs all dependencies.
It's maybe also worth mentioning that I'm running this on Windows 10. Do I need to install anything else? Set up any other environment variables, like SPARK_HOME, JAVA_HOME, and so on? (I did that, but I'm not sure they are really needed for this use case.)
Those sound like the right steps, and there should be no additional configuration required. There may be an issue on Windows - I have only tested on Linux. There is another open issue where the user was on a Windows machine: https://github.com/maxpumperla/elephas/issues/142, and I believe I have talked with one or two others who encountered issues on Windows. I'm sorry. :(
@danielenricocahall to add a little bit more on this topic: I installed it on a fresh VM with Ubuntu, installed conda, created a new virtual environment with Python 3.7, and ran `pip install elephas`. I tried the `mllib_mlp.py` example and it gave me an error about Java, so I installed it with `sudo apt-get install openjdk-8-jdk-headless -qq`. After that I re-ran the notebook and it complained about JAVA_HOME, so I added this to my ~/.bashrc:
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/jre/"
export PATH=$PATH:$JAVA_HOME/bin/
Now it gets to the fit function and hangs there with this error: `py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.`
Exactly like before... So no, I don't think this is a Windows-only problem; something else is going on. I also ran `pip freeze` so you can have a look at the package versions it installed:
absl-py==0.11.0
anyio==2.1.0
argon2-cffi==20.1.0
astor==0.8.1
async-generator==1.10
attrs==20.3.0
autoflake==1.4
Babel==2.9.0
backcall==0.2.0
bleach==3.3.0
cachetools==4.2.1
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
click==7.1.2
cloudpickle==1.6.0
cycler==0.10.0
Cython==0.29.22
decorator==4.4.2
defusedxml==0.6.0
elephas==1.4.2
entrypoints==0.3
Flask==1.1.2
future==0.18.2
gast==0.2.2
google-auth==1.27.0
google-auth-oauthlib==0.4.2
google-pasta==0.2.0
grpcio==1.35.0
h5py==2.10.0
hyperas==0.4.1
hyperopt==0.2.5
idna==2.10
importlib-metadata==3.7.0
ipykernel==5.5.0
ipython==7.20.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
itsdangerous==1.1.0
jedi==0.18.0
Jinja2==2.11.3
joblib==1.0.1
json5==0.9.5
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.11
jupyter-console==6.2.0
jupyter-core==4.7.1
jupyter-packaging==0.7.12
jupyter-server==1.4.1
jupyterlab==3.0.9
jupyterlab-pygments==0.1.2
jupyterlab-server==2.3.0
jupyterlab-widgets==1.0.0
Keras==2.2.5
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.3.1
Markdown==3.3.4
MarkupSafe==1.1.1
matplotlib==3.3.4
mistune==0.8.4
nbclassic==0.2.6
nbclient==0.5.2
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.5.1
networkx==2.5
notebook==6.2.0
numpy==1.18.5
oauthlib==3.1.0
opt-einsum==3.3.0
packaging==20.9
pandas==1.2.2
pandocfilters==1.4.3
parso==0.8.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.1.0
prometheus-client==0.9.0
prompt-toolkit==3.0.16
protobuf==3.15.2
ptyprocess==0.7.0
py4j==0.10.9
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pyflakes==2.2.0
Pygments==2.8.0
pyparsing==2.4.7
pyrsistent==0.17.3
pyspark==3.0.2
python-dateutil==2.8.1
pytz==2021.1
PyYAML==5.4.1
pyzmq==22.0.3
qtconsole==5.0.2
QtPy==1.9.0
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.7.2
scikit-learn==0.24.1
scipy==1.6.1
seaborn==0.11.1
Send2Trash==1.5.0
six==1.15.0
sniffio==1.2.0
tensorboard==2.1.1
tensorflow==2.1.3
tensorflow-estimator==2.1.0
termcolor==1.1.0
terminado==0.9.2
testpath==0.4.4
threadpoolctl==2.1.0
tornado==6.1
tqdm==4.57.0
traitlets==5.0.5
typing-extensions==3.7.4.3
urllib3==1.26.3
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==1.0.1
widgetsnbextension==3.5.1
wrapt==1.12.1
zipp==3.4.0
Edit: Complete traceback of the error: https://pastebin.com/dufuX7F3
Edit 2: Yet another update. Setting JAVA_HOME in my ~/.bashrc made everything work on Ubuntu. The same procedure on Windows leads me to `TypeError: can't pickle _thread.RLock objects`.
I'm totally out of ideas.
Reviewing the traceback:
21/02/25 10:44:11 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.OutOfMemoryError: Java heap space
21/02/25 10:44:11 ERROR Executor: Exception in task 2.0 in stage 0.0 (TID 2)
java.lang.OutOfMemoryError: Java heap space
You may need to increase `spark.driver.memory` in your Spark config (a minimal sketch follows below). How much memory do you have available?
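For reference, a minimal sketch of one way to raise the driver memory when building the session from Python; the 8g value is just a placeholder, and the setting has to be applied before the first SparkSession (and its JVM) is created:

```python
# Hedged sketch: raising spark.driver.memory via the SparkSession builder.
# The value below is an arbitrary example; pick one that fits your machine.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("elephas-example")
         .config("spark.driver.memory", "8g")
         .getOrCreate())
```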
I have 32GB of RAM available and I set the driver memory to 32GB as well. The same script under Ubuntu works just fine.
Hi there! Had the same issue, but this solution helped: `import findspark; findspark.init()`. Initialize it before creating the Spark session, as in the sketch below.
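A minimal sketch of that suggestion, assuming a local Spark install that findspark can discover; the app name is just a placeholder:

```python
# Hedged sketch of the findspark suggestion from this thread:
# call findspark.init() before any SparkSession/SparkContext is created,
# so that pyspark can locate the Spark installation.
import findspark
findspark.init()  # you can also pass the Spark home path explicitly

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("elephas-example").getOrCreate()
```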
It happens because you are doing some illegal type casting.
Hi,
Thanks, I had the same issue and it's been resolved with `import findspark; findspark.init()`, initialized before creating the Spark session.
Note: Windows seems to have other dependencies. I'm not sure what the exact issue was, but it's fixed now. Could you share more detail on how this package helps resolve it?
Hi Mayank,
Thanks for your comments. The 'findspark' package helped me solve the issue: `findspark.init()`.
Bro, you saved my life, couldn't thank you more, sir.
Closing this issue for now, but please let me know if other issues arise on the new fork (https://github.com/danielenricocahall/elephas)
The findspark solution above solved my issue too. Don't forget to restart the kernel and re-run the cells after installing findspark.
Yes, it works for me. In particular, don't forget to restart the kernel before calling `findspark.init()`.
Amazing, guys, thanks!
Thanks, man! It helped.