
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.

diogoribeiro09 opened this issue 4 years ago • 11 comments

Hi, I'm trying to use elephas for my deep learning models on Spark, but so far I couldn't get anything to work on 3 different machines and multiple notebooks.

  • "ml_pipeline_otto.py" crashes on the load_data_frame function, more specifically on return sqlContext.createDataFrame(data, ['features', 'category']) with the error : Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.

  • "mnist_mlp_spark.py" crashes on the spark_model.fit method with the error : TypeError: can't pickle _thread.RLock objects

  • "My Own Pipeline" crashes right after fitting (it actually trains it) the model with this error : Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.

I'm running tensorflow 2.1.0, pyspark 3.0.2, jdk-8u281, python 3.7, and elephas 1.4.2.

diogoribeiro09 avatar Feb 24 '21 12:02 diogoribeiro09

Unfortunately I can't replicate this with all the versions mentioned :( Could this be related to your notebook environment?

danielenricocahall avatar Feb 25 '21 10:02 danielenricocahall

Hi @danielenricocahall, thanks for your time. I tried both on conda and on PyCharm with venv. Just to be absolutely clear that I'm doing things right:

  • I create a new environment for Python 3.7 and run pip install elephas, which automatically installs all dependencies.

Maybe it is also important to mention that I'm running this on Windows 10. Do I need to install anything else? Set up any other environment variables, like SPARK_HOME, JAVA_HOME, etc.? (I did that, but I'm not sure they are really needed for this use case.)
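
For what it's worth, a hedged sketch of setting those variables from Python before Spark starts, in case they turn out to be needed on Windows; the paths below are illustrative placeholders, not taken from this thread:

import os
from pyspark.sql import SparkSession

# Illustrative placeholder paths; adjust to the actual install locations.
# These must be set before the first SparkSession/SparkContext is created.
os.environ.setdefault("JAVA_HOME", r"C:\Program Files\Java\jdk1.8.0_281")
os.environ.setdefault("SPARK_HOME", r"C:\spark\spark-3.0.2-bin-hadoop2.7")

spark = SparkSession.builder.appName("elephas-windows").getOrCreate()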

diogoribeiro09 avatar Feb 25 '21 10:02 diogoribeiro09

Those sound like the right steps, and there should be no additional configuration required. There may be an issue on Windows - I have only tested on Linux. There is another open issue where the user was on a Windows machine: https://github.com/maxpumperla/elephas/issues/142, and I believe I have talked with one or two others who encountered issues on Windows. I'm sorry. :(

danielenricocahall avatar Feb 25 '21 11:02 danielenricocahall

@danielenricocahall, to add a little bit more on this topic: I installed it on a fresh VM with Ubuntu, installed conda, and created a new virtual environment with Python 3.7, then ran pip install elephas. I tried the mllib_mlp.py example and it gave me an error about Java, so I installed it with sudo apt-get install openjdk-8-jdk-headless -qq. After that I re-ran the notebook and it complained about JAVA_HOME, so I added this to my ~/.bashrc:

export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/jre/"
export PATH=$PATH:$JAVA_HOME/bin/

Now it gets to the fit function and hangs there with this error: py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. Exactly like before... So no, I don't think this is a Windows-only problem; something else is going on. I also did a "pip freeze" so you can have a look at the package versions it installed:

absl-py==0.11.0
anyio==2.1.0
argon2-cffi==20.1.0
astor==0.8.1
async-generator==1.10
attrs==20.3.0
autoflake==1.4
Babel==2.9.0
backcall==0.2.0
bleach==3.3.0
cachetools==4.2.1
certifi==2020.12.5
cffi==1.14.5
chardet==4.0.0
click==7.1.2
cloudpickle==1.6.0
cycler==0.10.0
Cython==0.29.22
decorator==4.4.2
defusedxml==0.6.0
elephas==1.4.2
entrypoints==0.3
Flask==1.1.2
future==0.18.2
gast==0.2.2
google-auth==1.27.0
google-auth-oauthlib==0.4.2
google-pasta==0.2.0
grpcio==1.35.0
h5py==2.10.0
hyperas==0.4.1
hyperopt==0.2.5
idna==2.10
importlib-metadata==3.7.0
ipykernel==5.5.0
ipython==7.20.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
itsdangerous==1.1.0
jedi==0.18.0
Jinja2==2.11.3
joblib==1.0.1
json5==0.9.5
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.11
jupyter-console==6.2.0
jupyter-core==4.7.1
jupyter-packaging==0.7.12
jupyter-server==1.4.1
jupyterlab==3.0.9
jupyterlab-pygments==0.1.2
jupyterlab-server==2.3.0
jupyterlab-widgets==1.0.0
Keras==2.2.5
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.3.1
Markdown==3.3.4
MarkupSafe==1.1.1
matplotlib==3.3.4
mistune==0.8.4
nbclassic==0.2.6
nbclient==0.5.2
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.5.1
networkx==2.5
notebook==6.2.0
numpy==1.18.5
oauthlib==3.1.0
opt-einsum==3.3.0
packaging==20.9
pandas==1.2.2
pandocfilters==1.4.3
parso==0.8.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.1.0
prometheus-client==0.9.0
prompt-toolkit==3.0.16
protobuf==3.15.2
ptyprocess==0.7.0
py4j==0.10.9
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pyflakes==2.2.0
Pygments==2.8.0
pyparsing==2.4.7
pyrsistent==0.17.3
pyspark==3.0.2
python-dateutil==2.8.1
pytz==2021.1
PyYAML==5.4.1
pyzmq==22.0.3
qtconsole==5.0.2
QtPy==1.9.0
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.7.2
scikit-learn==0.24.1
scipy==1.6.1
seaborn==0.11.1
Send2Trash==1.5.0
six==1.15.0
sniffio==1.2.0
tensorboard==2.1.1
tensorflow==2.1.3
tensorflow-estimator==2.1.0
termcolor==1.1.0
terminado==0.9.2
testpath==0.4.4
threadpoolctl==2.1.0
tornado==6.1
tqdm==4.57.0
traitlets==5.0.5
typing-extensions==3.7.4.3
urllib3==1.26.3
wcwidth==0.2.5
webencodings==0.5.1
Werkzeug==1.0.1
widgetsnbextension==3.5.1
wrapt==1.12.1
zipp==3.4.0

Edit: Complete traceback of the error: https://pastebin.com/dufuX7F3
Edit 2: Yet another update. Setting JAVA_HOME in ~/.bashrc made everything work. The same procedure on Windows leads me to TypeError: can't pickle _thread.RLock objects. I'm totally out of ideas.

diogoribeiro09 avatar Feb 25 '21 15:02 diogoribeiro09

Reviewing the traceback:

21/02/25 10:44:11 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.OutOfMemoryError: Java heap space
21/02/25 10:44:11 ERROR Executor: Exception in task 2.0 in stage 0.0 (TID 2)
java.lang.OutOfMemoryError: Java heap space

You may need to increase spark.driver.memory in your Spark config. How much memory do you have available?
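
For example, a minimal sketch of raising it when the session is created (the "8g" value is purely illustrative, not from this thread):

from pyspark.sql import SparkSession

# Illustrative value; size it to the machine. In local mode this must be applied
# before the driver JVM starts, i.e. before the first SparkSession/SparkContext exists.
spark = (SparkSession.builder
         .appName("elephas-example")
         .config("spark.driver.memory", "8g")
         .getOrCreate())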

danielenricocahall avatar Mar 28 '21 19:03 danielenricocahall

I do have 32 GB of RAM available, and I set the driver to 32 GB as well. The same script under Ubuntu works just fine.

diogoribeiro09 avatar Mar 28 '21 19:03 diogoribeiro09

Hi there! I had the same issue, but this solution helped: import findspark; findspark.init(). Initialize it before the creation of the Spark session.
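
A minimal sketch of that fix, assuming Spark is installed somewhere findspark can locate (e.g. the pip-installed pyspark or a SPARK_HOME directory):

import findspark
findspark.init()  # run before any SparkSession/SparkContext is created;
                  # a Spark install path can also be passed explicitly, e.g. findspark.init("/path/to/spark")

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("elephas-example").getOrCreate()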

nboyarkin avatar Jun 21 '22 14:06 nboyarkin

It happens because you are doing some illegal type casting.

Washim1997 avatar Jun 22 '22 09:06 Washim1997

Hi,

Thanks, I had the same issue and it's been resolved: import findspark; findspark.init(). Initialize it before the creation of the Spark session.

Note: Windows seems to have other dependencies. I'm not sure what the issue was, but it's fixed now. Please pass on details about how this package helps resolve it.

Mayank01 avatar Jul 02 '22 15:07 Mayank01

> Hi,
>
> Thanks, I had the same issue and it's been resolved: import findspark; findspark.init(). Initialize it before the creation of the Spark session.
>
> Note: Windows seems to have other dependencies. I'm not sure what the issue was, but it's fixed now. Please pass on details about how this package helps resolve it.

Hi Mayank,

Thanks for your comments. The 'findspark' package helped me solve the issue.

GaneshJalakam avatar Aug 15 '22 16:08 GaneshJalakam

findspark.init()

Bro, you saved my life. Couldn't thank you more, sir.

rabzgg avatar Sep 15 '22 15:09 rabzgg

Closing this issue for now, but please let me know if other issues arise on the new fork (https://github.com/danielenricocahall/elephas)

danielenricocahall avatar Oct 11 '22 13:10 danielenricocahall

> Hi there! I had the same issue, but this solution helped: import findspark; findspark.init(). Initialize it before the creation of the Spark session.

This solved my issue. Don't forget to restart the kernel and re-run the cells after installing findspark.

dendihandian avatar Oct 12 '22 08:10 dendihandian

> Hi there! I had the same issue, but this solution helped: import findspark; findspark.init(). Initialize it before the creation of the Spark session.

> This solved my issue. Don't forget to restart the kernel and re-run the cells after installing findspark.

Yes, it works for me. In particular, don't forget to restart the kernel before findspark.init().

RyanXu11 avatar Feb 16 '23 15:02 RyanXu11

> Hi there! I had the same issue, but this solution helped: import findspark; findspark.init(). Initialize it before the creation of the Spark session.

Amazing, guys, thanks!

littledgg avatar Nov 26 '23 14:11 littledgg

Thanks man! It helped.

Hridoy-bit avatar Mar 07 '24 01:03 Hridoy-bit