sparkmagic
sparkmagic copied to clipboard
[BUG] Default Docker container got broken
Describe the bug I believe there was a new push of the image by datamechanics (5 days ago ?) and now sparkmagic docker image does not work anymore. If you log to the spark-1 container, and try ../bin/pyspark I get this error:
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.0
at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
To Reproduce git clone https://github.com/jupyter-incubator/sparkmagic sparkmagic-dev cd sparkmagic-dev docker compose up
then create a new PySpark notebook and a simple command does not. work. eg. %data = [(1, 'John', 'Doe')]
The code failed because of a fatal error:
Error sending http request and maximum retry encountered..
Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context.
b) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.
c) Restart the kernel.
Expected behavior PySpark kernel should work
Screenshots If applicable, add screenshots to help explain your problem.
Versions:
- SparkMagic 20.4 or master
- Livy (if you know it)
- Spark 2.4.7
Additional context I believe there was a new push of the image by datamechanics (5 days ago ?)
Hi @ltregan thanks for opening an issue. I'm looking into this today
@ltregan I'm unable to reproduce. Can you check if you are still running into issues?
Still same issue, even after clearing the cache. Exact sequence is:
$ docker system prune -a -f
$ git clone https://github.com/jupyter-incubator/sparkmagic sparkmagic-dev
$ cd sparkmagic-dev
$ docker compose up
I am on Mac M1. Something fishy also is that CPU start at 20% (can be seen in the screenshots at the bottom) then goes up to 40% after a couple of minutes and stay there.
Full log then screenshots below.
sh-5.1# ../bin/pyspark
Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
23/03/15 18:45:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Exception in thread "Thread-4" java.lang.ExceptionInInitializerError
at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at py4j.reflection.CurrentThreadClassLoadingStrategy.classForName(CurrentThreadClassLoadingStrategy.java:40)
at py4j.reflection.ReflectionUtil.classForName(ReflectionUtil.java:51)
at py4j.reflection.TypeUtil.forName(TypeUtil.java:243)
at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:175)
at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:87)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.0
at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
... 10 more
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.4.7
/_/
Using Python version 3.7.11 (default, Jul 27 2021 14:32:16)
SparkSession available as 'spark'.
>>>
@ltregan Thanks for the screenshots. I'm able to reproduce