spark-deep-learning
Error while importing sparkdl in google colab
Here is the error traceback while importing sparkdl:
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-3-4a9be7b8a3d0> in <module>()
----> 1 import sparkdl
1 frames
/usr/local/lib/python3.6/dist-packages/sparkdl/image/imageIO.py in <module>()
23
24 # pyspark
---> 25 from pyspark import Row
26 from pyspark import SparkContext
27 from pyspark.sql.types import (BinaryType, IntegerType, StringType, StructField, StructType)
ModuleNotFoundError: No module named 'pyspark'
sparkdl version -> 0.2.2
Hey @jai-dewani, this is expected behavior. Google Colab's environment doesn't include Spark or its Python dependencies such as pyspark, hence the ModuleNotFoundError. You'll need to install these dependencies first.
This repo (https://github.com/asifahmed90/pyspark-ML-in-Colab) has an example of that, but it's a bit dated, so you might ask @asifahmed90 if you run into any issues. Good luck!
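For example, the quickest way to make the import error itself go away is to install the pyspark package from PyPI in a Colab cell (a minimal sketch; the fuller Spark download further down in this thread is what you want for a complete standalone setup):

# Installs the pyspark package so "import pyspark" resolves in Colab
!pip install -q pyspark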
Actually, I did all the necessary steps from the start, yet I still end up with this problem.
Here is the link to my Colab notebook: https://colab.research.google.com/drive/1nYq-rv6MT78UaiQPcSaFT-PHpsgVBe7R?usp=sharing
While running the notebook, just run the first two subsections and you will end up with the same result. I have been looking hard for any minor mistake I could be making or something I might have missed, but I can't seem to find anything :/
Edit: A similar issue with the same problem has been posted: #209 AttributeError: module 'sparkdl' has no attribute 'graph'
@jai-dewani, this setup worked for me.
# Install Java 8, download and unpack Spark, and install findspark
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
!tar xf spark-3.1.1-bin-hadoop3.2.tgz
!pip install -q findspark

# Point the environment at the Java and Spark installations
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.1.1-bin-hadoop3.2"

# Make pyspark importable and start a local Spark session
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
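To confirm the session actually came up, a quick sanity check (just a sketch; any small DataFrame will do):

# Should print the Spark version and show a small DataFrame without errors
print(spark.version)
spark.range(5).show()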
I came to that solution by looking at the latest Spark package distribution page. You can do the same by checking https://downloads.apache.org/spark/ and looking for the latest Spark and Hadoop versions, e.g. spark-X.X.X/spark-X.X.X-bin-hadoopX.X.tgz. Change these filenames in the code above as required.
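If you prefer not to hunt down each hard-coded version string, you can also parameterize them in one place (a sketch assuming a Colab/IPython notebook, where {var} expressions are expanded inside ! shell commands; the version numbers shown are examples, not recommendations):

# Example versions - replace with whatever is current at https://downloads.apache.org/spark/
spark_version = "3.1.1"
hadoop_version = "3.2"
spark_package = f"spark-{spark_version}-bin-hadoop{hadoop_version}"

# IPython expands {spark_version} and {spark_package} before running the shell commands
!wget -q https://downloads.apache.org/spark/spark-{spark_version}/{spark_package}.tgz
!tar xf {spark_package}.tgz

import os
os.environ["SPARK_HOME"] = f"/content/{spark_package}"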