Databricks TypeError: 'JavaPackage' object is not callable

Open helenxl opened this issue 3 years ago • 11 comments

I am running an example notebook from Databricks. I have installed glow version 1.1.1 for this cluster. I am encountering an error with glow.register(spark).

What am I missing?

import glow

import json
import numpy as np
import pandas as pd
import pyspark.sql.functions as fx

spark = glow.register(spark)
TypeError: 'JavaPackage' object is not callable
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<command-3963649488507776> in <module>
      6 import pyspark.sql.functions as fx
      7 
----> 8 spark = glow.register(spark)

/databricks/python/lib/python3.8/site-packages/glow/glow.py in register(session, new_session)
     78     sc = session._sc
     79     return SparkSession(
---> 80         sc, session._jvm.io.projectglow.Glow.register(session._jsparkSession, new_session))
     81 
     82 

TypeError: 'JavaPackage' object is not callable

helenxl avatar Dec 01 '21 17:12 helenxl

Hey @helenxl, this error means Python cannot find the Glow JARs.

Did you install only the PyPI package? Glow also requires the JARs, which are installed from Maven coordinates; in this case, io.projectglow:glow-spark3_2.12:1.1.1

What environment are you running this in: Databricks, another Spark service, or your own Spark deployment?

williambrandler avatar Dec 01 '21 20:12 williambrandler
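
A quick way to confirm the diagnosis above before calling glow.register is to look at what py4j returns for the Glow class. py4j hands back a JavaPackage placeholder for any dotted name it cannot resolve to a loaded class, which is where the "'JavaPackage' object is not callable" message comes from. The following is a minimal sketch; the helper name glow_jvm_class_loaded is hypothetical and not part of Glow.

```python
def glow_jvm_class_loaded(spark):
    """Return True if the Glow JVM class is visible to this Spark session.

    Hypothetical helper, not part of the Glow API.
    """
    # Same attribute path that glow.register() uses in the traceback above.
    candidate = spark._jvm.io.projectglow.Glow
    # py4j returns a JavaPackage placeholder when the class is not on the
    # JVM classpath. Compare by type name so no py4j import is needed here.
    return type(candidate).__name__ != "JavaPackage"
```

If this returns False for your session, the JARs are missing from the cluster or session classpath and glow.register(spark) can be expected to fail in the way shown in this issue.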

Thanks! I missed that requirement.

helenxl avatar Dec 01 '21 20:12 helenxl

No sweat, I forgot too the first time I installed Glow via PyPI and Maven.

williambrandler avatar Dec 01 '21 21:12 williambrandler

We also have Docker containers that include all the JARs and the PyPI package.

https://hub.docker.com/u/projectglow

On Databricks you can install via Databricks Container Services. For Glow v1.1.1, point to this Docker image URL:

projectglow/databricks-glow:1.1.1

williambrandler avatar Dec 01 '21 21:12 williambrandler

Hi! May I ask what I should do in a Jupyter notebook? I have run into a similar problem...

import findspark
import pyspark
import glow
from pyspark.sql import SparkSession
findspark.init()
spark = SparkSession.builder.getOrCreate()
spark = glow.register(spark)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-079acb31ab5e> in <module>
      1 import glow
----> 2 spark = glow.register(spark)

C:\anaconda\lib\site-packages\glow\glow.py in register(session, new_session)
     78     sc = session._sc
     79     return SparkSession(
---> 80         sc, session._jvm.io.projectglow.Glow.register(session._jsparkSession, new_session))
     81 
     82 

TypeError: 'JavaPackage' object is not callable

Lucian-jw avatar Dec 02 '21 05:12 Lucian-jw

@williambrandler When specifying projectglow/databricks-glow:1.1.1, the databricks cluster encountered an error pulling the image. I can pull the image using docker cli fine. Do you know what may be missing? Thank you.

Cluster terminated.Reason:Docker image pull failure

Cannot launch the cluster because pulling the docker image failed. Please double check connectivity from workers to the container registry, as well as the credentials used to pull the image.

Internal error message: Container setup failed due to a docker image pull failure: Image doesn't exist or invalid credential to pull image from projectglow/databricks-glow:1.1.1  .
Stdout: 
Stderr: time="2021-12-02T16:43:57Z" level=fatal msg="Error parsing image name \"docker://projectglow/databricks-glow:1.1.1  \": invalid reference format"

helenxl avatar Dec 02 '21 18:12 helenxl
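
One detail in the log above: the stderr line quotes the image name with two trailing spaces before the closing quote, followed by "invalid reference format". Stray whitespace pasted into the image URL field is therefore one possible cause, though the thread does not confirm it. The sketch below shows how such a value could be normalized and checked before use. The function name and the simplified pattern are assumptions for illustration; real Docker references allow more forms (registries with ports, digests) than this pattern accepts.

```python
import re

# Simplified "repository:tag" pattern (an assumption, not the full Docker grammar).
_IMAGE_REF = re.compile(
    r"^[a-z0-9]+(?:[._/-][a-z0-9]+)*:[A-Za-z0-9_][A-Za-z0-9_.-]*$"
)


def clean_image_reference(raw):
    """Strip surrounding whitespace and reject obviously malformed references."""
    ref = raw.strip()
    if not _IMAGE_REF.match(ref):
        raise ValueError(f"unexpected image reference: {ref!r}")
    return ref
```

For example, clean_image_reference("projectglow/databricks-glow:1.1.1  ") returns the reference without the trailing spaces, while a value with a space in the middle raises ValueError.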

@helenxl Did you get the issue resolved?

Tabinda788 avatar Mar 22 '22 12:03 Tabinda788

Missed this. Please share more information (such as a screenshot of the cluster setup) @helenxl @Tabinda788

williambrandler avatar Mar 22 '22 15:03 williambrandler

Yes, please go ahead and close this issue. I was able to use projectglow in Databricks.

helenxl avatar Mar 22 '22 16:03 helenxl

@helenxl Can we make it work locally?

Tabinda788 avatar Mar 23 '22 04:03 Tabinda788

@Tabinda788 would Docker work for you? @edg1983 contributed a Dockerfile for running Glow outside of Databricks. We have published the image on the projectglow Docker Hub, so it can be run locally via Docker.

https://github.com/projectglow/glow/issues/494
https://github.com/projectglow/glow/pull/503
https://hub.docker.com/r/projectglow/open-source-glow

williambrandler avatar Mar 23 '22 16:03 williambrandler

@Tabinda788 The fix is the same locally -- you need to install the maven library.

henrydavidge avatar Mar 21 '24 09:03 henrydavidge
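
For a local or Jupyter setup, one common way to "install the Maven library" is to let Spark resolve it at session startup through the standard spark.jars.packages setting. The following is a minimal sketch under that assumption, using the coordinate quoted earlier in this thread. The helper names glow_maven_coordinate and build_glow_session are hypothetical, and the sketch assumes the pip-installed glow.py version matches the JAR version.

```python
def glow_maven_coordinate(glow_version, scala_version="2.12"):
    """Build the Maven coordinate for the Glow Spark 3 artifact (hypothetical helper)."""
    return f"io.projectglow:glow-spark3_{scala_version}:{glow_version}"


def build_glow_session(glow_version="1.1.1"):
    """Create a SparkSession that pulls the Glow JARs from Maven, then register Glow."""
    # Imported lazily so the coordinate helper works without Spark installed.
    import glow
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        # Ask Spark to download the Glow JARs and their dependencies at startup.
        .config("spark.jars.packages", glow_maven_coordinate(glow_version))
        .getOrCreate()
    )
    return glow.register(spark)
```

Note that spark.jars.packages only takes effect when the JVM is launched. If a SparkSession already exists in the notebook, getOrCreate() returns that session without the JARs, so restart the kernel before calling build_glow_session().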