
TypeError: 'JavaPackage' object is not callable

Open Shekharrajak opened this issue 6 years ago • 13 comments

Python:

When I try to run ADAMContext(sparkSession), I get this error:

 c = self._jvm.org.bdgenomics.adam.rdd.ADAMContext.ADAMContextFromSession(ss._jsparkSession)
TypeError: 'JavaPackage' object is not callable
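(For anyone landing here: with py4j, an attribute chain that cannot be resolved to a real Java class yields a JavaPackage placeholder, and calling that placeholder raises exactly this TypeError. A hedged pre-flight check, assuming a live SparkSession named `ss`, might look like:)

```python
# Hedged sketch: py4j resolves a class it can find to JavaClass, and anything
# it cannot find to a JavaPackage placeholder. Calling the placeholder raises
# "TypeError: 'JavaPackage' object is not callable".
def adam_on_classpath(jvm):
    """True if org.bdgenomics.adam.rdd.ADAMContext resolves to a real class."""
    ref = jvm.org.bdgenomics.adam.rdd.ADAMContext
    return type(ref).__name__ != "JavaPackage"

# Usage with a live session:
#   if not adam_on_classpath(ss._jvm):
#       raise RuntimeError("ADAM jar is not on the Spark driver classpath")
```

If this returns False, the ADAM assembly jar never made it onto the JVM classpath, which points at how Spark/ADAM were installed or launched rather than at the Python code.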

The full code I am executing is:

from bdgenomics.adam.adamContext import ADAMContext
from pyspark.sql import SparkSession

class_name = 'Spark shell'
ss = SparkSession.builder.master('local[*]') \
    .appName(class_name) \
    .getOrCreate()
sc = ss.sparkContext
ac = ADAMContext(ss)

I also followed the comment: https://github.com/JohnSnowLabs/spark-nlp/issues/232#issuecomment-462825960 , but it didn't work for me.

 $ java --version
openjdk 10.0.2 2018-07-17
OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4)
OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4, mixed mode)

Shekharrajak avatar Feb 19 '19 11:02 Shekharrajak

Hello @Shekharrajak!

Thanks for submitting this issue. How did you install Spark and ADAM?

heuermh avatar Feb 19 '19 16:02 heuermh

I installed ADAM following this: https://adam.readthedocs.io/en/latest/installation/pip/

and Spark following this: https://medium.com/@josemarcialportilla/installing-scala-and-spark-on-ubuntu-5665ee4b62b1

Shekharrajak avatar Feb 20 '19 06:02 Shekharrajak

I cut a new ADAM version 0.26.0 release last week and pushed it to PyPI. Could you give it another try with pip install bdgenomics.adam?

heuermh avatar Feb 25 '19 18:02 heuermh

I have updated the library and the sample code:

from bdgenomics.adam.adamContext import ADAMContext
from util import resourceFile
from pyspark.sql import SparkSession

def main():
    class_name = 'Spark shell'
    ss = SparkSession.builder.master('local[*]') \
        .appName(class_name) \
        .config("spark.driver.memory", "8G") \
        .config("spark.driver.maxResultSize", "2G") \
        .config("spark.kryoserializer.buffer.max", "500m") \
        .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:1.8.2") \
        .getOrCreate()
    sc = ss.sparkContext
    ac = ADAMContext(ss)

    # Load the file
    testFile = resourceFile("small.sam")
    reads = ac.loadAlignments(testFile)
    print(reads.toDF().count())

if __name__ == '__main__':
    main()

But I am still getting the same error:

Traceback (most recent call last):
  File "load_alignments.py", line 24, in <module>
    main()
  File "load_alignments.py", line 16, in main
    ac = ADAMContext(ss)
  File "/home/lib/python3.7/site-packages/bdgenomics/adam/adamContext.py", line 55, in __init__
    c = self._jvm.org.bdgenomics.adam.rdd.ADAMContext.ADAMContextFromSession(ss._jsparkSession)
TypeError: 'JavaPackage' object is not callable

Shekharrajak avatar Feb 26 '19 07:02 Shekharrajak

Here are the details:

$  pip show bdgenomics.adam    
Name: bdgenomics.adam
Version: 0.26.0
Summary: A fast, scalable genome analysis system
Home-page: https://github.com/bdgenomics/adam
Author: Big Data Genomics
Author-email: [email protected]
License: UNKNOWN
Location: /home/lib/python3.7/site-packages
Requires: pyspark
Required-by: 

Shekharrajak avatar Feb 26 '19 07:02 Shekharrajak

@heuermh do you know if pip installs the ADAM binary? By just installing ADAM via pip, there is no way the binaries would be accessible, right?

akmorrow13 avatar Feb 26 '19 18:02 akmorrow13

do you know if pip installs the ADAM binary? by just installing ADAM via pip, there is no way the binaries would be accessible, right?

The docs ("Pip will install the bdgenomics.adam Python binding, as well as the ADAM CLI.") and the original pull requests (https://github.com/bigdatagenomics/adam/pull/1848, https://github.com/bigdatagenomics/adam/pull/1849) read as if that were so.

I plan to fire up a new vanilla EC2 instance and give it a try this afternoon.

heuermh avatar Feb 26 '19 18:02 heuermh

Note also the example used in the Jenkins script

https://github.com/bigdatagenomics/adam/blob/master/scripts/jenkins-test-pyadam.py

called here

https://github.com/bigdatagenomics/adam/blob/master/scripts/jenkins-test#L236

heuermh avatar Feb 26 '19 18:02 heuermh

Ahh fancy. @Shekharrajak how are you starting python?

akmorrow13 avatar Feb 26 '19 18:02 akmorrow13

It does look like we have a problem.

Starting with a new Amazon Linux 2 AMI on EC2, adam-submit and adam-shell appear to work fine, but pyadam fails to start:

$ ssh ...

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
1 package(s) needed for security, out of 3 available
Run "sudo yum update" to apply all updates.

$ sudo yum update
...

$ python --version
Python 2.7.14

$ sudo easy_install pip
...
Installed /usr/lib/python2.7/site-packages/pip-19.0.3-py2.7.egg

$ sudo pip install pyspark
...
Successfully installed py4j-0.10.7 pyspark-2.4.0

$ which pyspark
/usr/bin/pyspark

$ pyspark --version
JAVA_HOME is not set

$ which java
/usr/bin/which: no java in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/ec2-user/.local/bin:/home/ec2-user/bin)

$ sudo yum install java-1.8.0-openjdk-devel
...
Installed:
  java-1.8.0-openjdk-devel.x86_64 1:1.8.0.191.b12-0.amzn2

$ pyspark --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Scala version 2.11.12, OpenJDK 64-Bit Server VM, 1.8.0_191

$ sudo pip install bdgenomics.adam
...
Requirement already satisfied: pyspark>=1.6.0 in /usr/lib/python2.7/site-packages
(from bdgenomics.adam) (2.4.0)
Requirement already satisfied: py4j==0.10.7 in /usr/lib/python2.7/site-packages
(from pyspark>=1.6.0->bdgenomics.adam) (0.10.7)
Installing collected packages: bdgenomics.adam
  Running setup.py install for bdgenomics.adam ... done
Successfully installed bdgenomics.adam-0.26.0

$ which pyadam
/usr/bin/pyadam

$ pyadam --version
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
ls: cannot access
/usr/lib/python2.7/site-packages/bdgenomics/adam/adam-python/dist:
No such file or directory
Failed to find ADAM egg in
/usr/lib/python2.7/site-packages/bdgenomics/adam/adam-python/dist.
You need to build ADAM before running this program.

$ which adam-submit
/usr/bin/adam-submit

$ adam-submit --version
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using spark-submit=/usr/bin/spark-submit
2019-02-26 19:21:31 INFO  ADAMMain:109 - ADAM invoked with args: "--version"

       e        888~-_         e            e    e
      d8b       888   \       d8b          d8b  d8b
     /Y88b      888    |     /Y88b        d888bdY88b
    /  Y88b     888    |    /  Y88b      / Y88Y Y888b
   /____Y88b    888   /    /____Y88b    /   YY   Y888b
  /      Y88b   888_-~    /      Y88b  /          Y888b

ADAM version: 0.26.0
Built for: Apache Spark 2.3.3, Scala 2.11.12, and Hadoop 2.7.5


$ touch empty.sam
$ adam-shell
Using SPARK_SHELL=/usr/bin/spark-shell
Spark context available as 'sc' (master = local[*], app id = local-1551208961369).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_191)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.bdgenomics.adam.rdd.ADAMContext._
import org.bdgenomics.adam.rdd.ADAMContext._

scala> val alignments = sc.loadAlignments("empty.sam")
alignments: org.bdgenomics.adam.rdd.read.AlignmentRecordDataset =
RDDBoundAlignmentRecordDataset with 0 reference sequences, 0 read groups,
and 0 processing steps

scala> alignments.toDF().count()
res0: Long = 0

scala> :quit

After a bit of messing around, it appears find-adam-egg.sh is complaining about not finding an egg file:

...
$ find-adam-assembly.sh
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
/usr/lib/python2.7/site-packages/bdgenomics/adam/jars/adam.jar

$ find-adam-egg.sh
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
ls: cannot access
/usr/lib/python2.7/site-packages/bdgenomics/adam/adam-python/dist:
No such file or directory
Failed to find ADAM egg in
/usr/lib/python2.7/site-packages/bdgenomics/adam/adam-python/dist.
You need to build ADAM before running this program.

If I modify the pyspark command in pyadam to not include the --py-files ${ADAM_EGG} argument, it seems to work OK:

$ pyspark \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=org.bdgenomics.adam.serialization.ADAMKryoRegistrator \
  --jars `find-adam-assembly.sh` \
  --driver-class-path `find-adam-assembly.sh`

Python 2.7.14 (default, Jul 26 2018, 19:59:38)
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Python version 2.7.14 (default, Jul 26 2018 19:59:38)
SparkSession available as 'spark'.
>>> from bdgenomics.adam.adamContext import ADAMContext
>>> from pyspark.sql import SparkSession
>>> ss = SparkSession.builder.master('local').getOrCreate()
>>> ac = ADAMContext(ss)
>>> alignments = ac.loadAlignments("empty.sam")
>>> print(alignments.toDF().count())
0

heuermh avatar Feb 26 '19 19:02 heuermh

Ahh fancy. @Shekharrajak how are you starting python?

@akmorrow13, I put those lines of code into a .py file and run it simply with python filename.py.

@heuermh, I tried your commands above and got similar output.

Can I do something like this:

class_name = 'Spark shell'
ss = SparkSession.builder.master('local[*]') \
    .appName(class_name) \
    .config("spark.driver.memory", "8G") \
    .config("spark.driver.maxResultSize", "2G") \
    .config("spark.kryoserializer.buffer.max", "500m") \
    .config("spark.jars.packages", "JohnSnowLabs:spark-nlp:1.8.2") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.kryo.registrator",
            "org.bdgenomics.adam.serialization.ADAMKryoRegistrator") \
    .config("spark.driver.extraClassPath", "`find-adam-assembly.sh`") \
    .config("spark.jars.packages", "`find-adam-assembly.sh`") \
    .getOrCreate()

?

I got this error when I tried the above code:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Provided Maven Coordinates must be in the form 'groupId:artifactId:version'. The coordinate provided is: `find-adam-assembly.sh`
	at scala.Predef$.require(Predef.scala:224)
	at org.apache.spark.deploy.SparkSubmitUtils$$anonfun$extractMavenCoordinates$1.apply(SparkSubmit.scala:1018)
	at org.apache.spark.deploy.SparkSubmitUtils$$anonfun$extractMavenCoordinates$1.apply(SparkSubmit.scala:1016)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
	at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
	at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
	at org.apache.spark.deploy.SparkSubmitUtils$.extractMavenCoordinates(SparkSubmit.scala:1016)
	at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1264)
	at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:315)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Traceback (most recent call last):
  File "load_alignments.py", line 29, in <module>
    main()
  File "load_alignments.py", line 18, in main
    .config("spark.jars.packages", "`find-adam-assembly.sh`")\
  File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 349, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 115, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 298, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

Shekharrajak avatar Feb 27 '19 10:02 Shekharrajak
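(An aside on the "Provided Maven Coordinates" failure above: backticks are shell syntax and are not expanded inside a Python string, so Spark received the literal text `find-adam-assembly.sh`. A hedged workaround, if you do want the jar path from Python, is to run the helper script yourself; the output format below is assumed from the find-adam-assembly.sh transcript earlier in this thread:)

```python
import subprocess

def jar_path_from_output(output):
    # find-adam-assembly.sh prints a Python-list debug line first;
    # the last non-empty line is the jar path (format assumed from the
    # transcript earlier in this thread).
    lines = [line for line in output.strip().splitlines() if line.strip()]
    return lines[-1]

def adam_assembly_path():
    # Run the helper script and parse its stdout (requires it on PATH).
    out = subprocess.run(["find-adam-assembly.sh"],
                         capture_output=True, text=True, check=True)
    return jar_path_from_output(out.stdout)

# Then pass the resolved path, not a backtick string:
#   .config("spark.jars", adam_assembly_path())
#   .config("spark.driver.extraClassPath", adam_assembly_path())
demo = ("['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']\n"
        "/usr/lib/python2.7/site-packages/bdgenomics/adam/jars/adam.jar\n")
print(jar_path_from_output(demo))
```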

Sorry for the delay in responding. Re:

.config("spark.driver.extraClassPath", "`find-adam-assembly.sh`")\

find-adam-assembly.sh returns the path to the ADAM assembly jar, but what spark.jars.packages wants is Maven coordinates (groupId:artifactId:version). Note also that the backticks are not evaluated inside a Python string, so Spark sees that literal text.

For version 0.26.0 of ADAM, the coordinates would be org.bdgenomics.adam:adam-assembly-spark2_2.11:0.26.0.
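(To make the expected format concrete, here is a hedged sanity check for spark.jars.packages entries; the regex approximates Spark's groupId:artifactId:version requirement and is not Spark's exact validation:)

```python
import re

def looks_like_maven_coordinate(entry):
    # Approximation of Spark's 'groupId:artifactId:version' requirement
    # for spark.jars.packages entries (not Spark's exact check).
    return bool(re.fullmatch(r"[^:\s]+:[^:\s]+:[^:\s]+", entry))

print(looks_like_maven_coordinate(
    "org.bdgenomics.adam:adam-assembly-spark2_2.11:0.26.0"))  # True
print(looks_like_maven_coordinate("`find-adam-assembly.sh`"))  # False

# So the working config would be:
#   .config("spark.jars.packages",
#           "org.bdgenomics.adam:adam-assembly-spark2_2.11:0.26.0")
```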

heuermh avatar May 23 '19 04:05 heuermh

Hello @akmorrow13, this issue looks similar to #2225 to me, in that removing the egg stuff seems to help. Curious if we should do that or if there might be another more Python-y approach to take.

heuermh avatar Jan 06 '20 18:01 heuermh