TypeError: 'JavaPackage' object is not callable
Python:
When I try to run ADAMContext(sparkSession), I get this error:
c = self._jvm.org.bdgenomics.adam.rdd.ADAMContext.ADAMContextFromSession(ss._jsparkSession)
TypeError: 'JavaPackage' object is not callable
The full code I am executing is:
from bdgenomics.adam.adamContext import ADAMContext
from pyspark.sql import SparkSession
class_name = 'Spark shell'
ss = SparkSession.builder.master('local[*]')\
.appName(class_name)\
.getOrCreate()
sc = ss.sparkContext
ac = ADAMContext(ss)
I also followed the comment: https://github.com/JohnSnowLabs/spark-nlp/issues/232#issuecomment-462825960 , but it didn't work for me.
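For context on why this particular TypeError appears: py4j resolves any dotted path on `spark._jvm` lazily, so a class that is missing from the JVM classpath does not raise on lookup; it comes back as a `JavaPackage` placeholder, and only calling it fails. A minimal sketch of a diagnostic (the helper name is mine, not part of ADAM):

```python
def looks_like_missing_class(obj):
    """py4j returns a JavaPackage placeholder for any dotted path it
    cannot resolve to a class on the JVM classpath; calling the
    placeholder raises "TypeError: 'JavaPackage' object is not callable"."""
    return type(obj).__name__ == "JavaPackage"

# Usage inside a running Spark session (not executed here):
#   ctx = ss._jvm.org.bdgenomics.adam.rdd.ADAMContext
#   looks_like_missing_class(ctx)  # True would mean the ADAM jar is
#                                  # not on the driver classpath
```

If the check returns True, the fix is to get the ADAM assembly jar onto the driver classpath, which is what the rest of this thread works through.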
$ java --version
openjdk 10.0.2 2018-07-17
OpenJDK Runtime Environment (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4)
OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4, mixed mode)
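An aside on the Java version shown above: Spark 2.x only supports Java 8, so OpenJDK 10 could be a separate source of trouble even once the classpath issue is fixed. This is not confirmed as the cause here, but on Ubuntu it is worth switching to Java 8 before testing further (paths are the stock Ubuntu ones and may differ):

```shell
# Spark 2.x requires Java 8; install it and make it the default
# (update-alternatives presents an interactive menu):
sudo apt-get install openjdk-8-jdk
sudo update-alternatives --config java

# Or point Spark at it directly without changing the system default:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```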
Hello @Shekharrajak!
Thanks for submitting this issue. How did you install Spark and ADAM?
I installed ADAM using this: https://adam.readthedocs.io/en/latest/installation/pip/
and Spark using this: https://medium.com/@josemarcialportilla/installing-scala-and-spark-on-ubuntu-5665ee4b62b1
I cut a new ADAM 0.26.0 release last week and pushed it to PyPI. Could you give it another try with pip install bdgenomics.adam?
I have updated the library and the sample code:
from bdgenomics.adam.adamContext import ADAMContext
from util import resourceFile
from pyspark.sql import SparkSession
def main():
class_name = 'Spark shell'
ss = SparkSession.builder.master('local[*]')\
.appName(class_name)\
.config("spark.driver.memory","8G")\
.config("spark.driver.maxResultSize", "2G")\
.config("spark.kryoserializer.buffer.max", "500m")\
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:1.8.2")\
.getOrCreate()
sc = ss.sparkContext
ac = ADAMContext(ss)
# Load the file
testFile = resourceFile("small.sam")
reads = ac.loadAlignments(testFile)
print(reads.toDF().count())
if __name__ == '__main__':
main()
But I am still getting the same error:
Traceback (most recent call last):
File "load_alignments.py", line 24, in <module>
main()
File "load_alignments.py", line 16, in main
ac = ADAMContext(ss)
File "/home/lib/python3.7/site-packages/bdgenomics/adam/adamContext.py", line 55, in __init__
c = self._jvm.org.bdgenomics.adam.rdd.ADAMContext.ADAMContextFromSession(ss._jsparkSession)
Here are the details:
$ pip show bdgenomics.adam
Name: bdgenomics.adam
Version: 0.26.0
Summary: A fast, scalable genome analysis system
Home-page: https://github.com/bdgenomics/adam
Author: Big Data Genomics
Author-email: [email protected]
License: UNKNOWN
Location: /home/lib/python3.7/site-packages
Requires: pyspark
Required-by:
@heuermh do you know if pip installs the ADAM binary? By just installing ADAM via pip, there is no way the binaries would be accessible, right?
The docs "Pip will install the bdgenomics.adam Python binding, as well as the ADAM CLI." and original pull requests (https://github.com/bigdatagenomics/adam/pull/1848, https://github.com/bigdatagenomics/adam/pull/1849) read as if that were so.
I plan to fire up a new vanilla EC2 instance and give it a try this afternoon.
Note also the example used in the Jenkins script
https://github.com/bigdatagenomics/adam/blob/master/scripts/jenkins-test-pyadam.py
called here
https://github.com/bigdatagenomics/adam/blob/master/scripts/jenkins-test#L236
Ahh fancy. @Shekharrajak how are you starting python?
It does look like we have a problem.
Starting with a new Amazon Linux 2 AMI on EC2, adam-submit and adam-shell appear to work fine, but pyadam fails to start:
$ ssh ...
__| __|_ )
_| ( / Amazon Linux 2 AMI
___|\___|___|
https://aws.amazon.com/amazon-linux-2/
1 package(s) needed for security, out of 3 available
Run "sudo yum update" to apply all updates.
$ sudo yum update
...
$ python --version
Python 2.7.14
$ sudo easy_install pip
...
Installed /usr/lib/python2.7/site-packages/pip-19.0.3-py2.7.egg
$ sudo pip install pyspark
...
Successfully installed py4j-0.10.7 pyspark-2.4.0
$ which pyspark
/usr/bin/pyspark
$ pyspark --version
JAVA_HOME is not set
$ which java
/usr/bin/which: no java in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/ec2-user/.local/bin:/home/ec2-user/bin)
$ sudo yum install java-1.8.0-openjdk-devel
...
Installed:
java-1.8.0-openjdk-devel.x86_64 1:1.8.0.191.b12-0.amzn2
$ pyspark --version
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.0
/_/
Using Scala version 2.11.12, OpenJDK 64-Bit Server VM, 1.8.0_191
$ sudo pip install bdgenomics.adam
...
Requirement already satisfied: pyspark>=1.6.0 in /usr/lib/python2.7/site-packages
(from bdgenomics.adam) (2.4.0)
Requirement already satisfied: py4j==0.10.7 in /usr/lib/python2.7/site-packages
(from pyspark>=1.6.0->bdgenomics.adam) (0.10.7)
Installing collected packages: bdgenomics.adam
Running setup.py install for bdgenomics.adam ... done
Successfully installed bdgenomics.adam-0.26.0
$ which pyadam
/usr/bin/pyadam
$ pyadam --version
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
ls: cannot access
/usr/lib/python2.7/site-packages/bdgenomics/adam/adam-python/dist:
No such file or directory
Failed to find ADAM egg in
/usr/lib/python2.7/site-packages/bdgenomics/adam/adam-python/dist.
You need to build ADAM before running this program.
$ which adam-submit
/usr/bin/adam-submit
$ adam-submit --version
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using spark-submit=/usr/bin/spark-submit
2019-02-26 19:21:31 INFO ADAMMain:109 - ADAM invoked with args: "--version"
e 888~-_ e e e
d8b 888 \ d8b d8b d8b
/Y88b 888 | /Y88b d888bdY88b
/ Y88b 888 | / Y88b / Y88Y Y888b
/____Y88b 888 / /____Y88b / YY Y888b
/ Y88b 888_-~ / Y88b / Y888b
ADAM version: 0.26.0
Built for: Apache Spark 2.3.3, Scala 2.11.12, and Hadoop 2.7.5
$ touch empty.sam
$ adam-shell
Using SPARK_SHELL=/usr/bin/spark-shell
Spark context available as 'sc' (master = local[*], app id = local-1551208961369).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.0
/_/
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_191)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import org.bdgenomics.adam.rdd.ADAMContext._
import org.bdgenomics.adam.rdd.ADAMContext._
scala> val alignments = sc.loadAlignments("empty.sam")
alignments: org.bdgenomics.adam.rdd.read.AlignmentRecordDataset =
RDDBoundAlignmentRecordDataset with 0 reference sequences, 0 read groups,
and 0 processing steps
scala> alignments.toDF().count()
res0: Long = 0
scala> :quit
After a bit of messing around, it appears find-adam-egg.sh is complaining about not finding an egg file
...
$ find-adam-assembly.sh
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
/usr/lib/python2.7/site-packages/bdgenomics/adam/jars/adam.jar
$ find-adam-egg.sh
['/usr/bin/..', '/usr/lib/python2.7/site-packages/bdgenomics/adam']
ls: cannot access
/usr/lib/python2.7/site-packages/bdgenomics/adam/adam-python/dist:
No such file or directory
Failed to find ADAM egg in
/usr/lib/python2.7/site-packages/bdgenomics/adam/adam-python/dist.
You need to build ADAM before running this program.
If I modify the pyspark command in pyadam to not include the --py-files ${ADAM_EGG} argument, it seems to work ok:
$ pyspark \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.kryo.registrator=org.bdgenomics.adam.serialization.ADAMKryoRegistrator \
--jars `find-adam-assembly.sh` \
--driver-class-path `find-adam-assembly.sh`
Python 2.7.14 (default, Jul 26 2018, 19:59:38)
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.4.0
/_/
Using Python version 2.7.14 (default, Jul 26 2018 19:59:38)
SparkSession available as 'spark'.
>>> from bdgenomics.adam.adamContext import ADAMContext
>>> from pyspark.sql import SparkSession
>>> ss = SparkSession.builder.master('local').getOrCreate()
>>> ac = ADAMContext(ss)
>>> alignments = ac.loadAlignments("empty.sam")
>>> print(alignments.toDF().count())
0
@akmorrow13, I put those lines of code into a .py file and run it simply with python filename.py.
@heuermh, I tried your commands above and got similar output.
Can I do something like this:
class_name = 'Spark shell'
ss = SparkSession.builder.master('local[*]')\
.appName(class_name)\
.config("spark.driver.memory","8G")\
.config("spark.driver.maxResultSize", "2G")\
.config("spark.kryoserializer.buffer.max", "500m")\
.config("spark.jars.packages", "JohnSnowLabs:spark-nlp:1.8.2")\
.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")\
.config("spark.kryo.registrator",
"org.bdgenomics.adam.serialization.ADAMKryoRegistrator")\
.config("spark.driver.extraClassPath", "`find-adam-assembly.sh`")\
.config("spark.jars.packages", "`find-adam-assembly.sh`")\
.getOrCreate()
?
I got an error when I tried the code above:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Provided Maven Coordinates must be in the form 'groupId:artifactId:version'. The coordinate provided is: `find-adam-assembly.sh`
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.deploy.SparkSubmitUtils$$anonfun$extractMavenCoordinates$1.apply(SparkSubmit.scala:1018)
at org.apache.spark.deploy.SparkSubmitUtils$$anonfun$extractMavenCoordinates$1.apply(SparkSubmit.scala:1016)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.deploy.SparkSubmitUtils$.extractMavenCoordinates(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1264)
at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:315)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Traceback (most recent call last):
File "load_alignments.py", line 29, in <module>
main()
File "load_alignments.py", line 18, in main
.config("spark.jars.packages", "`find-adam-assembly.sh`")\
File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 349, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 115, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/context.py", line 298, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "/home/shekharrajak/anaconda3/lib/python3.7/site-packages/pyspark/java_gateway.py", line 94, in launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
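The immediate failure above is a quoting issue: backquotes are shell command substitution, and inside a Python string literal they are passed through verbatim, so Spark received the literal text `find-adam-assembly.sh` instead of a jar path. To use the script's output from Python, one option is to run it with subprocess. A sketch (the helper names are mine; it assumes the script is on PATH):

```python
import subprocess

def last_line(text):
    """Return the last non-empty line of some command output."""
    lines = [line for line in text.splitlines() if line.strip()]
    return lines[-1].strip()

def adam_assembly_jar():
    # As seen in the session logs above, find-adam-assembly.sh prints a
    # Python list of search paths before the jar path, so keep only the
    # last line of its output.
    out = subprocess.check_output(["find-adam-assembly.sh"])
    return last_line(out.decode())
```

The result could then be passed to the builder, e.g. `.config("spark.driver.extraClassPath", adam_assembly_jar())`.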
Sorry for the delay in responding. Re
.config("spark.driver.extraClassPath", "`find-adam-assembly.sh`")\
find-adam-assembly.sh returns the path to the ADAM assembly jar, while what spark.jars.packages wants is Maven coordinates (groupId:artifactId:version).
For version 0.26.0 of ADAM, that would be org.bdgenomics.adam:adam-assembly-spark2_2.11:0.26.0.
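Putting the pieces from this thread together, the builder in the failing script could look like the following. This is a sketch, not a verified fix: the coordinates and the Kryo settings come from the messages above, and the commented-out local-jar alternative uses the path printed by find-adam-assembly.sh earlier in the thread, which will differ per machine:

```python
from pyspark.sql import SparkSession

ss = (SparkSession.builder.master("local[*]")
      .appName("Spark shell")
      # Resolve the ADAM assembly from Maven Central by coordinates...
      .config("spark.jars.packages",
              "org.bdgenomics.adam:adam-assembly-spark2_2.11:0.26.0")
      # ...or point Spark at a locally installed jar instead, e.g.:
      # .config("spark.jars",
      #         "/usr/lib/python2.7/site-packages/bdgenomics/adam/jars/adam.jar")
      .config("spark.serializer",
              "org.apache.spark.serializer.KryoSerializer")
      .config("spark.kryo.registrator",
              "org.bdgenomics.adam.serialization.ADAMKryoRegistrator")
      .getOrCreate())
```

This mirrors the flags of the working pyspark invocation earlier in the thread, just expressed as builder config instead of command-line arguments.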
Hello @akmorrow13, this issue looks similar to #2225 to me, in that removing the egg stuff seems to help. Curious if we should do that or if there might be another more Python-y approach to take.