I am using mango-distribution-0.0.1 on Hadoop 2.6.0-cdh5.14.4, Spark 2.2.0, and Scala 2.11.8.
mango-submit fails with a Parquet class-not-found error. I tried passing the Parquet package on the command line, but it does not help, as the output below shows. I have also included the layout of the data files on HDFS.
[sm@bluedata750 bin]$ ./mango-submit --packages org.apache.parquet:parquet-hadoop:1.8.2 /user/sm/hg19.17.2bit -genes /user/sm/ensGene.bb -reads /user/sm/chr17.7500000-7515000.sam.adam -variants /user/sm/chr17.adam -show_genotypes -discover
Using spark-submit=/usr/bin/spark2-submit
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/hadoop/metadata/CompressionCodecName
at org.bdgenomics.utils.cli.ParquetArgs$class.$init$(ParquetArgs.scala:40)
at org.bdgenomics.mango.cli.VizReadsArgs.<init>(VizReads.scala:252)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.bdgenomics.utils.cli.Args4j$.apply(Args4j.scala:34)
at org.bdgenomics.mango.cli.VizReads$.apply(VizReads.scala:196)
at org.bdgenomics.utils.cli.BDGCommandCompanion$class.main(BDGCommand.scala:33)
at org.bdgenomics.mango.cli.VizReads$.main(VizReads.scala:125)
at org.bdgenomics.mango.cli.VizReads.main(VizReads.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.parquet.hadoop.metadata.CompressionCodecName
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 21 more
[sm@bluedata750 bin]$ ./mango-submit /user/sm/hg19.17.2bit -genes /user/sm/ensGene.bb -reads /user/sm/chr17.7500000-7515000.sam.adam -variants /user/sm/chr17.adam -show_genotypes -discover
Using spark-submit=/usr/bin/spark2-submit
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/hadoop/metadata/CompressionCodecName
at org.bdgenomics.utils.cli.ParquetArgs$class.$init$(ParquetArgs.scala:40)
at org.bdgenomics.mango.cli.VizReadsArgs.<init>(VizReads.scala:252)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.bdgenomics.utils.cli.Args4j$.apply(Args4j.scala:34)
at org.bdgenomics.mango.cli.VizReads$.apply(VizReads.scala:196)
at org.bdgenomics.utils.cli.BDGCommandCompanion$class.main(BDGCommand.scala:33)
at org.bdgenomics.mango.cli.VizReads$.main(VizReads.scala:125)
at org.bdgenomics.mango.cli.VizReads.main(VizReads.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.parquet.hadoop.metadata.CompressionCodecName
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 21 more
[sm@bluedata750 bin]$ hadoop fs -ls /user/sm
Found 7 items
drwxrwx---+ - sm supergroup 0 2018-10-30 21:27 /user/sm/.sparkStaging
-rw-rw----+ 3 sm supergroup 5440756334 2018-10-30 21:03 /user/sm/LN44765.bed
drwxrwx---+ - sm supergroup 0 2018-11-09 07:19 /user/sm/chr17.7500000-7515000.sam.adam
drwxrwx---+ - sm supergroup 0 2018-10-30 21:27 /user/sm/chr17.adam
-rw-rw----+ 3 sm supergroup 91866 2018-10-17 10:28 /user/sm/chr17.vcf
-rw-rw----+ 3 sm supergroup 3344732 2018-11-09 07:28 /user/sm/ensGene.bb
-rw-rw----+ 3 sm supergroup 21252941 2018-11-09 07:21 /user/sm/hg19.17.2bit
Please try adding a -- separator between the Spark arguments and the Mango arguments:
./mango-submit --packages org.apache.parquet:parquet-hadoop:1.8.2 -- /user/sm/hg19.17.2bit -genes /user/sm/ensGene.bb -reads /user/sm/chr17.7500000-7515000.sam.adam -variants /user/sm/chr17.adam -show_genotypes
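For context on why the -- matters, here is a minimal sketch of how a submit wrapper might split its command line. It assumes mango-submit follows the common convention where everything before -- goes to spark-submit and everything after goes to the application, and where a command line with no -- is passed to the application in full. The function name split_args and its output format are hypothetical; this is not the actual mango-submit source.

```shell
# Hypothetical sketch of a "--" argument splitter; NOT the real mango-submit code.
# Assumption: arguments before "--" are for spark-submit, arguments after are
# for the application. With no "--" at all, everything goes to the application.
split_args() {
  pre=""
  post=""
  seen_dd=0
  for arg in "$@"; do
    if [ "$seen_dd" -eq 0 ] && [ "$arg" = "--" ]; then
      # First "--" found: switch from collecting "before" to collecting "after".
      seen_dd=1
    elif [ "$seen_dd" -eq 0 ]; then
      pre="${pre:+$pre }$arg"
    else
      post="${post:+$post }$arg"
    fi
  done
  if [ "$seen_dd" -eq 1 ]; then
    echo "spark-submit args: $pre"
    echo "application args: $post"
  else
    # No separator: nothing is treated as a Spark option.
    echo "spark-submit args:"
    echo "application args: $pre"
  fi
}

# With the separator, --packages is routed to spark-submit.
split_args --packages org.apache.parquet:parquet-hadoop:1.8.2 -- /user/sm/hg19.17.2bit -discover
# Without it, --packages ends up among the application arguments.
split_args --packages org.apache.parquet:parquet-hadoop:1.8.2 /user/sm/hg19.17.2bit -discover
```

If the wrapper behaves this way, the first command in the report would never have handed --packages to spark-submit, which would be consistent with the identical stack trace in both runs.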