Hi @arahuja @ryan-williams,
Sorry to divert you from your work; I could use help with an issue I am facing.
I looked into htsjdk: the project is pinned to version 1.118, and there are several newer releases. I tried changing the dependency to a newer release, but the artifact and its dependencies only resolve for version 1.118 (a build-override sketch follows below).
Could you please take a look? Here is the log:
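For reference, forcing a newer htsjdk from the build side would look roughly like this sketch for an sbt build. This is only a sketch: it assumes htsjdk is published under the com.github.samtools groupId, 1.139 is just an example version, and a Maven build would use a dependencyManagement entry instead.

```scala
// build.sbt — hypothetical override, not a verified fix: ADAM may rely on
// 1.118-specific behavior, so a newer htsjdk can still fail at runtime
// even when it resolves cleanly.
dependencyOverrides += "com.github.samtools" % "htsjdk" % "1.139"
```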
16/02/03 10:37:26 WARN TaskSetManager: Lost task 0.0 in stage 15.0 (TID 1679, istb1-l2-b14-03.hadoop.priv): java.lang.IllegalArgumentException: Null alleles are not supported
at htsjdk.variant.variantcontext.Allele.<init>(Allele.java:139)
at htsjdk.variant.variantcontext.Allele.create(Allele.java:234)
at htsjdk.variant.variantcontext.Allele.create(Allele.java:355)
at org.bdgenomics.adam.converters.VariantContextConverter$.convertAllele(VariantContextConverter.scala:53)
at org.bdgenomics.adam.converters.VariantContextConverter$.org$bdgenomics$adam$converters$VariantContextConverter$$convertAlleles(VariantContextConverter.scala:57)
at org.bdgenomics.adam.converters.VariantContextConverter.convert(VariantContextConverter.scala:329)
at org.bdgenomics.adam.rdd.variation.VariantContextRDDFunctions$$anonfun$4.apply(VariationRDDFunctions.scala:121)
at org.bdgenomics.adam.rdd.variation.VariantContextRDDFunctions$$anonfun$4.apply(VariationRDDFunctions.scala:119)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1035)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1034)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1034)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1042)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1014)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/02/03 10:37:26 INFO TaskSetManager: Starting task 0.1 in stage 15.0 (TID 1680, istb1-l2-b13-01.hadoop.priv, PROCESS_LOCAL, 1541 bytes)
16/02/03 10:37:26 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on istb1-l2-b13-01.hadoop.priv:56689 (size: 26.4 KB, free: 2.1 GB)
16/02/03 10:37:26 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to istb1-l2-b13-01.hadoop.priv:41144
16/02/03 10:38:01 WARN TaskSetManager: Lost task 0.1 in stage 15.0 (TID 1680, istb1-l2-b13-01.hadoop.priv): java.lang.IllegalArgumentException: Null alleles are not supported
    ... [stack trace identical to task 0.0 above]
16/02/03 10:38:01 INFO TaskSetManager: Starting task 0.2 in stage 15.0 (TID 1681, istb1-l2-b11-07.hadoop.priv, PROCESS_LOCAL, 1541 bytes)
16/02/03 10:38:01 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on istb1-l2-b11-07.hadoop.priv:47042 (size: 26.4 KB, free: 2.1 GB)
16/02/03 10:38:01 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to istb1-l2-b11-07.hadoop.priv:41636
16/02/03 10:38:29 WARN TaskSetManager: Lost task 0.2 in stage 15.0 (TID 1681, istb1-l2-b11-07.hadoop.priv): java.lang.IllegalArgumentException: Null alleles are not supported
    ... [stack trace identical to task 0.0 above]
16/02/03 10:38:29 INFO TaskSetManager: Starting task 0.3 in stage 15.0 (TID 1682, istb1-l2-b14-07.hadoop.priv, PROCESS_LOCAL, 1541 bytes)
16/02/03 10:38:29 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on istb1-l2-b14-07.hadoop.priv:36525 (size: 26.4 KB, free: 2.1 GB)
16/02/03 10:38:30 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to istb1-l2-b14-07.hadoop.priv:58910
16/02/03 10:38:53 WARN TaskSetManager: Lost task 0.3 in stage 15.0 (TID 1682, istb1-l2-b14-07.hadoop.priv): java.lang.IllegalArgumentException: Null alleles are not supported
    ... [stack trace identical to task 0.0 above]
16/02/03 10:38:53 ERROR TaskSetManager: Task 0 in stage 15.0 failed 4 times; aborting job
16/02/03 10:38:53 INFO YarnScheduler: Removed TaskSet 15.0, whose tasks have all completed, from pool
16/02/03 10:38:53 INFO YarnScheduler: Cancelling stage 15
16/02/03 10:38:53 INFO DAGScheduler: ResultStage 15 (saveAsNewAPIHadoopFile at VariationRDDFunctions.scala:140) failed in 115.716 s
16/02/03 10:38:53 INFO DAGScheduler: Job 4 failed: saveAsNewAPIHadoopFile at VariationRDDFunctions.scala:140, took 115.912601 s
16/02/03 10:38:53 INFO SparkUI: Stopped Spark web UI at http://10.107.18.34:4040
16/02/03 10:38:53 INFO DAGScheduler: Stopping DAGScheduler
16/02/03 10:38:53 INFO YarnClientSchedulerBackend: Shutting down all executors
16/02/03 10:38:53 INFO YarnClientSchedulerBackend: Interrupting monitor thread
16/02/03 10:38:53 INFO YarnClientSchedulerBackend: Asking each executor to shut down
16/02/03 10:38:53 INFO YarnClientSchedulerBackend: Stopped
16/02/03 10:38:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/02/03 10:38:53 INFO Utils: path = /tmp/spark-a8f2c424-5e2f-40a2-a984-b51d5cf8954d/blockmgr-645bc7be-4363-40e1-b608-002dfb7916ac, already present as root for deletion.
Thanks & Regards,
Ankush Reddy.
Hi @arahuja, did you find any fix for this?
I found that a similar issue occurred with GATK; see these links for reference:
http://gatkforums.broadinstitute.org/gatk/discussion/3579
http://gatkforums.broadinstitute.org/gatk/discussion/3693/another-gatk-unified-genotyper-null-alleles-are-not-supported
Please let me know if this has been taken care of.
Thanks & Regards,
Ankush Reddy.
Hi @ankushreddy
This error occurs when a called variant is a deletion or insertion and the reference or alternate allele is an empty string. Was this error from the germline-threshold caller or another one? The germline-threshold caller has not seen much use, and it was recently removed from the README as a suggested starting point.
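To make the failure concrete, here is a minimal Scala sketch against the htsjdk Allele API: an empty allele string trips the same check seen in the stack trace above, while VCF's anchor-base convention for indels avoids it.

```scala
import htsjdk.variant.variantcontext.Allele

// An empty allele string fails htsjdk's null-allele check and throws
// java.lang.IllegalArgumentException: Null alleles are not supported
// (the same exception as in the trace above).
val bad = Allele.create("", false)

// VCF represents indels with an anchor base instead of an empty allele,
// e.g. a one-base deletion where the reference context is "AT":
val ref = Allele.create("AT", true)   // reference allele
val alt = Allele.create("A", false)   // alternate allele (the T is deleted)
```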
I am getting this error too with my fork of the SomaticStandardMutationCaller, @ankushreddy. It works fine for certain variants/loci, but when I run on larger files this sometimes happens.
I wonder if this is an edge case in PileupElement construction?
Hi @arahuja, thanks for the reply. I am using germline-threshold; is there any workaround for this? We were also testing avocado, but we see it calling roughly 10 times more variants than expected.
@jstjohn I tested germline-threshold using the mouse data set that is provided as the reference for testing, and it worked well for me. After that, I did not get results for any other files.
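As a sketch of the workaround asked about above (not something confirmed in this thread): one could drop calls with empty alleles before they reach VariantContextConverter. The Genotype/Variant accessor names below follow the bdg-formats Avro schema and should be treated as assumptions.

```scala
import org.apache.spark.rdd.RDD
import org.bdgenomics.formats.avro.Genotype

// Hypothetical pre-save filter: keep only genotypes whose variant carries
// non-empty reference and alternate alleles, so the htsjdk conversion never
// sees the empty strings that trigger "Null alleles are not supported".
def dropEmptyAlleleCalls(genotypes: RDD[Genotype]): RDD[Genotype] =
  genotypes.filter { g =>
    val v = g.getVariant
    v.getReferenceAllele != null && v.getReferenceAllele.toString.nonEmpty &&
    v.getAlternateAllele != null && v.getAlternateAllele.toString.nonEmpty
  }
```

Note this only hides the bad calls; it does not fix whatever produces the empty alleles upstream.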