
Mango cannot load fasta files with pipes in the contigName

Open · benwbooth opened this issue 7 years ago · 2 comments

If I run mango-submit with a .fasta file as the reference:

/users/bbooth/src/mango/bin/mango-submit /data/seqdata/analysis/fakereads/S288C_reference_genome_R64-2-1_20150113/S288C_reference_sequence_R64-2-1_20150113.fsa.fasta -features /data/seqdata/analysis/fakereads//data/seqdata/analysis/fakereads/S288C_reference_genome_R64-2-1_20150113/saccharomyces_cerevisiae_R64-2-1_20150113.genes.gff3

The fasta file has chromosome names that contain pipe (|) characters, e.g.:

ref|NC_001141|
ref|NC_001136|
ref|NC_001135|
ref|NC_001144|

..., etc.

Then I get this error:

Command body threw exception:
java.lang.AssertionError: assertion failed: SequenceRecord.name is null or empty
Exception in thread "main" java.lang.AssertionError: assertion failed: SequenceRecord.name is null or empty
        at scala.Predef$.assert(Predef.scala:170)
        at org.bdgenomics.adam.models.SequenceRecord.<init>(SequenceDictionary.scala:287)
        at org.bdgenomics.adam.models.SequenceRecord$.apply(SequenceDictionary.scala:403)
        at org.bdgenomics.adam.util.ReferenceContigMap$$anonfun$1.apply(ReferenceContigMap.scala:51)
        at org.bdgenomics.adam.util.ReferenceContigMap$$anonfun$1.apply(ReferenceContigMap.scala:50)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at org.bdgenomics.adam.util.ReferenceContigMap.<init>(ReferenceContigMap.scala:50)
        at org.bdgenomics.adam.util.ReferenceContigMap$.apply(ReferenceContigMap.scala:107)
        at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadReferenceFile$1.apply(ADAMContext.scala:3010)
        at org.bdgenomics.adam.rdd.ADAMContext$$anonfun$loadReferenceFile$1.apply(ADAMContext.scala:3007)
        at scala.Option.fold(Option.scala:158)
        at org.apache.spark.rdd.Timer.time(Timer.scala:48)
        at org.bdgenomics.adam.rdd.ADAMContext.loadReferenceFile(ADAMContext.scala:3005)
        at org.bdgenomics.mango.models.AnnotationMaterialization.<init>(AnnotationMaterialization.scala:42)
        at org.bdgenomics.mango.cli.VizReads.initAnnotations(VizReads.scala:638)
        at org.bdgenomics.mango.cli.VizReads.run(VizReads.scala:586)
        at org.bdgenomics.utils.cli.BDGSparkCommand$class.run(BDGCommand.scala:55)
        at org.bdgenomics.mango.cli.VizReads.run(VizReads.scala:579)
        at org.bdgenomics.utils.cli.BDGCommandCompanion$class.main(BDGCommand.scala:33)
        at org.bdgenomics.mango.cli.VizReads$.main(VizReads.scala:69)
        at org.bdgenomics.mango.cli.VizReads.main(VizReads.scala)

This is due to the following lines in FastaConverter.parseDescriptionLine:

          // is this description metadata or not? if it is metadata, it will contain "|"
          if (split._1.contains('|')) {
            (None, Some(dL.stripPrefix(">").trim))

If a pipe character appears in the contig name, the NucleotideFragment gets no name at all. Instead, the whole header line (name included, minus the leading ">") is stored as its description. This seems counterintuitive.
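To make the effect of that branch concrete, here is a minimal, hypothetical reproduction of it. The `if` branch mirrors the quoted lines; the rest (splitting at the first space, and treating a header with no space as all name) is an assumption for illustration, not ADAM's actual implementation.

```scala
// Hypothetical reproduction of the branch quoted from FastaConverter.parseDescriptionLine.
// Assumption: `split` is the header split at its first space; the handling of headers
// without a space is illustrative and may differ from ADAM's real code.
object CurrentBranchSketch {
  def parse(dL: String): (Option[String], Option[String]) = {
    val splitIndex = dL.indexOf(' ')
    val split = if (splitIndex < 0) (dL, "") else dL.splitAt(splitIndex)
    // is this description metadata or not? if it is metadata, it will contain "|"
    if (split._1.contains('|')) {
      // Pipe in the first token: no name, the whole header becomes the description.
      (None, Some(dL.stripPrefix(">").trim))
    } else {
      (Some(split._1.stripPrefix(">").trim), Some(split._2.trim).filter(_.nonEmpty))
    }
  }

  def main(args: Array[String]): Unit = {
    println(parse(">ref|NC_001141|")) // name is None
    println(parse(">chrI some text")) // name is Some("chrI")
  }
}
```

With a header such as `>ref|NC_001141|`, the name comes back as `None`, which is what later reaches `SequenceRecord` as an empty name.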

Without a contigName, ADAM's ReferenceContigMap tries to build a SequenceRecord with an empty name, which trips the assertion in the stack trace above. FASTA records should always get a contigName, even when the name contains a pipe character.
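One way to guarantee a name is to always take the first whitespace-delimited token after ">" as the contig name, pipes included, and treat the rest as the description. The sketch below is a hypothetical alternative (`ProposedHeaderSketch` and `parseHeader` are illustrative names, not ADAM's API), and the second sample header is made up.

```scala
// Hypothetical alternative parser, not ADAM's API: the contig name is always the first
// whitespace-delimited token after '>', regardless of whether it contains '|'.
object ProposedHeaderSketch {
  def parseHeader(line: String): (Option[String], Option[String]) = {
    val body = line.stripPrefix(">").trim
    val ws = body.indexWhere(_.isWhitespace)
    if (ws < 0) {
      // No description: the whole header is the name (None only if the header is empty).
      (Some(body).filter(_.nonEmpty), None)
    } else {
      // Name is everything before the first whitespace; the remainder is the description.
      (Some(body.substring(0, ws)), Some(body.substring(ws).trim).filter(_.nonEmpty))
    }
  }

  def main(args: Array[String]): Unit = {
    println(parseHeader(">ref|NC_001141|"))
    println(parseHeader(">ref|NC_001133| example description"))
  }
}
```

Under this rule the only header that yields no name is an empty one, so a missing name could be reported as a parse error rather than surfacing later as the `SequenceRecord.name is null or empty` assertion.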

benwbooth, Aug 30 '18 22:08

Converting the fasta file to twoBit (.2bit) format works around the problem in this case.

benwbooth, Aug 30 '18 22:08

Hi @benwbooth, thanks for the catch! This looks like a bug in ADAM's FastaConverter, not in Mango. Can you open an issue there so we can track it?

In general, twoBit files are a little nicer to work with in the browser because they are smaller and faster to query.

akmorrow13, Aug 30 '18 22:08