Error running bayesembler 1.2.0

Open voshalla opened this issue 8 years ago • 4 comments

When trying to run bayesembler 1.2.0 on an alignment generated by tophat2, I get the following error:

bayesembler: /seqdata/krogh/jola/projects/transcriptome_assembly/code/release/bayesembler_1_2_0/src/assembler.cpp:186: void Assembler::markDuplicates(BamTools::BamAlignment&, Assembler::FirstReads_, Assembler::ReadPairs_): Assertion `cur_pos_first_reads_it->second.insert(pair<ReadId, BamTools::BamAlignment*>(ri, new BamTools::BamAlignment(current_alignment))).second' failed.

I assumed it was an issue with the order of the alignments in the BAM file, but it still happens after re-sorting the BAM file with samtools, regardless of the samtools version. I was able to run bayesembler on other datasets using the same version of tophat2 without issue, so it doesn't seem to be a problem with the installs.

voshalla avatar Oct 05 '15 18:10 voshalla

Hi,

Thank you for posting. Would it be possible for you to make the data available to us?

Best,

Lasse

lassemaretty avatar Oct 06 '15 06:10 lassemaretty

The smallest bam file causing the error can be downloaded from here:

https://unl.box.com/s/kym2l74fnfd66vt0dskts84awvw5onyh

It was generated by aligning the following reads to the TAIR transcriptome for Arabidopsis:

https://unl.box.com/s/0emodcukni923a49mit1eu23s1hzlvsp

voshalla avatar Oct 06 '15 15:10 voshalla

Thanks! I'll look into it and get back to you!

lassemaretty avatar Oct 07 '15 11:10 lassemaretty

I found the solution for this error. Because the reads we're using are simulated expression data, each read is named after the sequence coordinates it covers. When two reads are generated from the same coordinates, their read names are identical, so the names are not unique. Changing the read names so that they are always unique resolved this error. However, I'm now getting the following error later in the assembly:

[23/10/2015 11:13:06] Removing duplicate reads
[23/10/2015 11:14:06] Removed duplicates from 3606471 mapped read pairs
[23/10/2015 11:14:06] Wrote 2978238 read pairs used for splice-graph construction

[23/10/2015 11:14:06] Spawning graph construction thread
[23/10/2015 11:14:06] Generating splice-graphs from stringtie-q20.gtfaccepted_hits_nd_unstranded.bam using cem
[23/10/2015 11:15:17] Parsed 7942 graph(s) from cem instance file

[23/10/2015 11:15:17] Parsed 7942 splice graph(s) from cem instance file and collapsed them to 6050 assembly graph(s) (1736 graph(s) excluded due to inference issues resulting from unstranded data).
[23/10/2015 11:15:17] 2877984 unique, non-redundant read pairs being used for quantification
[23/10/2015 11:15:17] 2.87798e+06 read pairs being used for FPKM normalisation

[23/10/2015 11:15:17] Sorting splice-graphs by read count
[23/10/2015 11:15:17] Finished sorting splice-graphs by read count

[23/10/2015 11:15:17] Spawning 15 thread(s) for fetching alignments and 1 i/o thread
[23/10/2015 11:16:42] Estimating fragment length distribution from 703 transcripts longer than 2500 nucleotides
[23/10/2015 11:16:42] Estimated fragment length "median"=302 and "median absolute deviation"=0 using 543862 observations
[23/10/2015 11:16:42] Using Gaussian fragment length distribution with parameters: Mean=302 and SD=0

[23/10/2015 11:16:42] Starting Bayesembler on 725 multi-path graph(s) and 5325 single-path graph(s)
[23/10/2015 11:16:42] Spawning 15 Bayesembler thread(s) and 2 i/o threads

bayesembler: /seqdata/krogh/jola/projects/transcriptome_assembly/code/release/bayesembler_1_2_0/src/alignmentParser.cpp:445: CollapsedMap AlignmentParser::calculateFragTranProbabilities(std::vector<Candidate>&, std::vector<FragmentAlignment*>, SequencingModel, bool, std::stringstream&, std::tr1::unordered_map<std::basic_string<char>, int>*): Assertion `probability_matrix.block(0, i, row_idx, 1).sum() < double_underflow' failed.
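
For anyone hitting the first assertion (the one in markDuplicates), here is a rough Python sketch of the kind of renaming that fixed it for me. It is not part of bayesembler and not the exact script I ran; the function name `uniquify_fastq_names` and the `_<index>` suffix are made up for illustration. It assumes plain 4-line FASTQ records and, for paired data, that both mate files list the reads in the same order, so that running it on each file gives both mates the same new name.

```python
from itertools import islice


def uniquify_fastq_names(lines, suffix_sep="_"):
    """Yield FASTQ lines with a running index appended to every read name.

    Hypothetical helper: assumes 4-line FASTQ records. A trailing /1 or /2
    mate suffix is kept at the end of the new name.
    """
    it = iter(lines)
    index = 0
    while True:
        record = list(islice(it, 4))
        if not record:
            break
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError(f"malformed FASTQ record at read {index}")
        header = record[0].rstrip("\n")
        # Split the read name from any free-text comment after the first space.
        name, _, comment = header[1:].partition(" ")
        mate = ""
        if name.endswith(("/1", "/2")):
            name, mate = name[:-2], name[-2:]
        new_header = f"@{name}{suffix_sep}{index}{mate}"
        if comment:
            new_header += " " + comment
        yield new_header + "\n"
        yield record[1]   # sequence, unchanged
        yield "+\n"       # separator line, without a repeated read name
        yield record[3]   # qualities, unchanged
        index += 1


if __name__ == "__main__":
    import sys
    # Usage (file names are placeholders):
    #   python uniquify.py < reads_1.fastq > reads_1.unique.fastq
    sys.stdout.writelines(uniquify_fastq_names(sys.stdin))
```

Run it once per mate file and realign afterwards. Because the index depends only on the record's position in the file, two reads simulated from the same coordinates get different names while each read still matches its mate.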

voshalla avatar Oct 23 '15 16:10 voshalla