
Crash after running vg rna

Open brettChapman opened this issue 3 years ago • 5 comments

Hi

I'm running vg version 1.34.

I'm attempting to create a splice graph.

I have a GTF file in which the first column gives the names of the paths in the graph, across all genomes. I generated the GTF with StringTie from RNA-seq alignments in a sorted BAM file. I built a genome graph using PGGB, and I'm attempting to embed the transcript regions into the seqwish graph before continuing with normalisation in PGGB using smoothxg.
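As a pre-check on that setup, the sketch below compares the sequence names in column 1 of the GTF with the path names in the GFA. It assumes the graph is available as GFA v1 text with `P` lines (path name in the second tab-separated field) and that the GTF is tab-separated. The function names are hypothetical and not part of vg or PGGB, and this only tests name agreement; it does not explain the assertion failures.

```python
def gfa_path_names(gfa_lines):
    """Collect path names from GFA v1 'P' lines (second tab-separated field)."""
    names = set()
    for line in gfa_lines:
        if line.startswith("P\t"):
            names.add(line.rstrip("\n").split("\t", 2)[1])
    return names


def gtf_seqnames(gtf_lines):
    """Collect the distinct values of GTF column 1, skipping blanks and comments."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_paths(gfa_lines, gtf_lines):
    """Return GTF sequence names that have no matching path in the GFA."""
    return sorted(gtf_seqnames(gtf_lines) - gfa_path_names(gfa_lines))
```

Both arguments can be open file handles, so a large GFA is streamed line by line rather than loaded into memory. An empty result means every GTF sequence name has a path of the same name in the graph.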

I've been getting the following error:

+ srun -n 1 singularity exec --bind /data/pangenome_20way/results_rerun/test_vg_rna/1H:/data/pangenome_20way/results_rerun/test_vg_rna/1H /data/vg_builds/vg.sif vg rna -t 32 -n /data/pangenome_20way/results_rerun/test_vg_rna/1H/reference.gtf -p -e barley_pangenome_1H_s1000000_l0_p95_k316_B10000000_I0_R0_j100_e0_P1-4-6-2-26-1/barley_pangenome_1H.fasta.5afc036.7715ffd.seqwish.gfa.pg
[vg rna] Parsing graph file ...
[vg rna] Graph parsed in 1.51227 seconds, 2.14698 GB
[vg rna] Adding novel exon boundaries and splice-junctions to graph ...
[vg rna] 0 introns and 210668 transcripts parsed, and graph augmented in 188.199 seconds, 13.1727 GB
[vg rna] Topological sorting and compacting splice graph ...
[vg rna] Splice graph sorted and compacted in 29.7066 seconds, 13.1727 GB
[vg rna] Projecting haplotype-specfic transcripts ...
vg: src/transcriptome.cpp:1080: std::__cxx11::list<vg::EditedTranscriptPath> vg::Transcriptome::project_transcript_embedded(const vg::Transcript&, const bdsg::PositionOverlay&, bool) const: Assertion `border_offsets.first + 1 == _splice_graph->get_length(_splice_graph->get_handle_of_step(haplotype_path_start_step))' failed.
vg: src/transcriptome.cpp:1080: std::__cxx11::list<vg::EditedTranscriptPath> vg::Transcriptome::project_transcript_embedded(const vg::Transcript&, const bdsg::PositionOverlay&, bool) const: Assertion `border_offsets.first + 1 == _splice_graph->get_length(_splice_graph->get_handle_of_step(haplotype_path_start_step))' failed.
vg: src/transcriptome.cpp:1080: std::__cxx11::list<vg::EditedTranscriptPath> vg::Transcriptome::project_transcript_embedded(const vg::Transcript&, const bdsg::PositionOverlay&, bool) const: Assertion `border_offsets.first + 1 == _splice_graph->get_length(_splice_graph->get_handle_of_step(haplotype_path_start_step))' failed.
vg: src/transcriptome.cpp:1101: std::__cxx11::list<vg::EditedTranscriptPath> vg::Transcriptome::project_transcript_embedded(const vg::Transcript&, const bdsg::PositionOverlay&, bool) const: Assertion `haplotype_path_start_step != haplotype_path_end_step' failed.
vg: src/transcriptome.cpp:1080: std::__cxx11::list<vg::EditedTranscriptPath> vg::Transcriptome::project_transcript_embedded(const vg::Transcript&, const bdsg::PositionOverlay&, bool) const: Assertion `border_offsets.first + 1 == _splice_graph->get_length(_splice_graph->get_handle_of_step(haplotype_path_start_step))' failed.
vg: src/transcriptome.cpp:1080: std::__cxx11::list<vg::EditedTranscriptPath> vg::Transcriptome::project_transcript_embedded(const vg::Transcript&, const bdsg::PositionOverlay&, bool) const: Assertion `border_offsets.first + 1 == _splice_graph->get_length(_splice_graph->get_handle_of_step(haplotype_path_start_step))' failed.
vg: src/transcriptome.cpp:1081: std::__cxx11::list<vg::EditedTranscriptPath> vg::Transcriptome::project_transcript_embedded(const vg::Transcript&, const bdsg::PositionOverlay&, bool) const: Assertion `border_offsets.second == 0' failed.
vg: src/transcriptome.cpp:1081: std::__cxx11::list<vg::EditedTranscriptPath> vg::Transcriptome::project_transcript_embedded(const vg::Transcript&, const bdsg::PositionOverlay&, bool) const: Assertion `border_offsets.second == 0' failed.
vg: src/transcriptome.cpp:1080: std::__cxx11::list<vg::EditedTranscriptPath> vg::Transcriptome::project_transcript_embedded(const vg::Transcript&, const bdsg::PositionOverlay&, bool) const: Assertion `border_offsets.first + 1 == _splice_graph->get_length(_splice_graph->get_handle_of_step(haplotype_path_start_step))' failed.
vg: src/transcriptome.cpp:1080: std::__cxx11::list<vg::EditedTranscriptPath> vg::Transcriptome::project_transcript_embedded(const vg::Transcript&, const bdsg::PositionOverlay&, bool) const: Assertion `border_offsets.first + 1 == _splice_graph->get_length(_splice_graph->get_handle_of_step(haplotype_path_start_step))' failed.
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Stack trace path: /tmp/vg_crash_SltY5f/stacktrace.txt
Please include the stack trace file in your bug report!
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Stack trace path: /tmp/vg_crash_JWOJQg/stacktrace.txt
Please include the stack trace file in your bug report!
srun: error: node-12: task 0: Segmentation fault (core dumped)

The head of my GTF is:

HOR10350_v1_chr1H       StringTie       transcript      580313  581138  1000    +       .       gene_id "Horvu_HOR_10350_MSTRG.1"; transcript_id "Horvu_10350_1H01G002000.1"; ref_gene_id "Horvu_10350_1H01G002000"; 
HOR10350_v1_chr1H       StringTie       exon    580313  580490  1000    +       .       gene_id "Horvu_HOR_10350_MSTRG.1"; transcript_id "Horvu_10350_1H01G002000.1"; exon_number "1"; ref_gene_id "Horvu_10350_1H01G0020>
HOR10350_v1_chr1H       StringTie       exon    580906  581138  1000    +       .       gene_id "Horvu_HOR_10350_MSTRG.1"; transcript_id "Horvu_10350_1H01G002000.1"; exon_number "2"; ref_gene_id "Horvu_10350_1H01G0020>
HOR10350_v1_chr1H       StringTie       transcript      254863  259344  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.2"; transcript_id "Horvu_HOR_10350_MSTRG.2.1"; 
HOR10350_v1_chr1H       StringTie       exon    254863  255339  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.2"; transcript_id "Horvu_HOR_10350_MSTRG.2.1"; exon_number "1"; 
HOR10350_v1_chr1H       StringTie       exon    255448  255588  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.2"; transcript_id "Horvu_HOR_10350_MSTRG.2.1"; exon_number "2"; 
HOR10350_v1_chr1H       StringTie       exon    255685  255735  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.2"; transcript_id "Horvu_HOR_10350_MSTRG.2.1"; exon_number "3"; 
HOR10350_v1_chr1H       StringTie       exon    256222  256393  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.2"; transcript_id "Horvu_HOR_10350_MSTRG.2.1"; exon_number "4"; 
HOR10350_v1_chr1H       StringTie       exon    256477  256723  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.2"; transcript_id "Horvu_HOR_10350_MSTRG.2.1"; exon_number "5"; 
HOR10350_v1_chr1H       StringTie       exon    256811  257085  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.2"; transcript_id "Horvu_HOR_10350_MSTRG.2.1"; exon_number "6"; 
HOR10350_v1_chr1H       StringTie       exon    257248  257652  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.2"; transcript_id "Horvu_HOR_10350_MSTRG.2.1"; exon_number "7"; 
HOR10350_v1_chr1H       StringTie       exon    257755  258076  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.2"; transcript_id "Horvu_HOR_10350_MSTRG.2.1"; exon_number "8"; 
HOR10350_v1_chr1H       StringTie       exon    258161  258317  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.2"; transcript_id "Horvu_HOR_10350_MSTRG.2.1"; exon_number "9"; 
HOR10350_v1_chr1H       StringTie       exon    258447  258506  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.2"; transcript_id "Horvu_HOR_10350_MSTRG.2.1"; exon_number "10"; 
HOR10350_v1_chr1H       StringTie       exon    259131  259344  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.2"; transcript_id "Horvu_HOR_10350_MSTRG.2.1"; exon_number "11"; 
HOR10350_v1_chr1H       StringTie       transcript      340170  342653  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.3"; transcript_id "Horvu_10350_1H01G000700.1"; ref_gene_id "Horvu_10350_1H01G000700"; 
HOR10350_v1_chr1H       StringTie       exon    340170  340207  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.3"; transcript_id "Horvu_10350_1H01G000700.1"; exon_number "1"; ref_gene_id "Horvu_10350_1H01G0007>
HOR10350_v1_chr1H       StringTie       exon    341462  341765  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.3"; transcript_id "Horvu_10350_1H01G000700.1"; exon_number "2"; ref_gene_id "Horvu_10350_1H01G0007>
HOR10350_v1_chr1H       StringTie       exon    341774  341908  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.3"; transcript_id "Horvu_10350_1H01G000700.1"; exon_number "3"; ref_gene_id "Horvu_10350_1H01G0007>
HOR10350_v1_chr1H       StringTie       exon    341919  342151  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.3"; transcript_id "Horvu_10350_1H01G000700.1"; exon_number "4"; ref_gene_id "Horvu_10350_1H01G0007>
HOR10350_v1_chr1H       StringTie       exon    342432  342545  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.3"; transcript_id "Horvu_10350_1H01G000700.1"; exon_number "5"; ref_gene_id "Horvu_10350_1H01G0007>
HOR10350_v1_chr1H       StringTie       exon    342568  342653  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.3"; transcript_id "Horvu_10350_1H01G000700.1"; exon_number "6"; ref_gene_id "Horvu_10350_1H01G0007>
HOR10350_v1_chr1H       StringTie       transcript      352722  355685  1000    +       .       gene_id "Horvu_HOR_10350_MSTRG.4"; transcript_id "Horvu_10350_1H01G000800.1"; ref_gene_id "Horvu_10350_1H01G000800"; 
HOR10350_v1_chr1H       StringTie       exon    352722  355685  1000    +       .       gene_id "Horvu_HOR_10350_MSTRG.4"; transcript_id "Horvu_10350_1H01G000800.1"; exon_number "1"; ref_gene_id "Horvu_10350_1H01G0008>
HOR10350_v1_chr1H       StringTie       transcript      374900  378302  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.5"; transcript_id "Horvu_10350_1H01G000900.1"; ref_gene_id "Horvu_10350_1H01G000900"; 
HOR10350_v1_chr1H       StringTie       exon    374900  375186  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.5"; transcript_id "Horvu_10350_1H01G000900.1"; exon_number "1"; ref_gene_id "Horvu_10350_1H01G0009>
HOR10350_v1_chr1H       StringTie       exon    375278  376393  1000    -       .       gene_id "Horvu_HOR_10350_MSTRG.5"; transcript_id "Horvu_10350_1H01G000900.1"; exon_number "2"; ref_gene_id "Horvu_10350_1H01G0009>

The errors appear to indicate a problem with position offsets. Could this be related to whether the exon coordinates are 0-based or 1-based?
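To rule out obvious coordinate problems in the GTF itself, a basic sanity check can be sketched as below. It assumes the standard GTF convention of 1-based, inclusive start and end columns, tab-separated fields, and a `transcript_id` attribute in column 9. The function name is hypothetical. The check can only flag a start of 0, reversed intervals, and exons falling outside their transcript; it cannot detect a consistent off-by-one shift, and it is not a diagnosis of the assertions above.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def check_gtf_coordinates(gtf_lines):
    """Return (line number, message) pairs for suspicious GTF records."""
    problems = []
    transcripts = {}            # (seqname, transcript_id) -> (start, end)
    exons = defaultdict(list)   # (seqname, transcript_id) -> [(lineno, start, end)]
    for lineno, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            problems.append((lineno, "fewer than 9 tab-separated fields"))
            continue
        feature, start, end = fields[2], int(fields[3]), int(fields[4])
        if start < 1:
            problems.append((lineno, "start < 1, possibly 0-based coordinates"))
        if start > end:
            problems.append((lineno, "start is greater than end"))
        match = TRANSCRIPT_ID.search(fields[8])
        key = (fields[0], match.group(1) if match else None)
        if feature == "transcript":
            transcripts[key] = (start, end)
        elif feature == "exon":
            exons[key].append((lineno, start, end))
    # Every exon should lie within the span of its transcript record.
    for key, exon_list in exons.items():
        if key not in transcripts:
            continue
        t_start, t_end = transcripts[key]
        for lineno, start, end in exon_list:
            if start < t_start or end > t_end:
                problems.append((lineno, "exon outside its transcript span"))
    return problems
```

Passing an open file handle for the GTF works, since the function only iterates over lines. An empty list means none of these basic checks fired.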

Thank you for any help you can provide.

brettChapman, Sep 10 '21 05:09

It seems to happen when projecting transcripts between paths in the graph, but I am not sure why these assertions fail. I do not think it is a 0-based versus 1-based coordinate problem, since that would trigger an assertion earlier.

Would it be possible for you to share the data?

jonassibbesen, Sep 15 '21 14:09

Hi @jonassibbesen

Sure, I can share the seqwish graph and GTF file with you. How should I get the files to you? Thanks.

brettChapman, Sep 16 '21 06:09

Are you able to share them using Dropbox, Google Drive or something similar? My email is [email protected]

jonassibbesen, Sep 16 '21 08:09

Hi @jonassibbesen

Thanks for your help. I've put the data on Google Drive and have emailed you.

brettChapman, Sep 17 '21 04:09

Perfect, thank you!

jonassibbesen, Sep 17 '21 12:09