Invalid Haplotype Field Error During pggb to gbz Conversion Due to Incorrect Regex in P-line

Open. sloth-eat-pudding opened this issue 10 months ago • 1 comment

1. What were you trying to do?

I was attempting to convert a pggb pangenome graph to gbz format for use in Giraffe.

2. What did you want to happen?

I wanted a direct conversion without issues.

3. What actually happened?

The conversion aborted with a std::runtime_error whose message was "MetadataBuilder: Invalid haplotype field JAHBCA010000258.1".

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

root@73614c8feec4:/# vg gbwt -G hprc-v1.0-pggb.gfa --gbz-format -g hprc-v1.0-pggb-all-gbwt.gbz
terminate called after throwing an instance of 'std::runtime_error'
  what():  MetadataBuilder: Invalid haplotype field JAHBCA010000258.1
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.50.1 "Monopoli"
Stack trace (most recent call last):
#15   Object "/vg/bin/vg", at 0x5f2fcd, in _start
#14   Object "/vg/bin/vg", at 0x1ef638f, in __libc_start_main
#13   Object "/vg/bin/vg", at 0x5c2d3e, in main
#12   Object "/vg/bin/vg", at 0xd694fb, in vg::subcommand::Subcommand::operator()(int, char**) const
#11   Object "/vg/bin/vg", at 0xdb241d, in main_gbwt(int, char**)
#10   Object "/vg/bin/vg", at 0xdafa6a, in step_1_build_gbwts(vg::GBWTHandler&, GraphHandler&, GBWTConfig&)
#9    Object "/vg/bin/vg", at 0x1575cf3, in gbwtgraph::gfa_to_gbwt(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, gbwtgraph::GFAParsingParameters const&)
#8    Object "/vg/bin/vg", at 0x156a8df, in gbwtgraph::parse_metadata(gbwtgraph::GFAFile const&, std::vector<gbwtgraph::ConstructionJob, std::allocator<gbwtgraph::ConstructionJob> > const&, gbwtgraph::MetadataBuilder&, gbwtgraph::GFAParsingParameters const&)
#7    Object "/vg/bin/vg", at 0x1566e20, in gbwtgraph::GFAFile::for_these_path_names(std::vector<char const*, std::allocator<char const*> > const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)> const&) const
#6    Object "/vg/bin/vg", at 0x57f406, in gbwtgraph::MetadataBuilder::add_path(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long) [clone .cold]
#5    Object "/vg/bin/vg", at 0x1e32408, in __cxa_throw
#4    Object "/vg/bin/vg", at 0x1e322a6, in std::terminate()
#3    Object "/vg/bin/vg", at 0x1e3223b, in __cxxabiv1::__terminate(void (*)())
#2    Object "/vg/bin/vg", at 0x5bf8ca, in __gnu_cxx::__verbose_terminate_handler() [clone .cold]
#1    Object "/vg/bin/vg", at 0x5c2267, in abort
#0    Object "/vg/bin/vg", at 0x14b64cb, in raise
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Please include this entire error log in your bug report!
━━━━━━━━━━━━━━━━━━━━

5. What data and command can the vg dev team use to make the problem happen?

Data: hprc-v1.0-pggb.gfa
Command: vg gbwt -G hprc-v1.0-pggb.gfa --gbz-format -g hprc-v1.0-pggb-all-gbwt.gbz

6. What does running vg version say?

vg version v1.51.0 "Quellenhof"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Built by root@805a61b04cee

I suspect the issue comes from the regex applied to P-line path names. In hprc-v1.0-pggb.gfa, the P-line path name contains an additional MT field. The pattern is (.*)#(.*)#(.*), and because each (.*) is greedy, a P-line named HG00438#2#JAHBCA010000258.1#MT is split into [HG00438#2][JAHBCA010000258.1][MT]. The second group is expected to be the haplotype, which here should be 2. Instead, the parser tries to convert JAHBCA010000258.1 into a number, which causes the error.

I found this pattern defined in /vg/deps/gbwtgraph/src/gfa.cpp as const std::string GFAParsingParameters::PAN_SN_REGEX = "(.*)#(.*)#(.*)";. I hope this information is helpful to you.
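The split described above can be reproduced outside vg. The following is a minimal sketch in Python rather than the C++ used by gbwtgraph; it assumes the pattern is applied as a full match against the path name, and the helper names split_path_name and haplotype_is_numeric are hypothetical, introduced only for illustration. Greedy (.*) groups behave the same way for this pattern in Python's re module.

```python
import re

# Pattern copied from the report (GFAParsingParameters::PAN_SN_REGEX).
PAN_SN_REGEX = r"(.*)#(.*)#(.*)"

def split_path_name(name):
    """Return the three captured groups, or None if the name does not match.

    Assumption: the pattern is matched against the whole path name.
    """
    match = re.fullmatch(PAN_SN_REGEX, name)
    return match.groups() if match else None

def haplotype_is_numeric(name):
    """Hypothetical check: is the second captured field a plain integer?"""
    fields = split_path_name(name)
    return fields is not None and fields[1].isdigit()

three_field = split_path_name("HG00438#2#JAHBCA010000258.1")
four_field = split_path_name("HG00438#2#JAHBCA010000258.1#MT")

print(three_field)  # ('HG00438', '2', 'JAHBCA010000258.1')
print(four_field)   # ('HG00438#2', 'JAHBCA010000258.1', 'MT')
print(haplotype_is_numeric("HG00438#2#JAHBCA010000258.1"))     # True
print(haplotype_is_numeric("HG00438#2#JAHBCA010000258.1#MT"))  # False
```

With three fields the second group is the haplotype 2. With the extra #MT field, the first greedy group swallows HG00438#2, so the contig name lands in the haplotype position.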

sloth-eat-pudding, Sep 24 '23 14:09