Error in 'vg autoindex' a GFA file derived from PGGB

Open fangbohao opened this issue 1 year ago • 5 comments

1. What were you trying to do? I am trying to index a GFA graph file (a chromosome) derived from PGGB.

2. What did you want to happen? index done.

3. What actually happened? error message appears as above.

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Crash report for vg v1.41.0 "Salmour"
Stack trace (most recent call last):
#24   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x5df43d, in _start
#23   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1e520cf, in __libc_start_main
#22   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x5b08ce, in main
#21   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0xd3347b, in vg::subcommand::Subcommand::operator()(int, char**) const
#20   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0xc1237c, in main_autoindex(int, char**)
#19   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0xf41d48, in vg::IndexRegistry::make_indexes(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocato
r<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
#18   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0xf2dde8, in vg::IndexRegistry::execute_recipe(std::pair<std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std:
:allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, st
d::allocator<char> > > >, unsigned long> const&, vg::IndexingPlan const*, vg::AliasGraph&)
#17   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0xf2d7fd, in std::_Function_handler<std::vector<std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::alloc
ator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::vector<std::__cxx11::basic_string<char, std::char_tra
its<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > (std::vector<vg::IndexFile const*, std::allocator
<vg::IndexFile const*> > const&, vg::IndexingPlan const*, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11
::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&), vg::VGIn
dexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, vg::AliasGraph&, std::set<std::__cxx11::b
asic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11:
:basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#15}>::_M_invoke(std::_Any_data const&, std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile const*
> > const&, vg::IndexingPlan const*&&, vg::AliasGraph&, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char
, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)
#16   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0xf2d346, in vg::VGIndexes::get_vg_index_registry()::{lambda(std::vector<vg::IndexFile const*, std::allocator<vg::IndexFile con
st*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_trai
ts<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#11}::operator()(std::vector<vg::IndexFile co
nst*, std::allocator<vg::IndexFile const*> > const&, vg::IndexingPlan const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cx
x11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const 
[clone .isra.0]
#15   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1318ce0, in vg::algorithms::gfa_to_path_handle_graph(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<
char> > const&, handlegraph::MutablePathMutableHandleGraph*, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
#14   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1316855, in vg::algorithms::gfa_to_path_handle_graph(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<
char> > const&, handlegraph::MutablePathMutableHandleGraph*, vg::algorithms::GFAIDMapInfo*, long)
#13   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x126cb90, in vg::get_input_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::f
unction<void (std::istream&)>)
#12   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x131f4ac, in std::_Function_handler<void (std::istream&), vg::algorithms::gfa_to_path_handle_graph(std::__cxx11::basic_string<
char, std::char_traits<char>, std::allocator<char> > const&, handlegraph::MutablePathMutableHandleGraph*, vg::algorithms::GFAIDMapInfo*, long)::{lambda(std::istream&)#1}>::_M_invoke(std::
_Any_data const&, std::istream&)
#11   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x131dfc9, in vg::algorithms::GFAParser::parse(std::istream&)
#10   Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x131c367, in vg::algorithms::GFAParser::parse(std::istream&)::{lambda()#3}::operator()() const
#9    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1318f12, in vg::algorithms::add_path_listeners(vg::algorithms::GFAParser&, handlegraph::MutablePathMutableHandleGraph*)::{lam
bda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::pair<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_
traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::pair<__g
nu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_str
ing<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx
11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)#2}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, st
d::pair<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char const*, std::__cxx11
::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::pair<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>
, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11:
:basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const [clone
 .isra.0]
#8    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x17ee688, in handlegraph::PathMetadata::parse_path_name(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocato
r<char> > const&, handlegraph::PathSense&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::
allocator<char> >&, unsigned long&, unsigned long&, std::pair<unsigned long, unsigned long>&)
#7    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x17eea10, in long long __gnu_cxx::__stoa<long long, long long, char, int>(long long (*)(char const*, char**, int), char const*
, char const*, unsigned long*, int)
#6    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x5af280, in std::__throw_invalid_argument(char const*)
#5    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1d8e148, in __cxa_throw
#4    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1d8dfe6, in std::terminate()
#3    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x1d8df7b, in __cxxabiv1::__terminate(void (*)())
#2    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x5ad45a, in __gnu_cxx::__verbose_terminate_handler() [clone .cold]
#1    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x5afdf7, in abort
#0    Object "/n/home00/bfang/.conda/envs/fasrc/bin/vg", at 0x145d3ab, in raise

5. What data and command can the vg dev team use to make the problem happen?

6. What does running vg version say?

vg v1.41.0

fangbohao avatar Aug 04 '22 19:08 fangbohao

Some big chromosomes work well with 'vg autoindex', but small chromosomes do not index properly and fail with the crash shown above.

fangbohao avatar Aug 04 '22 19:08 fangbohao

Can you provide the command line call that you ran into this error on?

jeizenga avatar Aug 04 '22 20:08 jeizenga

Thanks for your reply. Here you go:

vg autoindex --workflow giraffe -g $gfa_chr37 -t 23 --target-mem 90G

fangbohao avatar Aug 04 '22 20:08 fangbohao

By the way, here is the GFA file I used, which is 52MB, a small chromosome.

Please let me know if the GFA file is wrong or not properly produced.

Thank you! Bohao Fang

VGP#prim#SUPER_37.pan.fa.gz.3051141.04f1c29.ecb... https://drive.google.com/file/d/1nLpGPHSlZs4h1hmfuJHcI3hOyIFIDvXY/view?usp=drive_web

fangbohao avatar Aug 04 '22 21:08 fangbohao

@adamnovak This looks to me like it's running into a problem in the named-node stuff you implemented. Could you take a look?

jeizenga avatar Aug 04 '22 21:08 jeizenga

I came across this issue when using panSN-spec named input like

ARS_UCD12#hap0#6

but there is a stoll call on the haplotype field, so it should be purely numeric (i.e. "ARS_UCD12#0#6"). I'm not sure whether this caused the same issue, but I got a very similar crash log.

I couldn't find clear documentation on the pathsense API, but from vg paths -Mv it looks like it expects further groupings than panSN-spec? Is it possible to denote e.g. a primary assembly path vs a haplotype-resolved path or will everything need the sample ploidy to work?

Best, Alex

ASLeonard avatar Jun 26 '23 15:06 ASLeonard

Found the [path metadata model](https://github.com/vgteam/vg/wiki/Path-Metadata-Model) (I knew I had stumbled on it before), so I will try a bit further with this.

ASLeonard avatar Jun 27 '23 08:06 ASLeonard

Unfortunately I can't get @fangbohao's file; it looks like it's a Google Drive upload shared with a specific list of people that I'm not on.

But it does seem like a path like ARS_UCD12#hap0#6 might be able to cause a crash in __gnu_cxx::__stoa (which is the string-to-number converter) inside path name parsing.

By my reading of the panSN spec that I had when I wrote the path name parsing, that isn't valid panSN because the haplotype piece hap0 is a string; I thought only numbers were allowed there. Maybe that isn't really true?

Whether that's true or not, we should produce a more useful error when we can't parse the path name.
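
As a rough illustration of what a more useful error could look like, here is a minimal Python sketch. It is not vg's parser; the helper name `parse_pansn` is made up, and it assumes the three-field `sample#haplotype#contig` form with an all-digit haplotype field. The point is only that the message names the offending path and field instead of surfacing a bare string-to-number failure.

```python
def parse_pansn(name):
    """Split a sample#haplotype#contig name, raising a descriptive error.

    Hypothetical helper for illustration only; not vg's implementation.
    """
    fields = name.split("#")
    if len(fields) != 3:
        raise ValueError(
            f"path name {name!r}: expected 3 '#'-separated fields, "
            f"got {len(fields)}"
        )
    sample, haplotype, contig = fields
    # Assumption: the haplotype field must consist of ASCII digits only.
    if not (haplotype.isascii() and haplotype.isdigit()):
        raise ValueError(
            f"path name {name!r}: haplotype field {haplotype!r} is not a number"
        )
    return sample, int(haplotype), contig


print(parse_pansn("ARS_UCD12#0#6"))
try:
    parse_pansn("ARS_UCD12#hap0#6")
except ValueError as err:
    print(err)
```

With a check like this, a name such as ARS_UCD12#hap0#6 is reported by name rather than ending in an abort.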

adamnovak avatar Jun 30 '23 22:06 adamnovak

FWIW, the spec does indeed say here that haplotype ID is a number.

jeizenga avatar Jun 30 '23 23:06 jeizenga

OK, @fangbohao shared the file with me, and I tested my fix, and I now have vg interpreting it like this:

[anovak@swords vg]% vg paths --metadata -x ~/Downloads/VGP\#prim\#SUPER_37.pan.fa.gz.3051141.04f1c29.ecbf8cf.smooth.final.gfa
#NAME	SENSE	SAMPLE	HAPLOTYPE	LOCUS	PHASE_BLOCK	SUBRANGE
MA_2#hap2#h2tg000495l	GENERIC	NO_SAMPLE_NAME	NO_HAPLOTYPE	MA_2#hap2#h2tg000495l	NO_PHASE_BLOCK	NO_SUBRANGE
WA_2#hap1#h1tg000618l	GENERIC	NO_SAMPLE_NAME	NO_HAPLOTYPE	WA_2#hap1#h1tg000618l	NO_PHASE_BLOCK	NO_SUBRANGE
NM_1#hap2#h2tg000401l	GENERIC	NO_SAMPLE_NAME	NO_HAPLOTYPE	NM_1#hap2#h2tg000401l	NO_PHASE_BLOCK	NO_SUBRANGE
AZ_2#hap2#h2tg000020l	GENERIC	NO_SAMPLE_NAME	NO_HAPLOTYPE	AZ_2#hap2#h2tg000020l	NO_PHASE_BLOCK	NO_SUBRANGE
CA_1#hap1#h1tg001701l	GENERIC	NO_SAMPLE_NAME	NO_HAPLOTYPE	CA_1#hap1#h1tg001701l	NO_PHASE_BLOCK	NO_SUBRANGE
CA_1#hap2#h2tg004194l	GENERIC	NO_SAMPLE_NAME	NO_HAPLOTYPE	CA_1#hap2#h2tg004194l	NO_PHASE_BLOCK	NO_SUBRANGE
CA_2#hap2#h2tg002977l	GENERIC	NO_SAMPLE_NAME	NO_HAPLOTYPE	CA_2#hap2#h2tg002977l	NO_PHASE_BLOCK	NO_SUBRANGE
...

It's not parsing it as the file writer intended, I don't think, but it is parsing it to something we can represent. For the file to really work properly (and not result in a possibly unmanageable number of named paths), hap1 and hap2 need to be changed to just 1 and 2. But with #4010 we should at least no longer crash like this.
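
For anyone who wants to rename the paths before indexing, the change from hap1/hap2 to 1/2 could be sketched in Python roughly as follows. This is an untested-against-vg sketch under assumptions: the paths are stored as GFA P lines with the name in the second tab-separated column (W lines are not handled), and the helper names `fix_path_name` and `fix_gfa_line` are invented for illustration.

```python
import re

# Middle field of the form "hap" followed by ASCII digits, e.g. "hap1".
HAP_FIELD = re.compile(r"^hap([0-9]+)$")


def fix_path_name(name):
    """Turn sample#hapN#contig into sample#N#contig; leave other names alone."""
    fields = name.split("#")
    if len(fields) == 3:
        match = HAP_FIELD.match(fields[1])
        if match:
            fields[1] = match.group(1)
    return "#".join(fields)


def fix_gfa_line(line):
    """Rewrite the name column of a GFA P line; pass other lines through."""
    if not line.startswith("P\t"):
        return line
    cols = line.split("\t")
    cols[1] = fix_path_name(cols[1])
    return "\t".join(cols)


# Example with a made-up P line using one of the path names from above.
print(fix_gfa_line("P\tMA_2#hap2#h2tg000495l\t1+,2+\t*"))
```

Applying `fix_gfa_line` to each line of the GFA and writing the result to a new file leaves segments and links untouched; the renamed paths can then be checked with vg paths --metadata -x as above.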

adamnovak avatar Jul 03 '23 15:07 adamnovak