
vg index command was killed

Open wzhang42 opened this issue 3 years ago • 4 comments

Hi, I downloaded the graph file 1000GPlons_hs38d1.vg from https://cgl.gi.ucsc.edu/data/giraffe/mapping/graphs/for-NA19239/1000gplons/hs38d1/. I then ran "vg index -x 1000GPlons_hs38d1.xg -g 1000GPlons_hs38d1.gcsa -k 16 1000GPlons_hs38d1.vg", and it was killed without any message. With a small test dataset, "vg index -x x.xg -g x.gcsa -k 16 x.vg" works fine. What should I do? Thank you in advance.

wzhang42 avatar Feb 11 '22 03:02 wzhang42

GCSA construction requires large amounts of memory and temporary disk space. The 1000GP graphs are complex enough that you have to prune them before GCSA construction is even possible. See index construction in the vg wiki for further information. Also, GCSA construction works better with each chromosome in a separate graph file instead of having a single large graph.
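As a rough sketch of the prune-then-index order described above (not a tested recipe for this particular graph): the script below only prints the commands unless you set RUN=1. The `vg prune -r` step and the `-b` temporary-directory option are written from memory of the vg wiki and `vg index` help, so treat them as assumptions and check them against your vg version. The `.pruned.vg` output name and the `WORK_DIR` variable are hypothetical, and splitting the graph by chromosome is left out.

```shell
#!/usr/bin/env bash
# Sketch only: prints the planned commands; set RUN=1 to actually execute them.
set -euo pipefail

# Emit the three steps: XG from the full graph, prune, GCSA from the pruned graph.
plan() {
    local graph="$1" base="${1%.vg}" tmp="${2:-.}"
    # The XG index is built from the unpruned graph.
    echo "vg index -x $base.xg $graph"
    # Prune complex regions first (-r assumed to restore reference paths).
    echo "vg prune -r $graph > $base.pruned.vg"
    # Build GCSA from the pruned graph; -b assumed to set the temp directory.
    echo "vg index -b $tmp -g $base.gcsa $base.pruned.vg"
}

GRAPH="${GRAPH:-1000GPlons_hs38d1.vg}"   # graph file from the question
WORK_DIR="${WORK_DIR:-.}"                # hypothetical: a disk with plenty of free space

if [ "${RUN:-0}" = "1" ]; then
    plan "$GRAPH" "$WORK_DIR" | bash -ex
else
    plan "$GRAPH" "$WORK_DIR"
fi
```

Even with pruning, expect the GCSA step to need a lot of memory and temporary disk space on a whole-genome graph, so point `WORK_DIR` at a large scratch disk.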

If you want to build the indexes for vg map, you can use vg autoindex with the appropriate input for that. There is some documentation in the wiki for that as well.
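A minimal sketch of what a vg autoindex call for the vg map workflow might look like, assuming you start from a reference FASTA and a VCF rather than from the prebuilt .vg file. The input file names and the output prefix are placeholders, and the flags are as I remember them from `vg autoindex --help`, so verify them before running. The script prints the command by default and only runs it with RUN=1.

```shell
#!/usr/bin/env bash
# Sketch only: prints the command; set RUN=1 to actually execute it.
set -euo pipefail

# Build the autoindex command line for the "map" workflow.
autoindex_cmd() {
    local ref="$1" vcf="$2" prefix="$3"
    echo "vg autoindex --workflow map -r $ref -v $vcf -p $prefix"
}

REF="${REF:-reference.fa}"          # placeholder reference FASTA
VCF="${VCF:-variants.vcf.gz}"       # placeholder variant calls
PREFIX="${PREFIX:-my_graph}"        # placeholder prefix for the output index files

if [ "${RUN:-0}" = "1" ]; then
    autoindex_cmd "$REF" "$VCF" "$PREFIX" | bash -ex
else
    autoindex_cmd "$REF" "$VCF" "$PREFIX"
fi
```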

jltsiren avatar Feb 14 '22 10:02 jltsiren

Hi, jltsiren, Thanks for your reply. Essentially, I am interested in using the prebuilt graph (hopefully 1000GPlons_hs38d1.vg and 1000GPlons_hs38d1.xg) to call SVs from my short-read/long-read (WGS) files. It seems that the vg command also requires the corresponding .gcsa file (it should be 1000GPlons_hs38d1.gcsa, but I could not find one in https://cgl.gi.ucsc.edu/data/giraffe/mapping/graphs/for-NA19239/1000gplons/hs38d1/). I am not sure whether my understanding is correct and whether my plan is doable. Thank you again in advance.

wzhang42 avatar Feb 14 '22 15:02 wzhang42

You need a GCSA index only for the old vg map aligner, which is slower than vg giraffe. You can find prebuilt graphs and indexes named 1000GPlons_hs38d1_filter.* and 1000GPlons_hs38d1_filter_forvgmap.*. They are almost the same as 1000GPlons_hs38d1.*, except that variants in long segmental duplications have been filtered out, as we found that it improves genotyping accuracy.

jltsiren avatar Feb 16 '22 05:02 jltsiren

Hi, jltsiren, Many thanks for your reply.

wzhang42 avatar Feb 16 '22 21:02 wzhang42