vg
vg index command was killed
Hi, I downloaded the graph file 1000GPlons_hs38d1.vg from https://cgl.gi.ucsc.edu/data/giraffe/mapping/graphs/for-NA19239/1000gplons/hs38d1/. I then ran "vg index -x 1000GPlons_hs38d1.xg -g 1000GPlons_hs38d1.gcsa -k 16 1000GPlons_hs38d1.vg", but it was killed without any message. The same command on a small test dataset, "vg index -x x.xg -g x.gcsa -k 16 x.vg", works fine. What should I do? Thank you in advance.
GCSA construction requires large amounts of memory and temporary disk space. The 1000GP graphs are complex enough that you have to prune them before GCSA construction is even possible. See index construction in the vg wiki for further information. Also, GCSA construction works better with each chromosome in a separate graph file instead of having a single large graph.
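As a rough sketch of the prune-then-index order described above (not a tested recipe for this particular graph): the XG index is built from the full graph, the graph is pruned, and the GCSA index is built from the pruned copy. The output file names and `TMP_DIR` are placeholders. Using `vg prune -r` as the pruning mode and `-b` as the temporary-directory option of `vg index` are assumptions to check against `vg prune --help`, `vg index --help`, and the wiki. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: prune a complex graph before GCSA construction.
# Assumptions: file names are placeholders; TMP_DIR must point to a disk
# with a lot of free space; the vg flags should be checked against --help.

GRAPH="1000GPlons_hs38d1.vg"
PREFIX="${GRAPH%.vg}"          # strip the .vg extension
TMP_DIR="${TMP_DIR:-.}"        # where GCSA temporary files would go
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands

# Print a command line, and execute it only when DRY_RUN is not 1.
run() {
    echo "+ $1"
    if [ "$DRY_RUN" != "1" ]; then
        sh -c "$1" || exit 1
    fi
}

# 1. XG index from the full, unpruned graph.
run "vg index -x ${PREFIX}.xg ${GRAPH}"

# 2. Prune complex regions so that GCSA construction becomes feasible.
run "vg prune -r ${GRAPH} > ${PREFIX}.pruned.vg"

# 3. GCSA index from the pruned graph, with temporary files in TMP_DIR.
run "vg index -g ${PREFIX}.gcsa -k 16 -b ${TMP_DIR} ${PREFIX}.pruned.vg"
```

This keeps everything in one graph file for brevity. With one graph file per chromosome, the pruning step would be repeated per file, and the wiki describes the extra steps needed to keep node ids consistent across files.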
If you want to build the indexes for vg map, you can use vg autoindex with the appropriate input for that. There is some documentation in the wiki for that as well.
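A minimal sketch of what such a `vg autoindex` call might look like when starting from a reference FASTA and a VCF. The input file names and the output prefix are hypothetical, and the options shown should be checked against `vg autoindex --help` for your vg version. As written, the script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: build vg map indexes with vg autoindex.
# Assumptions: reference.fa, variants.vcf.gz and the output prefix are
# hypothetical placeholders, not files from this thread.

REF_FASTA="reference.fa"
VCF="variants.vcf.gz"
OUT_PREFIX="my_graph"
WORKFLOW="map"                 # build the indexes needed by vg map
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the command

# Print a command line, and execute it only when DRY_RUN is not 1.
run() {
    echo "+ $1"
    if [ "$DRY_RUN" != "1" ]; then
        sh -c "$1" || exit 1
    fi
}

run "vg autoindex --workflow ${WORKFLOW} --prefix ${OUT_PREFIX} --ref-fasta ${REF_FASTA} --vcf ${VCF}"
```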
Hi jltsiren, thanks for your reply. Essentially, I am interested in using the prebuilt graph (ideally 1000GPlons_hs38d1.vg and 1000GPlons_hs38d1.xg) to call SVs from my short-read/long-read WGS files. It seems that the vg command also requires the corresponding .gcsa file (presumably 1000GPlons_hs38d1.gcsa), but I could not find one at https://cgl.gi.ucsc.edu/data/giraffe/mapping/graphs/for-NA19239/1000gplons/hs38d1/. I am not sure whether my understanding is correct and whether my plan is doable. Thank you again in advance.
You need a GCSA index only for the old vg map aligner, which is slower than vg giraffe. You can find prebuilt graphs and indexes named 1000GPlons_hs38d1_filter.* and 1000GPlons_hs38d1_filter_forvgmap.*. They are almost the same as 1000GPlons_hs38d1.*, except that variants in long segmental duplications have been filtered out, as we found that it improves genotyping accuracy.
Hi, jltsiren, Many thanks for your reply.