`duplicate rank` error in `vg find` when extracting a graph subregion
1. What were you trying to do? I was trying to extract a subregion from a graph encoded in a GFA format file.
2. What did you want to happen? I want to obtain a GFA format file with the graph subregion plus the paths' portions covering that subregion.
3. What actually happened?
I get this error:
error[load_proto_to_graph]: duplicate rank 5944 in path hg38_chr2
4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here: -
5. What data and command can the vg dev team use to make the problem happen?
input=hppy1+chm13+h38-chr2.fa.gz.pggb-s4000-l12000-p98-n6-a0-K16-k29-w180000-j10000-e10000-I0.5-R0.2.smooth.renamed.gfa
path_ref=hg38_chr2
vg convert --gfa-in $input -x > $input.xg
vg find -p ${path_ref}:91597913-96188605 -x $input.xg > $input.${path_ref}_91597913_96188605.vg
vg view $input.${path_ref}_91597913_96188605.vg > $input.${path_ref}_91597913_96188605.gfa
Here is the input used.
6. What does running vg version say?
vg: variation graph tool, version v1.30.0 "Carentino"
I'd suggest 2 things:
- specifying a context size with `-c` to pull in nodes adjacent to your path. It defaults to 0 in `vg find`, which probably won't give what you want
- using `vg chunk` instead of `vg find` (the relevant options remain the same). `vg chunk` is more careful to not make path fragments that will cause trouble downstream
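As a rough sketch of both suggestions combined, the commands from the report could be adapted along these lines. The file name `graph.gfa` and the context value of 1 are placeholders, not values from the report, and I have not run this against the graph in question. It only calls `vg` if the binary and the input file are present; otherwise it prints the command it would run.

```shell
#!/bin/sh
# Placeholder input name (hypothetical); substitute your own GFA file.
input=graph.gfa
path_ref=hg38_chr2
start=91597913
end=96188605
# Context size in node steps; 1 is an assumed example value, tune as needed.
context=1

# Region string in the PATH:START-END form used with -p in the report.
region="${path_ref}:${start}-${end}"
# Output prefix following the naming used in the report.
out="${input}.${path_ref}_${start}_${end}"

if command -v vg >/dev/null 2>&1 && [ -f "$input" ]; then
    # Build the xg index from the GFA, as in the original commands.
    vg convert --gfa-in "$input" -x > "$input.xg"
    # Extract the region with vg chunk instead of vg find, adding context with -c.
    vg chunk -x "$input.xg" -p "$region" -c "$context" > "$out.vg"
    # Convert the extracted chunk back to GFA.
    vg view "$out.vg" > "$out.gfa"
else
    echo "would run: vg chunk -x $input.xg -p $region -c $context > $out.vg"
fi
```

A larger `-c` value pulls in more of the neighbourhood around the path, at the cost of a bigger output graph.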
Thank you @glennhickey! Using -c with vg find didn't work, but using vg chunk plus -c I was able to get the graph's chunk.
Does this mean that vg find is going to be deprecated?
Is there any update on this issue?
I'm having the same problem (except that I'm using -N instead of -p for the selection). Unfortunately, I cannot use vg chunk: I'm trying to extract a list of nodes (with context), and vg chunk extracts each node into a separate chunk, whereas I want a single graph without duplication that contains the paths.