vg autoindex-Correct input type not found while loading handlegraph::HandleGraph

Open Flavia95 opened this issue 1 year ago • 4 comments

Hi, I ran the command below and got this error. How can I fix it?

vg1.40 autoindex -g mouse.parentals.fa.gz.9934a13.417fcdf.53439a3.smooth.final.gfa -t 20 -p mouse.parentals.fa.gz.9934a13.417fcdf.53439a3.smooth.final.autoindex
[IndexRegistry]: Checking for haplotype lines in GFA.
[vg autoindex] Executing command: vg1.40 autoindex -g mouse.parentals.fa.gz.9934a13.417fcdf.53439a3.smooth.final.gfa -t 20 -p mouse.parentals.fa.gz.9934a13.417fcdf.53439a3.smooth.final.autoindex
[IndexRegistry]: Constructing VG graph from GFA input.
[IndexRegistry]: Constructing XG graph from VG graph.
[IndexRegistry]: Pruning complex regions of VG to prepare for GCSA indexing.
[IndexRegistry]: Constructing GCSA/LCP indexes.
error[VPKG::load_one]: Correct input type not found while loading handlegraph::HandleGraph

Thank you, Flavia

Flavia95 avatar Jul 28 '22 18:07 Flavia95

Hmm. I'm not sure why that would happen. Are you able to share the GFA? You can email it to me at [email protected].

jeizenga avatar Jul 28 '22 18:07 jeizenga

I have a follow-up question. Do you know how much disk space and memory you had available when you ran into this problem?

jeizenga avatar Aug 03 '22 16:08 jeizenga

Yes, this is the situation.

Size  Used  Avail  Use
63G   4.2G  56G

Flavia95 avatar Aug 08 '22 18:08 Flavia95

I suspect the problem is that you ran out of temporary disk space while constructing the GCSA2 index. The final GCSA2 index is typically under 20 GB, but the indexing process can use considerably more than that in temporary storage. I'm not sure why vg autoindex tried to proceed without a finished index, though.
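As a rough pre-flight check, a small Python sketch like the one below can report how much free space the temporary directory has before a long indexing run starts. It only uses the standard library. The 200 GiB threshold is a hypothetical placeholder, not a figure from vg, and the script assumes the indexer writes its temporary files to the system default temporary directory, which may not match your setup.

```python
import shutil
import tempfile


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_headroom(path, required_gib):
    """True if the filesystem holding `path` has at least `required_gib` GiB free."""
    return free_gib(path) >= required_gib


# Assumption: temporary files go to the system default temporary directory.
tmp_dir = tempfile.gettempdir()

# Hypothetical threshold; choose one that fits your graph and machine.
required_gib = 200

print(f"{tmp_dir}: {free_gib(tmp_dir):.1f} GiB free")
if not has_headroom(tmp_dir, required_gib):
    print(f"Less than {required_gib} GiB free; indexing may run out of temporary space.")
```

If the check fails, pointing the temporary directory at a larger volume before indexing is the simplest thing to try.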

In this particular case, you may also have hit some limitations in the way we select pruning parameters (pruning is a simplification step that precedes GCSA2 indexing). When I ran the vg autoindex pipeline with the same inputs on a very large machine, autoindex eventually reached the software-defined 2 TB limit on temporary disk usage and aborted. Hitting this limit typically means that the graph was insufficiently pruned and that indexing has run into GCSA2's worst-case exponential space usage.
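To give a feel for why insufficient pruning is so costly, here is a toy Python sketch. It is not GCSA2's algorithm, just an illustration under simplifying assumptions: the graph is a chain of simple two-branch bubbles (the names `bubble_chain` and `count_walks` are made up for this example), and we count how many distinct walks of a given length exist. Every additional bubble doubles the number of full-length walks, which is the kind of growth that pruning is meant to cap.

```python
def bubble_chain(n_bubbles):
    """Adjacency list for a chain of n two-branch bubbles: s0 -> {a0, b0} -> s1 -> ..."""
    graph = {}
    for i in range(n_bubbles):
        source, sink = f"s{i}", f"s{i + 1}"
        left, right = f"a{i}", f"b{i}"
        graph[source] = [left, right]
        graph[left] = [sink]
        graph[right] = [sink]
    graph[f"s{n_bubbles}"] = []
    return graph


def count_walks(graph, n_nodes):
    """Count distinct walks that visit exactly `n_nodes` nodes, starting anywhere."""
    # counts[node] = number of walks with the current node count that start at node
    counts = {node: 1 for node in graph}
    for _ in range(n_nodes - 1):
        counts = {node: sum(counts[nxt] for nxt in graph[node]) for node in graph}
    return sum(counts.values())


for n in (5, 10, 20):
    graph = bubble_chain(n)
    # A walk through all n bubbles visits 2n + 1 nodes.
    print(n, "bubbles:", count_walks(graph, 2 * n + 1), "full-length walks")
```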

I do not currently have a good way to select the pruning parameters automatically, although I'm trying to get some discussions started over here on how we might do so. In the meantime, I'm afraid you'll probably need to use the more laborious manual indexing pipeline.
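For orientation, the manual pipeline looks roughly like the sketch below. This is written from memory of the vg wiki's index construction instructions, so please confirm every flag against `vg prune --help` and `vg index --help` for your version. The file names, the temporary directory, and the helper `manual_index_commands` are hypothetical, and the sketch assumes the graph has already been converted to vg format. It only prints the commands and does not run them. The pruning parameters themselves are left at their defaults here; tuning them is the laborious part.

```python
import shlex


def manual_index_commands(graph_vg, prefix, temp_dir):
    """Build (but do not run) shell commands for a manual XG + GCSA2 indexing run.

    All flags are assumptions to verify against your vg version's --help output.
    """
    graph = shlex.quote(graph_vg)
    xg = shlex.quote(f"{prefix}.xg")
    pruned = shlex.quote(f"{prefix}.pruned.vg")
    gcsa = shlex.quote(f"{prefix}.gcsa")
    tmp = shlex.quote(temp_dir)
    return [
        # XG index from the full, unpruned graph
        f"vg index -x {xg} {graph}",
        # Prune complex regions, restoring the embedded paths afterwards (-r)
        f"vg prune -r {graph} > {pruned}",
        # GCSA2 index from the pruned graph, with temporary files in temp_dir (-b)
        f"vg index -g {gcsa} -b {tmp} {pruned}",
    ]


# Hypothetical inputs for illustration only.
for command in manual_index_commands("graph.vg", "graph", "/path/to/big/tmp"):
    print(command)
```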

jeizenga avatar Aug 12 '22 23:08 jeizenga