Using `gfatools asm`
Hi,
Thank you for your work on `gfatools`.
I have an assembly graph that I want to process with `gfatools asm` (to simplify the graph, e.g. by popping bubbles) and then output the scaffolds.
The graph consists of draft assembly contigs plus connections I inferred from long reads. Each connection is either an edge or a gap, depending on whether the estimated distance between the two contigs is negative (overlap) or positive (gap).
I would like some advice on what I should provide to `gfatools asm` to get the best out of it. For example, each connection (edge or gap) in my analysis is backed by supporting long reads, so I have weights (number of supporting reads) for the connections that may be useful. I currently store these in an `FC:i:` tag. Will `gfatools` take that into account?
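To make the tag format concrete, here is a minimal Python sketch of how the support count could be read back from a line. The helper name `support_count` is only for illustration and is not part of `gfatools`; it assumes whitespace-separated fields and an integer tag of the form `FC:i:<count>`.

```python
def support_count(gfa_line, tag="FC"):
    """Return the integer value of a tag such as FC:i:20, or None if absent."""
    for field in gfa_line.split():
        parts = field.split(":", 2)
        # A tag field has the shape NAME:TYPE:VALUE; only integer tags are handled.
        if len(parts) == 3 and parts[0] == tag and parts[1] == "i":
            return int(parts[2])
    return None


if __name__ == "__main__":
    print(support_count("G * 2- 1- 3340 * FC:i:20"))
```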
I also have gap size estimates that I wanted to put on G-lines for both gaps and edges (I have no alignments, only distance estimates, even for overlapping contigs). However, I found that negative distances, i.e. overlapping contigs, are not allowed on G-lines, so I presumably have to express those connections as E-lines instead. If so, do the start/end positions matter for the analysis, or can I put placeholder values there?
Toy example:
H VN:Z:2.0
graph [scaf_num=None]
S 1 49057 *
S 2 33803 *
S 3 22222 *
G * 2- 1- 3340 * FC:i:20
# 20 reads support the above gap and the gap size is 334
G * 1+ 3- 4000 * FC:i:6
# 6 reads support the above gap and the gap size is 400
G * 1- 2- -300 * FC:i:15
# 15 reads support this connection and the contigs overlap by 300 bp, but this seems like an invalid G-line and should probably be converted to an E-line?
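Instead of fully fake values, the E-line coordinates could probably be derived from the overlap estimate itself. Below is a rough Python sketch of that idea, not something `gfatools` provides: the function name `overlap_gap_to_edge` and the `seg_len` dictionary are my own. It assumes a dovetail overlap (the end of the first oriented contig overlaps the start of the second), that E-line positions are given on the forward strand of each segment with `$` marking the segment end, and that `*` is acceptable as the alignment since I have none.

```python
def overlap_gap_to_edge(g_line, seg_len):
    """Rewrite a G-line with a negative distance as a dovetail E-line.

    Lines with a non-negative distance are returned unchanged.
    seg_len maps segment names to their lengths (taken from the S-lines).
    """
    rec, gid, ref1, ref2, dist, _var, *tags = g_line.split()
    dist = int(dist)
    if rec != "G" or dist >= 0:
        return g_line  # a real gap: keep it as a G-line
    ovl = -dist

    def interval(ref, at_end):
        name, strand = ref[:-1], ref[-1]
        n = seg_len[name]
        k = min(ovl, n)  # never let the overlap exceed the segment length
        # The overlap lies at the forward-strand end for (first, +) or (second, -).
        if at_end == (strand == "+"):
            return str(n - k), f"{n}$"
        return "0", (f"{k}$" if k == n else str(k))

    beg1, end1 = interval(ref1, at_end=True)   # end of the first oriented contig
    beg2, end2 = interval(ref2, at_end=False)  # start of the second oriented contig
    return "\t".join(["E", gid, ref1, ref2, beg1, end1, beg2, end2, "*", *tags])


if __name__ == "__main__":
    lengths = {"1": 49057, "2": 33803, "3": 22222}
    print(overlap_gap_to_edge("G * 1- 2- -300 * FC:i:15", lengths))
```

For the third connection in the toy example this gives an edge covering 0..300 on contig 1 and 33503..33803$ on contig 2, with the `FC:i:15` tag carried over. Whether coordinates like these are actually used downstream is exactly what I am unsure about.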
And lastly: is there any additional information that `gfatools` can benefit from? I may be able to prepare and provide that too.