Using `gfatools asm`

Open aafshinfard opened this issue 2 years ago • 0 comments

Hi, thank you for your work on gfatools.

I have an assembly graph that I want to process with `gfatools asm` (to simplify the graph, e.g. by popping bubbles) and then output the scaffolds. The graph is built from draft assembly contigs plus connections I inferred from long reads. A connection is an edge when the estimated distance is negative (the contigs overlap) and a gap when it is positive. I would like some advice on what to provide to `gfatools asm` to get the best out of it:

- Each connection (edge or gap) is based on supporting long reads, so I have a weight (the number of supporting reads) for every connection. I currently store it in a tag (`FC:i:`, see the example below), but I am not sure whether gfatools takes it into account.
- I also have gap size estimates. I wanted to write them on G-lines for both gaps and edges, because for overlapping contigs I have no alignments, only distance estimates. However, I found that negative distances are not allowed on G-lines, so I presumably have to express overlaps as E-lines. If so, do the start/end positions matter for the analysis, or can I put in placeholder values?

Toy example:

H	VN:Z:2.0
graph [scaf_num=None]
S	1	49057	*
S	2	33803	*
S	3	22222	*
G	*	2-	1-	3340	*	FC:i:20
# 20 reads support the above gap and the gap size is 334
G	*	1+	3-	4000	*	FC:i:6
# 6 reads support the above gap and the gap size is 400
G	*	1-	2-	-300	*	FC:i:15
# 15 reads support this connection and the contigs overlap by 300 bp, but this seems like an invalid G-line and should probably be converted to an E-line? 
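
To make the E-line question concrete, here is a minimal sketch of the conversion I have in mind. It reflects my reading of the GFA2 specification, not anything gfatools is confirmed to accept: I assume E-line positions are given in forward coordinates of each segment, that a position equal to the segment length carries a `$` suffix, and that the overlap spans exactly the estimated number of bases at the facing ends of both contigs. The function name `overlap_gap_to_edge` and the `seg_len` dictionary are hypothetical helpers for illustration only.

```python
def overlap_gap_to_edge(g_line, seg_len):
    """Turn a G-line with a negative distance into an approximate GFA2 E-line.

    Assumption: the overlap covers the tail of the first oriented segment and
    the head of the second oriented segment, with positions written in
    forward coordinates of each segment.
    """
    fields = [f.strip() for f in g_line.rstrip("\n").split("\t")]
    rec, gid, ref1, ref2, dist = fields[:5]
    tags = fields[6:]  # optional tags such as FC:i:15 are carried over as-is
    if rec != "G":
        raise ValueError("not a G-line")
    ovl = -int(dist)
    if ovl <= 0:
        raise ValueError("distance is not negative, so this is a real gap")

    def interval(ref, at_end):
        # at_end=True: overlap sits at the end of the oriented segment.
        name, sign = ref[:-1], ref[-1]
        length = seg_len[name]
        n = min(ovl, length)
        # The end of a '+' segment and the start of a '-' segment both map
        # to the suffix of the forward sequence.
        use_suffix = (sign == "+") == at_end
        if use_suffix:
            return str(length - n), f"{length}$"
        return "0", (f"{n}$" if n == length else str(n))

    b1, e1 = interval(ref1, at_end=True)
    b2, e2 = interval(ref2, at_end=False)
    return "\t".join(["E", gid, ref1, ref2, b1, e1, b2, e2, "*"] + tags)


seg_len = {"1": 49057, "2": 33803, "3": 22222}
print(overlap_gap_to_edge("G\t*\t1-\t2-\t-300\t*\tFC:i:15", seg_len))
```

The alignment column is left as `*` because I only have a distance estimate, and the `FC:i:` tag is passed through unchanged. Whether such approximate intervals are good enough for `gfatools asm` is exactly what I am unsure about.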

And lastly: is there any additional information that gfatools could benefit from? I can potentially prepare and provide that too.

aafshinfard · Feb 17 '23 19:02