
Add .gfa output to racon in addition to .fasta

Status: Open. jelber2 opened this issue 7 years ago (25 comments).

Hi, would it be possible for racon to output a GFA file in addition to FASTA?

jelber2, Feb 09 '18

Hello, what would be the use case for GFA output (i.e., what information are you seeking from racon)?

Best regards, Robert

rvaser, Feb 10 '18

Well, I would like to feed a GFA file from racon into the Hi-C scaffolding program SALSA (https://github.com/machinegun/SALSA). Granted, I don't know whether a GFA representation of the assembly would improve SALSA's output.

My workflow is to overlap PacBio reads with minimap2, assemble them de novo with miniasm, call consensus with racon, feed racon's GFA into SALSA, and finally polish with pilon: minimap2->miniasm->racon->SALSA->pilon

Alternatively, I could run minimap2->miniasm->SALSA->racon->pilon.

jelber2, Feb 10 '18

I have tagged this as enhancement and will deal with it soon.

Best regards, Robert

rvaser, Feb 11 '18

I'd find this feature useful as well. I'm polishing a Miniasm assembly using Racon. It'd be useful to preserve the graph after polishing with Racon. Consider supporting both GFA 1 and GFA 2.

sjackman, Feb 25 '18

How should I preserve the GFA graph? Polishing changes the sequences, so the overlaps recorded on the edges might be invalidated.

rvaser, Feb 25 '18

The GFA 1 output by Miniasm includes estimates of the amount of overlap, but doesn't include an actual alignment. So I think you could get away with not modifying the edges at all. The edges output by Miniasm look like this:

L	utg000001l	+	utg001226l	+	19386M	SD:i:5467

After the sequences are corrected by Racon, you could realign the two sequences incident to each edge, and it's possible that some of the ambiguities in the graph could be resolved post-Racon.
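To make the edge format concrete: a GFA 1 L line is tab-separated and holds the source segment and its orientation, the target segment and its orientation, an overlap CIGAR, and optional NAME:TYPE:VALUE tags. Realigning edges after polishing would start by reading these fields. The Python sketch below parses such a line and extracts the overlap length from a plain `<n>M` CIGAR like the one Miniasm writes; the names `Link`, `parse_link` and `overlap_length` are hypothetical, and the `SD` tag is only stored, not interpreted.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class Link:
    """One GFA 1 'L' record: an edge between two oriented segments."""
    from_seg: str
    from_orient: str
    to_seg: str
    to_orient: str
    overlap: str  # CIGAR such as "19386M", or "*" if unknown
    tags: Dict[str, Union[int, str]] = field(default_factory=dict)


def parse_link(line: str) -> Link:
    """Parse one tab-separated GFA 1 L line (sketch; no full validation)."""
    cols = line.rstrip("\n").split("\t")
    if cols[0] != "L" or len(cols) < 6:
        raise ValueError(f"not a GFA 1 L line: {line!r}")
    tags: Dict[str, Union[int, str]] = {}
    for tag in cols[6:]:
        name, typ, value = tag.split(":", 2)
        # Only integer tags are converted; other types are kept as strings.
        tags[name] = int(value) if typ == "i" else value
    return Link(cols[1], cols[2], cols[3], cols[4], cols[5], tags)


def overlap_length(cigar: str) -> Optional[int]:
    """Return n for a CIGAR of the form '<n>M', else None ('*' or complex)."""
    m = re.fullmatch(r"(\d+)M", cigar)
    return int(m.group(1)) if m else None
```

For the edge shown above, `parse_link("L\tutg000001l\t+\tutg001226l\t+\t19386M\tSD:i:5467")` gives a `Link` whose overlap is `"19386M"`, and `overlap_length` turns that into 19386. A realignment step could then compare this estimate against an alignment of the two polished sequences.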

sjackman, Feb 26 '18

Adding the format to Racon is a bit tedious, since only the S (segment) lines would change. Wouldn't a simple post-processing script be an easier solution? A script that updates the GFA file with the polished sequences and maybe realigns the edges?

rvaser, Feb 26 '18

A post-processing script may be easiest. That script would take in the GFA file produced by Miniasm and the FASTA file produced by Racon, and produce an updated GFA file. Is that script something you're interested in creating? Or perhaps it's a task for Gfakluge or GfaPy.
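A minimal sketch of such a script in Python, assuming the first whitespace-separated word of each Racon FASTA header matches a segment name in the Miniasm GFA. It swaps in the polished sequence on each matching S line, refreshes an `LN:i` length tag if one is present, and passes every other line (including L edges and their overlap CIGARs) through unchanged, so it does not realign edges. The function names `read_fasta` and `update_gfa` are hypothetical; this is not part of Racon, Gfakluge or GfaPy.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record's ID (first word of its header) to its sequence."""
    seqs: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = line[1:].split()[0]  # drop any description after the ID
            chunks = []
        elif name is not None:
            chunks.append(line.strip())  # handles multi-line FASTA
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def update_gfa(gfa_lines: Iterable[str], polished: Dict[str, str]) -> Iterator[str]:
    """Yield GFA lines with S-line sequences replaced by polished ones.

    Segments missing from `polished` are left as they were.
    """
    for line in gfa_lines:
        cols = line.rstrip("\n").split("\t")
        if cols[0] == "S" and len(cols) >= 3 and cols[1] in polished:
            seq = polished[cols[1]]
            cols[2] = seq
            # Keep the optional LN:i tag consistent with the new length.
            cols = cols[:3] + [
                f"LN:i:{len(seq)}" if c.startswith("LN:i:") else c
                for c in cols[3:]
            ]
        yield "\t".join(cols) + "\n"
```

Usage could look like: open `polished.fasta` and build the map with `read_fasta`, then open `draft.gfa` and write `update_gfa(gfa_handle, polished)` to `polished.gfa` with `writelines`. Streaming line by line keeps memory limited to the polished sequences.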

sjackman, Feb 26 '18

Well I might add such a script but I am not sure when I will get the time for it :/

rvaser, Feb 26 '18

No worries. I'll let you know if I get around to it myself.

sjackman, Feb 26 '18

Great, thanks!

rvaser, Feb 26 '18

Any updates?

mictadlo, Apr 16 '18

Not from me

sjackman, Apr 16 '18

Neither from me :/

rvaser, Apr 16 '18

I don't suppose anyone had a chance to look at this?

SamStudio8, Sep 11 '18

Unfortunately not :/ I'll try and deal with it later this year.

rvaser, Sep 11 '18

I used this AWK script to take the sequences from polished.fasta and the graph from draft.gfa, and produce a polished.gfa file:

seqtk seq polished.fasta | gawk -vOFS='\t' 'ARGIND == 1 { id = substr($1, 2); getline; x[id] = $1; next } $1 == "S" && x[$2] { $3 = x[$2] } 1' - draft.gfa >polished.gfa

See also https://github.com/edawson/gfakluge and https://github.com/ggonnella/gfapy/ for manipulating GFA files. I'd still love to see this feature in Racon.

sjackman, Sep 11 '18

So you're basically taking the unpolished assembly graph and the new polished sequences and creating a polished graph? Am I correct?

MChiaraC, Apr 09 '19

That is what I understand @sjackman's code is doing.

jelber2, Apr 09 '19

Yes. I'm working with assembly graphs from Flye or Unicycler whose edges are blunt (no overlap, 0M). This simple script does not recompute the edge alignments, so the overlaps in graphs from other assemblers would be left stale.

sjackman, Apr 09 '19

Hmm, I see, then I cannot use it...

MChiaraC, Apr 10 '19

You could replace all the CIGAR strings with * (meaning unknown).
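Following that suggestion, a small Python sketch (the function name `clear_link_overlaps` is hypothetical) that sets the overlap field of every GFA 1 L line to `*` and passes all other lines through untouched. It deliberately ignores C (containment) lines, which also carry an overlap field, and it would typically run after a sequence-replacement step such as the AWK one-liner above.

```python
from typing import Iterable, Iterator


def clear_link_overlaps(gfa_lines: Iterable[str]) -> Iterator[str]:
    """Yield GFA 1 lines with each L-line overlap (6th column) set to '*'."""
    for line in gfa_lines:
        cols = line.rstrip("\n").split("\t")
        if cols[0] == "L" and len(cols) >= 6:
            cols[5] = "*"  # '*' marks the overlap as unknown
        yield "\t".join(cols) + "\n"
```

Applied to Miniasm's edge `L utg000001l + utg001226l + 19386M SD:i:5467`, this keeps the segments, orientations and `SD:i:5467` tag and only replaces `19386M` with `*`.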

sjackman, Apr 10 '19

Hello Robert, any progress on adding .gfa output to Racon?

ardy20, May 21 '21

See https://github.com/rrwick/Minipolish

jelber2, May 21 '21

@ardy20, unfortunately no. Minipolish seems like a decent solution for this issue :)

rvaser, May 23 '21