GAF output

Open ekg opened this issue 2 years ago • 4 comments

Could you provide GAF output over the nodes of the graph? Would it be possible to do? I could also submit a patch if you can describe how you'd go about setting this up.

It's defined in the minigraph documentation, and GraphAligner and vg both produce it.

It's like PAF but the target is expressed as a walk through nodes in the graph (your GFA input).

The walk is expressed like >1>2>4>6>7>9>10>11>12>13>14>16>17>19

If you have <19<17<16<14<13<12<11<10<9<7<6<4<2<1 then it represents the reverse complement of the above walk.
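As a minimal sketch of the walk notation described above (the helper names `parse_walk` and `reverse_complement_walk` are illustrative, not part of astarix or any GAF library), a walk can be split into oriented steps and reverse-complemented by reversing the step order and flipping each orientation:

```python
import re


def parse_walk(walk):
    """Split a GAF walk such as '>1>2<4' into (orientation, segment_id) steps."""
    steps = re.findall(r"([<>])([^<>]+)", walk)
    # Reject anything that does not round-trip, e.g. a dangling '<' at the end.
    if "".join(orient + seg for orient, seg in steps) != walk:
        raise ValueError(f"malformed walk: {walk!r}")
    return steps


def reverse_complement_walk(walk):
    """Reverse the step order and flip every orientation marker."""
    flip = {">": "<", "<": ">"}
    return "".join(flip[orient] + seg for orient, seg in reversed(parse_walk(walk)))


if __name__ == "__main__":
    forward = ">1>2>4>6>7>9>10>11>12>13>14>16>17>19"
    print(reverse_complement_walk(forward))
```

Applying `reverse_complement_walk` twice returns the original walk, which makes for an easy sanity check.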

You'll need to emit a CIGAR or MD tag to express the base-level alignment, for downstream use in vg call, for instance.
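To make the record layout concrete, here is a rough sketch of assembling one GAF line with the CIGAR stored in a `cg:Z:` tag. The function names are hypothetical, and the sketch assumes an extended CIGAR restricted to the `=`, `X`, `I` and `D` operations, a strand column fixed to `+` (orientation is carried by the walk), and a mapping quality of 255 meaning "missing":

```python
import re


def cigar_stats(cigar):
    """Return (query_bases, path_bases, matches, block_length) for an =/X/I/D CIGAR."""
    ops = re.findall(r"(\d+)([=XID])", cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"unsupported CIGAR: {cigar!r}")
    query = path = matches = block = 0
    for count, op in ops:
        n = int(count)
        block += n                 # every column counts toward the block length
        if op in "=X":             # match or mismatch consumes query and path
            query += n
            path += n
        if op == "=":
            matches += n
        if op == "I":              # insertion consumes query only
            query += n
        if op == "D":              # deletion consumes path only
            path += n
    return query, path, matches, block


def gaf_line(qname, qlen, qstart, walk, path_len, path_start, cigar, mapq=255):
    """Build the 12 mandatory tab-separated GAF columns plus a cg:Z CIGAR tag."""
    query, path, matches, block = cigar_stats(cigar)
    fields = [
        qname, qlen, qstart, qstart + query,      # query name, length, start, end
        "+",                                      # strand relative to the walk
        walk, path_len, path_start, path_start + path,
        matches, block, mapq,
        f"cg:Z:{cigar}",
    ]
    return "\t".join(str(f) for f in fields)


if __name__ == "__main__":
    print(gaf_line("read1", 10, 0, ">1>2", 20, 3, "4=1X2I3=1D"))
```

Deriving the end coordinates, match count and block length from the CIGAR keeps the columns consistent with the base-level alignment by construction.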

ekg avatar Dec 14 '21 17:12 ekg

Thank you for the feedback @ekg.

Yes, it is possible. Currently, the GFA coordinate information is not propagated to the internal graph representation. GAF is on my to-do list and I can prioritize it now. I will update this thread in the next few days.

pesho-ivanov avatar Dec 16 '21 10:12 pesho-ivanov

Is this feasible to do? I'd love to test astarix, and a standard output format like this is critical to do that!

ekg avatar Jan 20 '22 16:01 ekg

Is this still on the agenda? GAF or GAM output would be great to enable the use of the alignments in any downstream analysis.

ChriKub avatar May 09 '22 09:05 ChriKub

Dear @ekg and @ChriKub, please excuse my delays. I was overoptimistic about implementing the GAF output in time, and given my upcoming PhD defense, it may be delayed further.

In case someone wants to implement this sooner, there are several subtleties to be taken care of:

  1. The GFA input coordinates should be carried all the way through in the node_t structure (incl. the reverse-complement nodes).
  2. Every path starts in the trie (whose nodes are abstractions of nodes in the GFA, so they should not hold any GFA information). This means that to reconstruct the GFA coordinates at the beginning of the path, one has to go back in the graph instead of "climbing" the trie.
  3. Reverse complement nodes can be distinguished by their id.
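
A possible shape for the final step, once the coordinates are available, is sketched below. This is not astarix code: the `Node` class and its fields (`segment`, `offset`, `reverse`) are hypothetical stand-ins for whatever node_t would carry, and the sketch assumes one internal node per base, with offsets counted in the direction of traversal. It collapses a path of per-base nodes into a GAF walk and the start offset on that walk:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    segment: str   # GFA segment name (hypothetical field)
    offset: int    # 0-based offset in the segment, in traversal direction
    reverse: bool  # True for a reverse-complement node


def nodes_to_walk(nodes):
    """Collapse a per-base node path into (walk, start_offset_on_walk)."""
    if not nodes:
        raise ValueError("empty path")
    steps = []  # each entry: [segment, reverse, last_offset_seen]
    for node in nodes:
        continues = (
            steps
            and steps[-1][0] == node.segment
            and steps[-1][1] == node.reverse
            and node.offset == steps[-1][2] + 1
        )
        if continues:
            steps[-1][2] = node.offset  # still inside the same segment visit
        else:
            steps.append([node.segment, node.reverse, node.offset])
    walk = "".join(("<" if rev else ">") + seg for seg, rev, _ in steps)
    return walk, nodes[0].offset


if __name__ == "__main__":
    path = [
        Node("1", 2, False), Node("1", 3, False),
        Node("2", 0, False),
        Node("4", 0, True), Node("4", 1, True),
    ]
    print(nodes_to_walk(path))
```

Starting a new step whenever the offset does not continue the previous one also handles a path that re-enters the same segment through a self-loop. The trie prefix from item 2 would have to be resolved to real graph nodes before calling such a function.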

pesho-ivanov avatar May 16 '22 18:05 pesho-ivanov