astarix
GAF output
Could you provide GAF output over the nodes of the graph? Would it be possible to do? I could also submit a patch if you can describe how you'd go about setting this up.
It's defined in the minigraph documentation, and GraphAligner and vg both produce it.
It's like PAF but the target is expressed as a walk through nodes in the graph (your GFA input).
The walk is expressed like >1>2>4>6>7>9>10>11>12>13>14>16>17>19
If you have <19<17<16<14<13<12<11<10<9<7<6<4<2<1
then it represents the reverse complement of the above walk.
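As a rough illustration of the walk notation, here is a minimal Python sketch that splits a walk into oriented steps and builds its reverse complement. The function names `parse_walk` and `reverse_complement_walk` are made up for this example and are not part of astarix, minigraph, GraphAligner, or vg.

```python
import re

def parse_walk(walk):
    """Split an oriented walk such as '>1>2>4' into (orientation, node_id) steps."""
    steps = re.findall(r"([<>])([^<>]+)", walk)
    # Reject anything that is not a clean sequence of oriented steps,
    # for example a dangling '<' at the end of the string.
    if "".join(o + n for o, n in steps) != walk:
        raise ValueError(f"malformed walk: {walk!r}")
    return steps

def reverse_complement_walk(walk):
    """Reverse the step order and flip every orientation."""
    flip = {">": "<", "<": ">"}
    return "".join(flip[o] + n for o, n in reversed(parse_walk(walk)))

if __name__ == "__main__":
    print(reverse_complement_walk(">1>2>4>6"))
```

Applying `reverse_complement_walk` twice gives back the original walk, which is a handy sanity check when emitting alignments on both strands.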
You'll need to create a CIGAR or MD tag to express the base-level alignment, for use downstream in vg call, for instance.
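To make the record layout concrete, here is a hedged Python sketch that assembles one GAF line with the CIGAR in a `cg:Z:` tag. It assumes the 12 mandatory PAF-like columns from the minigraph GAF description, a CIGAR that uses only the `=`, `X`, `I`, and `D` operations, a strand column of `+` because the orientation is already carried by the walk, and 255 as the "missing" mapping quality. The helper names `cigar_stats` and `gaf_line` are hypothetical.

```python
import re

def cigar_stats(cigar):
    """Return (residue_matches, alignment_block_length) for an =/X/I/D CIGAR."""
    ops = re.findall(r"(\d+)([=XID])", cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"unsupported CIGAR: {cigar!r}")
    matches = sum(int(n) for n, op in ops if op == "=")
    block = sum(int(n) for n, op in ops)  # matches + mismatches + gaps
    return matches, block

def gaf_line(qname, qlen, qstart, qend, path, plen, pstart, pend, cigar, mapq=255):
    """Format one tab-separated GAF record (assumed column order, see lead-in)."""
    matches, block = cigar_stats(cigar)
    fields = [qname, qlen, qstart, qend, "+", path, plen, pstart, pend,
              matches, block, mapq, "cg:Z:" + cigar]
    return "\t".join(str(f) for f in fields)

if __name__ == "__main__":
    print(gaf_line("read1", 11, 0, 11, ">1>2", 20, 3, 12, "5=1X2I3="))
```

The path start and end are offsets along the concatenated walk, so a writer has to know where inside the first node the alignment begins.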
Thank you for the feedback @ekg.
Yes, it is possible. Currently, the GFA coordinate information is not propagated to the internal graph representation. GAF is on my todo list and I can prioritize it now. I will update this thread again in the next few days.
Is this feasible to do? I'd love to test astarix, and a standard output format like this is critical to do that!
Is this still on the agenda? GAF or GAM output would be great to enable the use of the alignments in any downstream analysis.
Dear @ekg and @ChriKub, please excuse my delays. I was overoptimistic about implementing the GAF output in time, and given my upcoming PhD defense, it may be delayed further.
In case someone wants to implement this sooner, there are several subtleties to be taken care of:
- The GFA input coordinates should be dragged along the whole way in the node_t structure (incl. the reverse-complement nodes).
- Any path starts at the trie (whose nodes are abstractions of nodes in GFA so they should not hold any GFA information). This means that to reconstruct the GFA coordinates at the beginning of the path, one has to go back in the graph instead of "climbing" the trie.
- Reverse complement nodes can be distinguished by their id.
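The first and last points above could be prototyped roughly as follows. This is a hypothetical Python sketch, not astarix's actual `node_t`: it assumes every internal graph node on an alignment path carries an `Origin` record (GFA segment name, forward-strand offset within that segment, and a reverse-complement flag), and that trie nodes have already been replaced by the graph nodes they stand for. The names `Origin` and `walk_from_origins` are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Origin:
    segment: str   # GFA segment name this internal node came from
    offset: int    # 0-based offset on the segment's forward strand
    reverse: bool  # True for a reverse-complement node

def walk_from_origins(origins):
    """Collapse per-node origins along an alignment path into an oriented walk."""
    steps = []
    prev = None
    for cur in origins:
        step = -1 if cur.reverse else 1
        # Stay in the same walk step only if we keep moving base by base
        # through the same segment in the same orientation.
        continues = (
            prev is not None
            and (cur.segment, cur.reverse) == (prev.segment, prev.reverse)
            and cur.offset == prev.offset + step
        )
        if not continues:
            steps.append(("<" if cur.reverse else ">") + cur.segment)
        prev = cur
    return "".join(steps)

if __name__ == "__main__":
    fwd = [Origin("1", 3, False), Origin("1", 4, False), Origin("2", 0, False)]
    print(walk_from_origins(fwd))
```

Checking the offset as well as the segment name means a self-loop that re-enters the same segment shows up as a repeated step instead of being silently merged.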