
Mapping long reads to whole genome

Open champ1994 opened this issue 3 years ago • 2 comments

I am trying to map long reads to a whole genome. I have a separate graph for each chromosome, built using vg construct. How can I map the reads against all the chromosomes and find out which chromosome each read maps to?

The trivial solution is to map the reads to each chromosome's vg graph separately and then, across all the resulting GAF files, pick the one with the highest alignment score for each read.
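For reference, the per-chromosome comparison described above could be sketched roughly as follows. This is only an illustration under some assumptions: each GAF file comes from aligning the same reads to one chromosome graph, the score is read from an `AS` tag when one is present, and the residue-match count (column 10 of GAF) is used as a fallback. The function names and the `gaf_paths` mapping are made up for this example.

```python
def alignment_score(fields):
    """Return the AS tag value if present, else the residue-match count (GAF column 10)."""
    for tag in fields[12:]:
        if tag.startswith("AS:"):
            # Tags look like AS:f:123.4, so the value is the third part.
            return float(tag.split(":", 2)[2])
    return float(fields[9])


def best_chromosome_per_read(gaf_paths):
    """Map each read name to (chromosome label, score) of its best-scoring alignment.

    gaf_paths is a hypothetical dict: chromosome label -> GAF file produced
    by aligning the reads to that chromosome's graph.
    """
    best = {}
    for chrom, path in gaf_paths.items():
        with open(path) as handle:
            for line in handle:
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 12:
                    continue  # skip blank or malformed lines
                read, score = fields[0], alignment_score(fields)
                if read not in best or score > best[read][1]:
                    best[read] = (chrom, score)
    return best
```

Usage would be something like `best_chromosome_per_read({"chr1": "chr1.gaf", "chr2": "chr2.gaf"})`. Ties go to whichever file is read first, which may or may not be what you want.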

Is there a simpler way to do this?

champ1994 avatar May 11 '22 08:05 champ1994

You can merge the individual chromosome graphs into one graph which has all chromosomes and then align the reads to that graph. GraphAligner will then align to all chromosomes and pick the best alignment, or multiple alignments if their alignment scores are similar.

maickrau avatar May 11 '22 10:05 maickrau

Thank you for the prompt reply.

According to the vg wiki (https://github.com/vgteam/vg/wiki/Working-with-a-whole-genome-variation-graph#node-id-coordination), vg ids only changes the node IDs in each chromosome's .vg file, and a combined graph is produced only for the .xg index. But GraphAligner takes a .vg file as input. So, do you have any idea how to combine these graphs into a single .vg file?

I am sorry, this question should really be directed to the vg team. But if you have any ideas, they would be helpful.

Edit: I figured out how to combine the graphs using the vg combine module, and mapping to the whole genome completed successfully. In the resulting GAF file, the chromosome for an individual read can only be inferred from the node IDs in the path matching column. The GAF documentation (https://github.com/lh3/gfatools/blob/master/doc/rGFA.md#the-graph-alignment-format-gaf) mentions that if the alignment is converted to stable coordinates, the chromosome information will also be present in the GAF file.
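In case it helps someone else, inferring the chromosome from node IDs could look roughly like the sketch below. It assumes the node IDs are numeric and that each chromosome occupies its own contiguous, non-overlapping ID range in the combined graph. The `range_starts` list (first node ID of each chromosome) is hypothetical and has to be worked out from your own graphs; the function names are made up for this example.

```python
import re
from bisect import bisect_right
from collections import Counter


def nodes_in_path(path_field):
    """Extract numeric node IDs from a GAF path column such as '>12<34>56'."""
    return [int(n) for n in re.findall(r"[<>](\d+)", path_field)]


def chromosome_of_node(node_id, range_starts):
    """Look up the chromosome whose ID range contains node_id.

    range_starts is a list of (first_node_id, chromosome) sorted by first_node_id;
    these values are assumptions that must come from your own graphs.
    """
    starts = [start for start, _ in range_starts]
    i = bisect_right(starts, node_id) - 1
    if i < 0:
        raise ValueError(f"node {node_id} is below the first known ID range")
    return range_starts[i][1]


def chromosome_of_alignment(path_field, range_starts):
    """Return the chromosome covering most nodes of the path, or None if the path has no nodes."""
    counts = Counter(chromosome_of_node(n, range_starts) for n in nodes_in_path(path_field))
    return counts.most_common(1)[0][0] if counts else None
```

For example, with `range_starts = [(1, "chr1"), (1001, "chr2")]`, a path of `>1002>1003<1004` would be assigned to chr2. The majority vote is only a safeguard; within one alignment all nodes should normally fall in the same chromosome.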

champ1994 avatar May 11 '22 11:05 champ1994