
How to handle non-nucleotide characters?

Open eboileau opened this issue 6 months ago • 0 comments

I have another request, not a bug, but I'd really appreciate your help. I am interested in pantranscriptome graphs, though not for haplotypes per se. I would like to represent spliced graphs and include information about RNA modifications, something like this:

Image

I naively thought I could encode modifications as ALT alleles, but vg construct only allows A, C, T, G, and N (though with N, haplotype-specific transcripts are missing). This is somewhat related to #1677.

I don't expect you to take this on, but I don't want to go down a rabbit hole. To be honest, I'm only barely familiar with C++, and I have only glanced at the code. Do you think it is feasible to fork vg and work on this? How much of vg actually assumes that only A, C, T, and G are used?

Another option, which I like less, would be to work with standard nucleotides and "patch" the final output before feeding it to sequenceTubeMap. Again, I'm not sure how easy this would be, or whether sequenceTubeMap would even allow it. Any advice would be greatly appreciated.
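
To make the "patch" idea concrete, here is a minimal sketch in plain Python. It assumes the graph is built with standard nucleotides and that modifications are kept in a separate side table keyed by node ID and offset. The `nodes` dictionary, the `modifications` table, the `patch_nodes` helper, and the `m` symbol are all hypothetical and do not correspond to any vg or sequenceTubeMap API. Whether sequenceTubeMap would display such symbols is still an open question.

```python
# Hypothetical in-memory view of graph nodes: node ID -> standard-nucleotide sequence.
nodes = {1: "GATTACA", 2: "CCGA"}

# Hypothetical side table: (node ID, 0-based offset) -> (expected base, modification symbol).
# The symbol "m" is an arbitrary placeholder, not an established convention.
modifications = {(1, 4): ("A", "m"), (2, 3): ("A", "m")}


def patch_nodes(nodes, modifications):
    """Return a copy of nodes with modified positions replaced by their symbols."""
    patched = dict(nodes)
    for (node_id, offset), (base, symbol) in modifications.items():
        # One character replaces one base, so all other offsets stay valid.
        if len(symbol) != 1:
            raise ValueError(f"symbol {symbol!r} must be a single character")
        seq = patched[node_id]
        # Guard against a side table that is out of sync with the graph.
        if seq[offset] != base:
            raise ValueError(
                f"node {node_id} offset {offset}: expected {base!r}, found {seq[offset]!r}"
            )
        patched[node_id] = seq[:offset] + symbol + seq[offset + 1:]
    return patched


patched = patch_nodes(nodes, modifications)
```

Keeping the modifications outside the graph means every vg step still sees only A, C, G, and T, and the substitution is applied only to the copy intended for display.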

Thanks.

eboileau, Jun 25 '25 07:06