BioAlignments.jl

Inconsistent when aligning distinct sequence types

Open jakobnissen opened this issue 2 years ago • 1 comments

So I was surprised to find you can align Strings to each other:

julia> x= pairalign(LocalAlignment(), "ACA", "AAA", model)
PairwiseAlignmentResult{Int64, String, String}:
  score: 6
  seq: 1 ACA 3
         | |
  ref: 1 AAA 3

It's kind of cool that all it needs is a sequence of elements that implements convert(T, x) to the right type. But when displaying a mixed-type alignment, it does not recognize that DNA_A and 'A' are the same symbol, so no match markers are drawn:

julia> x= pairalign(LocalAlignment(), "ACA", dna"AAA", model)
PairwiseAlignmentResult{Int64, String, LongDNASeq}:
  score: 6
  seq: 1 ACA 3

  ref: 1 AAA 3

We should also think about how to handle alignments of distinct sequence types. For example, how do you align two RNA sequences to each other? There is no RNA substitution model, though obviously the DNA models could work. But since Strings are allowed, we have an inconsistency: pairalign(LocalAlignment(), "AUA", dna"AAA", model) errors, but pairalign(LocalAlignment(), rna"AUA", dna"AAA", model) works, simply because convert(DNA, 'U') is an error, whereas convert(DNA, RNA_U) isn't.
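The asymmetry can be checked at the symbol level, without running an alignment. This is a minimal sketch, assuming BioSymbols.jl (the package that defines the DNA and RNA symbol types) is installed and behaves as described above. The variable name `threw` is only for illustration.

```julia
using BioSymbols  # assumed installed; provides DNA, RNA, DNA_A, RNA_U

# Char -> DNA works for a character that names a DNA symbol.
@show convert(DNA, 'A') == DNA_A

# RNA -> DNA is defined, so RNA_U is mapped onto a DNA symbol.
@show convert(DNA, RNA_U)

# 'U' names no DNA symbol, so the Char conversion is expected to throw.
threw = try
    convert(DNA, 'U')
    false
catch
    true
end
@show threw

# No cross-type equality is defined, so a symbol does not equal its Char.
@show DNA_A == 'A'
```

If the behavior matches the report, the first check is true, the 'U' conversion throws, and the last comparison is false, which would explain both the error and the missing match markers.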

jakobnissen, Jul 19 '21 07:07

Aligning RNA is a good question. A DNA model is probably okay for mRNA, but I wonder whether it would work as well for structured RNAs.

BioTurboNick, Jul 19 '21 18:07