Harmonize Profile and LightweightProfile

Open sbliven opened this issue 4 years ago • 0 comments

I think Profile was intended to be general enough to represent multiple sequence alignments. However, SimpleProfile is the only current implementation, and it only supports pairwise alignments. There is also LightweightProfile, which has similarly named methods, is implemented by MultipleSequenceAlignment, and seems intended for ungapped sequences. Is that an accurate summary of the situation?

If so, I propose steps to

  • Make Profile<S,C> extend LightweightProfile<AlignedSequence<S,C>,C>
  • Improve documentation clarifying that SimpleProfile is pairwise
  • Deprecate one of Profile.StringFormat or LightweightProfile.StringFormat
  • Add constructors to ease conversion between SimpleProfile and MultipleSequenceAlignment instances
  • Document what toString(StringFormat) is expected to produce for each format. Ideally the output should follow the same layout whether the alignment holds two sequences or more. If a format is only applicable to pairwise alignments, this needs to be clearly documented.
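
To make the first bullet concrete, here is a rough sketch of how the generics could line up. The interfaces below are cut-down stand-ins that I wrote for illustration, not the real BioJava declarations, and `Plain`, `Gapped`, `Nucleotide`, `rows` and `demo` are hypothetical names. The only point is that code written once against LightweightProfile would also accept a Profile.

```java
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT the real BioJava interfaces.
final class ProfileHierarchySketch {

    interface Compound {}

    interface Sequence<C extends Compound> {
        String getSequenceAsString();
    }

    // A gapped sequence that remembers the ungapped original.
    interface AlignedSequence<S extends Sequence<C>, C extends Compound> extends Sequence<C> {
        S getOriginalSequence();
    }

    interface LightweightProfile<S extends Sequence<C>, C extends Compound> {
        List<S> getAlignedSequences();

        default int getSize() {
            return getAlignedSequences().size();
        }
    }

    // Proposed relationship: a Profile is a LightweightProfile whose rows are aligned sequences.
    interface Profile<S extends Sequence<C>, C extends Compound>
            extends LightweightProfile<AlignedSequence<S, C>, C> {
    }

    // Hypothetical helper written once against the lightweight interface.
    static <S extends Sequence<C>, C extends Compound> String rows(LightweightProfile<S, C> profile) {
        StringBuilder sb = new StringBuilder();
        for (S row : profile.getAlignedSequences()) {
            sb.append(row.getSequenceAsString()).append('\n');
        }
        return sb.toString();
    }

    // Hypothetical concrete types, just enough to exercise the hierarchy.
    enum Nucleotide implements Compound { A, C, G, T }

    static final class Plain implements Sequence<Nucleotide> {
        private final String text;
        Plain(String text) { this.text = text; }
        @Override public String getSequenceAsString() { return text; }
    }

    static final class Gapped implements AlignedSequence<Plain, Nucleotide> {
        private final Plain original;
        private final String gapped;
        Gapped(Plain original, String gapped) { this.original = original; this.gapped = gapped; }
        @Override public Plain getOriginalSequence() { return original; }
        @Override public String getSequenceAsString() { return gapped; }
    }

    static String demo() {
        List<AlignedSequence<Plain, Nucleotide>> aligned = List.of(
                new Gapped(new Plain("ACT"), "AC-T"),
                new Gapped(new Plain("ACGT"), "ACGT"));
        // Only one abstract method remains in this sketch, so a lambda can stand in for an implementation.
        Profile<Plain, Nucleotide> pairwise = () -> aligned;
        // The pairwise Profile is accepted where a LightweightProfile is expected.
        return rows(pairwise);
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

In this sketch the row type of a Profile is `AlignedSequence<S,C>`, so the inherited `getAlignedSequences()` has the return type a gapped profile needs. Whether the remaining methods of the real interfaces unify this cleanly would need to be checked against the actual code.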

The last point regarding formats is relevant to #983. I had difficulty recommending a solution there because the pairwise and MSA outputs differ and neither is documented.

sbliven, Jan 27 '22 15:01