BioFSharp

[Feature Request] Rework alignment

Open HLWeil opened this issue 5 years ago • 2 comments

The Problem

The pairwise alignment in BioFSharp.Algorithms works fine, but the only implemented way to write the resulting alignment out in a proper format is the BioFSharp.IO.Clustal module. Although both use the same generic BioFSharp.Alignment.Alignment type, they instantiate it with different sequence and metadata types, so converting from one to the other is quite cumbersome.

Solution

Remodel BioFSharp.Algorithms.PairwiseAlignment and BioFSharp.IO.Clustal:

  • [ ] Add ConservationInfo module to BioFSharp.IO.Clustal or BioFSharp.Alignment

  • [ ] Let Clustal functions use BioSeqs instead of Strings

  • [ ] Let BioFSharp.Algorithms.PairwiseAlignment functions use BioSeqs as output instead of Nucleotides

  • [x] Add create function to Alignment Type in BioFSharp.Alignment

These changes should make it easier to combine the alignment functions from the different namespaces.
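To make the first checklist item more concrete, here is a minimal sketch of what a conservation helper could compute. Everything in it is an assumption for illustration: the module name `ConservationInfoSketch` and the function `ofAlignedSequences` do not exist in BioFSharp, the input is plain symbol strings rather than BioSeqs, and only the `*` class (identical column) is emitted, not the `:` and `.` similarity classes of real Clustal output.

```fsharp
// Hypothetical sketch, not part of BioFSharp: a simplified Clustal-style
// conservation line. Only '*' (identical column) and ' ' are emitted.
module ConservationInfoSketch =

    /// Builds the conservation line for aligned sequences of equal length.
    let ofAlignedSequences (aligned: seq<string>) : string =
        let rows = Array.ofSeq aligned
        if Array.isEmpty rows then
            ""
        else
            let length = rows.[0].Length
            if rows |> Array.exists (fun r -> r.Length <> length) then
                invalidArg "aligned" "aligned sequences must have equal length"
            String.init length (fun i ->
                let first = rows.[0].[i]
                // A column that starts with a gap is never marked as conserved.
                if first <> '-' && rows |> Array.forall (fun r -> r.[i] = first)
                then "*"
                else " ")

// Usage: one marker per alignment column.
let line = ConservationInfoSketch.ofAlignedSequences [ "ACG-T"; "ACGAT"; "ACC-T" ]
```

A real module would probably work on BioItems directly and distinguish strongly and weakly similar columns, but the shape (aligned sequences in, one marker per column out) would stay the same.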

Example of unnecessary conversions

Output type of the pairwise alignment:
 Alignment.Alignment<Nucleotides.Nucleotide list, Algorithm.PairwiseAlignment.Score>
Expected input of the Clustal write function:
 Alignment.Alignment<BioID.TaggedSequence<string,char>,Clustal.AlignmentInfo>
Needed conversion:
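 // Turn each aligned Nucleotide list into a tagged sequence of symbols (seq0, seq1, ...)
 let mappedData = 
     alignment.AlignedSequences
     |> List.mapi (fun i (ns:Nucleotides.Nucleotide list) -> 
         Seq.map (BioItem.symbol) ns
         |> BioID.createTaggedSequence (sprintf "seq%i" i)
     )

 // Placeholder conservation line: marks every position as conserved
 // (firstGeneSeq is defined outside this snippet)
 let conservationInfo = String.init firstGeneSeq.Length (fun _ -> "*")

 // Dummy header for the Clustal metadata
 let newHeader = {Header = "Decoy";ConservationInfo = conservationInfo}

 let newAlignment = {MetaData = newHeader;AlignedSequences = mappedData}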

All of this is needed just to write out an alignment, which is very cumbersome.
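Until the types line up, the conversion could at least be wrapped in one reusable function. The following is only a sketch under stated assumptions: the record and union types are local stand-ins that mirror the shapes used in this issue (they are not the BioFSharp definitions), `symbol` stands in for `BioItem.symbol`, and `toClustalAlignment` is a hypothetical name.

```fsharp
// Stand-in types so the sketch compiles on its own; they only mirror the
// shapes used in this issue and are not the BioFSharp definitions.
type Nucleotide = A | C | G | T | Gap

type TaggedSequence<'T, 'S> = { Tag: 'T; Sequence: seq<'S> }
type AlignmentInfo = { Header: string; ConservationInfo: string }
type Alignment<'Sequence, 'Metadata> =
    { MetaData: 'Metadata; AlignedSequences: 'Sequence list }

/// Stand-in for BioItem.symbol.
let symbol nucleotide =
    match nucleotide with
    | A -> 'A'
    | C -> 'C'
    | G -> 'G'
    | T -> 'T'
    | Gap -> '-'

/// Hypothetical helper: reshapes a pairwise-alignment result into the form
/// the Clustal writer expects. Header and conservation line are supplied by
/// the caller; the original score metadata is dropped.
let toClustalAlignment header conservationInfo (alignment: Alignment<Nucleotide list, 'Score>) =
    let tagged =
        alignment.AlignedSequences
        |> List.mapi (fun i ns ->
            { Tag = sprintf "seq%i" i; Sequence = Seq.map symbol ns })
    { MetaData = { Header = header; ConservationInfo = conservationInfo }
      AlignedSequences = tagged }

// Usage with a toy alignment whose metadata is an integer score.
let pairwise = { MetaData = 42; AlignedSequences = [ [ A; C; Gap; T ]; [ A; C; G; T ] ] }
let clustalReady = toClustalAlignment "Decoy" "** *" pairwise
```

This only hides the boilerplate; the checklist above would remove the need for it.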

HLWeil Jul 23 '19 14:07

@HLWeil any updates?

kMutagene Feb 19 '20 15:02

Actually, there are multiple types that look very similar:

type TaggedSequence<'T,'S> =
    {
        Tag: 'T;
        Sequence: seq<'S>
    }
type FastaItem<'a> = {
    Header    : string;
    Sequence  : 'a;       
}
///General Alignment type used throughout BioFSharp
type Alignment<'Sequence,'Metadata> =                
    {
    ///Additional information for this alignment
    MetaData            : 'Metadata;
    ///List of aligned Sequences
    Sequences    : seq<'Sequence>;
    }

Replacing the Alignment type with the TaggedSequence type might cost some conciseness, but in general I think it would be good if these types had seamless interop. The FastaItem type might even be replaceable with the TaggedSequence type after some minor adjustments.
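As a rough idea of what such interop could look like, here is a sketch with local copies of the two record shapes quoted above. The module `SequenceInterop` and its functions are hypothetical and not part of BioFSharp; the point is only that a FastaItem is essentially a TaggedSequence whose tag is a string.

```fsharp
// Local copies of the record shapes quoted above, so the sketch compiles
// without referencing BioFSharp.
type TaggedSequence<'T, 'S> = { Tag: 'T; Sequence: seq<'S> }
type FastaItem<'a> = { Header: string; Sequence: 'a }

// Hypothetical module; none of these functions exist in BioFSharp.
module SequenceInterop =

    /// FastaItem -> TaggedSequence: the header becomes the tag.
    let ofFastaItem<'a, 'S when 'a :> seq<'S>> (item: FastaItem<'a>) : TaggedSequence<string, 'S> =
        { Tag = item.Header; Sequence = item.Sequence :> seq<'S> }

    /// TaggedSequence -> FastaItem: only possible when the tag is a string.
    let toFastaItem (tagged: TaggedSequence<string, 'S>) : FastaItem<seq<'S>> =
        { Header = tagged.Tag; Sequence = tagged.Sequence }

// Usage: round trip through both representations.
let fasta : FastaItem<string> = { Header = "gene1"; Sequence = "ACGT" }
let tagged : TaggedSequence<string, char> = SequenceInterop.ofFastaItem fasta
let roundTrip = SequenceInterop.toFastaItem tagged
```

The round trip loses the concrete sequence type (a string comes back as `seq<char>`), which is one of the minor adjustments that would have to be decided.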

What do you think?

HLWeil May 11 '21 16:05