seqan icon indicating copy to clipboard operation
seqan copied to clipboard

question about parallel alignment of different seqence

Open metaphysicser opened this issue 1 year ago • 1 comments

Hello, I would like to use the following code provided by you to perform parallel pairwise sequence alignment on a large number of different sequences.

#include <iostream>

#include <seqan/align_parallel.h>
#include <seqan/stream.h>  // for printing strings

int main()
{
    using TSequence = seqan2::String<seqan2::Dna>;
    using TAlignedSequence = seqan2::Gaps<TSequence>;
    using TThreadModel = seqan2::Parallel;
    using TVectorSpec = seqan2::Vectorial;
    using TExecPolicy = seqan2::ExecutionPolicy<TThreadModel, TVectorSpec>;

    // dummy sequences
    TSequence seqH = "CGATT";
    TSequence seqV = "CGAAATT";

    seqan2::StringSet<TAlignedSequence> seqs1;
    seqan2::StringSet<TAlignedSequence> seqs2;

    for (size_t i = 0; i < 100; ++i) // Create a data set of 100 dummy sequences
    {
        appendValue(seqs1, TAlignedSequence(seqH));
        appendValue(seqs2, TAlignedSequence(seqV));
    }

    TExecPolicy execPolicy;
    setNumThreads(execPolicy, 4);

    seqan2::Score<int16_t, seqan2::Simple> scoreAffine(2, -2, -1, -4);

    seqan2::String<int16_t> scores = seqan2::globalAlignment(execPolicy, seqs1, seqs2, scoreAffine);

    for (int16_t score : scores)
        std::cout << "Score: " << score << "\n";

    for (size_t pos = 0; pos < seqan2::length(seqs1); ++pos) // print out alignments
    {
        std::cout << seqs1[pos] << "\n";
        std::cout << seqs2[pos] << "\n\n";
    }

    return EXIT_SUCCESS;
}

Due to my lack of understanding of SeqAn, I attempted many times and eventually found that the following code meets my requirements.

#include <seqan/align_parallel.h>
#include <seqan/stream.h>  // for printint strings
// Function to generate a random DNA sequence of given length
std::string generateRandomDNASequence(int length) {
    std::string bases = "ATCG";
    std::string sequence = "";
    for (int i = 0; i < length; ++i) {
        sequence += bases[rand() % 4]; // Randomly pick a base and append to the sequence
    }
    return sequence;
}

int main(int argc, char** argv) {
    using TSequence = seqan2::String<seqan2::Dna>;
    using TAlignedSequence = seqan2::Gaps<TSequence>;
    using TThreadModel = seqan2::Parallel;
    using TVectorSpec = seqan2::Vectorial;
    using TExecPolicy = seqan2::ExecutionPolicy<TThreadModel, TVectorSpec>;

   
    seqan2::StringSet<TAlignedSequence> seqs1;
    seqan2::StringSet<TAlignedSequence> seqs2;

    for (size_t i = 0; i < 100; ++i) // Create a data set of 100 dummy sequences
    {
        appendValue(seqs1, TAlignedSequence(*new TSequence() = generateRandomDNASequence(20)));
        appendValue(seqs2, TAlignedSequence(*new TSequence() = generateRandomDNASequence(20)));
    }

    TExecPolicy execPolicy;
    setNumThreads(execPolicy, 4);

    seqan2::Score<int16_t, seqan2::Simple> scoreAffine(2, -2, -1, -4);

    seqan2::String<int16_t> scores = seqan2::globalAlignment(execPolicy, seqs1, seqs2, scoreAffine);

    for (int16_t score : scores)
        std::cout << "Score: " << score << "\n";

    for (size_t pos = 0; pos < seqan2::length(seqs1); ++pos) // print out alignments
    {
        std::cout << seqs1[pos] << "\n";
        std::cout << seqs2[pos] << "\n\n";
    }

    return EXIT_SUCCESS;
}

However, I am also facing the issue of memory leaks. I don't know how to free the memory I've allocated. I tried using delete, but it failed.

Could you please tell me the correct way to add different sequences to a seqan2::StringSet and perform the alignment? Alternatively, how can I free the memory allocated for TSequence? Maybe this question seems simple, but it's really important to me! Thank you!

metaphysicser avatar Mar 07 '24 15:03 metaphysicser

Hi @metaphysicser

The class seqan2::Gaps<TSequence> doesn't hold the sequence itself, but only decorations around a sequence. The sequence itself must be stored such that it outlives the seqan2::Gaps instances.

You could change your code as following to achieve this. It changes seqs1 and seqs2 to store the actual sequences. And seqsWithGaps1 and seqsWithGaps2 store the decoration around the sequences.

    seqan2::StringSet<TSequence> seqs1;
    seqan2::StringSet<TSequence> seqs2;
    for (size_t i = 0; i < 100; ++i) // Create a data set of 100 dummy sequences
    {
        appendValue(seqs1, generateRandomDNASequence(20));
        appendValue(seqs2, generateRandomDNASequence(20));
    }

    seqan2::StringSet<TAlignedSequence> seqsWithGaps1;
    seqan2::StringSet<TAlignedSequence> seqsWithGaps2;
    for (size_t i = 0; i < 100; ++i) // Create wrapper around the sequences
    {
        appendValue(seqsWithGaps1, TAlignedSequence(seqs1[i]));
        appendValue(seqsWithGaps2, TAlignedSequence(seqs2[i]));
    }

    TExecPolicy execPolicy;
    setNumThreads(execPolicy, 4);

    seqan2::Score<int16_t, seqan2::Simple> scoreAffine(2, -2, -1, -4);

    seqan2::String<int16_t> scores = seqan2::globalAlignment(execPolicy, seqsWithGaps1, seqsWithGaps2, scoreAffine);

SGSSGene avatar Mar 11 '24 06:03 SGSSGene

@metaphysicser I hope your question could be resolved. If not please feel free to reopen this ticket any time. I will close it for now, since no new updates have been received. Thank you.

rrahn avatar Jun 03 '24 10:06 rrahn