abPOA icon indicating copy to clipboard operation
abPOA copied to clipboard

Align expanded sequences, trim them and then compute the consensus sequence

Open AndreaGuarracino opened this issue 2 years ago • 6 comments

Hi Ya~~o~~n,

I've expanded few sequences with flanking nucleotides, then I've aligned them, obtaining the MSA and consensus sequence.

However, after aligning the expanded sequences, I would like to first delete the additional nucleotides and then calculate the consensus sequence with only the unexpanded sequences, without having to recalculate the whole alignment.

Is it possible?

AndreaGuarracino avatar Oct 11 '21 19:10 AndreaGuarracino

I don't fully get your idea. Do all the aligned sequences have flanking nucleotides? Or only a part of the sequences have?

Then, I guess what you want is to obtain the alignment/consensus for only a subset of all the input aligned sequences that have no flanking nucleotides, am I right?

Yan (I know Yao is very famous:) )

yangao07 avatar Oct 13 '21 08:10 yangao07

All the aligned sequences have flanking nucleotides.

Let me try with an example.

These are the original sequences.

example_abpoa.fa

>Sequence1
AAAAAAAATTTT
>Sequence2
TTGGTTTTTT

abpoa --match 1 --mismatch 9 --gap-open 16,41 --gap-ext 2,1 example_abpoa.fa -r 2

>Multiple_sequence_alignment
------AAAAAAAATTTT
TTGGTT--------TTTT
------AAAAAAAATTTT

Instead of aligning the original sequences, I am extending them with few flanking nucleotides upstream and downstream, and then aligning the expanded sequences. For example, adding 4 flanking nucleotides:

example_abpoa_padded.fa

>Sequence1
GGGGAAAAAAAATTTTGGGG
>Sequence2
TTTTTTGGTTTTTTGGGG

abpoa --match 1 --mismatch 9 --gap-open 16,41 --gap-ext 2,1 example_abpoa_padded.fa -r 2

>Multiple_sequence_alignment
------GGGGAAAAAAAATTTTGGGG
TTTTTTGG--------TTTTTTGGGG
TTTTTTGGGGAAAAAAAATTTTGGGG

Concerning the sequences, I am taking this alignment and removing the added flanking nucleotides. However, regarding the consensus sequence, I can't just trim the flanking nucleotides out (this leads to rare, but blocking bugs).

I would like to trim the expanded sequences in the abPOA graph (where the alignment was already computed with the expanded sequences) and then re-ask for the consensus sequence generation using the trimmed graph.

Yan (I know Yao is very famous:) )

LOL, mutations, you've got me!

AndreaGuarracino avatar Oct 13 '21 09:10 AndreaGuarracino

>Multiple_sequence_alignment
------GGGGAAAAAAAATTTTGGGG
TTTTTTGG--------TTTTTTGGGG
TTTTTTGGGGAAAAAAAATTTTGGGG

I guess, from this new graph, what you want is the consensus of the original (unexpanded) sequences, right? Then the graph should be transformed to:

>Multiple_sequence_alignment
----------AAAAAAAATTTT----
----TTGG--------TTTTTT----
>Consensus_sequence
----TTGG--AAAAAAAATTTT----

I think this is possible. We need to record all the flanking-base nodes and then manipulate their weight/base before calling the consensus:

  1. w = w-1, if w > 1
  2. set base as - if w == 1

Edit: this manipulation may not be correct, I need to think about it.

Then for the final consensus, we just need to remove all the - symbols.

yangao07 avatar Oct 13 '21 11:10 yangao07

Hi @yangao07, any update on this?

AndreaGuarracino avatar Mar 25 '22 15:03 AndreaGuarracino

See above, is that what you want?

yangao07 avatar Mar 28 '22 03:03 yangao07

I apologize, I didn't realize you were waiting for confirmation from me. Yes, I want the consensus of the original (unexpanded) sequences, as you wrote.

>Multiple_sequence_alignment
----------AAAAAAAATTTT----
----TTGG--------TTTTTT----
>Consensus_sequence
----TTGG--AAAAAAAATTTT---- <---- This one, generated from the trimmed (unexpanded) sequences

As I said, the consensus sequence can't be simply trimmed (this leads to rare bugs) but has to be generated starting from the original (unexpanded) sequences, using the alignment that we've got when we aligned the same sequences together with the flanking nucleotides.

After your considerations, do you think is it still doable?

AndreaGuarracino avatar Mar 28 '22 07:03 AndreaGuarracino