ggseqlogo

Glitch or feature? Neither method represents the true distribution of an allele when gaps are involved

Open • moa4020 opened this issue • 3 comments

Hello! I use ggseqlogo to produce logo plots for HIV variants. I recently came across an inconsistency that I have a question about.

Here is an explanation of the issue: in Donor#X, the gap at site 26 is not reflected in the logo plot. ggseqlogo shows a gap at a site only if every sequence has a gap at that position; otherwise it gives an allele 100% probability even if that allele is present in just one sequence. I saw this with both methods (probability and bits).

I have attached an image of the logo plot and the consensus matrix. Please see sites 26 and 27.

The A at site 26 is full-sized even though it is present in only 1 of the 18 sequences; the other 17 have a "-".

Similarly, I expected the G at site 27 to be a little smaller, since it appears in only 17 of the 18 sequences.
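To make the expected numbers concrete, here is a minimal Python sketch of the per-column arithmetic. It is not ggseqlogo's own code (ggseqlogo is an R package). It assumes the plotted heights come from dropping gaps and renormalizing over the remaining letters, which is inferred from the behavior described above rather than from the package source. `column_probs` and the two example columns are hypothetical, and the 18th sequence at site 27 is assumed to be a gap.

```python
from collections import Counter

DNA = "ACGT"  # gaps ("-") and any other symbol are ignored when counting

def column_probs(column, count_gaps_in_denominator):
    """Letter probabilities for one alignment column (hypothetical helper).

    count_gaps_in_denominator=False: drop gaps and renormalize over the
    remaining letters, which reproduces the heights reported in this issue.
    count_gaps_in_denominator=True: divide by the total number of sequences,
    so a letter seen once among 18 sequences gets 1/18.
    """
    counts = Counter(c for c in column.upper() if c in DNA)
    total = len(column) if count_gaps_in_denominator else sum(counts.values())
    if total == 0:  # all-gap column: nothing to draw
        return {}
    return {base: counts[base] / total for base in DNA if counts[base]}

# Hypothetical columns for 18 sequences: site 26 has one "A" and 17 gaps;
# site 27 has 17 "G" and, as an assumption, one gap.
for name, col in [("site 26", "A" + "-" * 17), ("site 27", "G" * 17 + "-")]:
    dropped = column_probs(col, count_gaps_in_denominator=False)
    counted = column_probs(col, count_gaps_in_denominator=True)
    print(name, dropped, {b: round(p, 3) for b, p in counted.items()})
# site 26 {'A': 1.0} {'A': 0.056}
# site 27 {'G': 1.0} {'G': 0.944}
```

Dividing by all 18 sequences gives the 1/18 and 17/18 proportions expected here, while renormalizing over non-gap letters gives 1.0 for both sites.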

I got the same results even with the "Bits" method.
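For the bits method, a similar minimal sketch: Shannon information content R = log2(4) − H for DNA, computed over the non-gap letters only and with no small-sample correction (ggseqlogo may apply one; this is not its source code). A column containing a single letter has H = 0, so it reaches the full 2 bits no matter how many sequences are gapped. Scaling by the fraction of non-gap sequences is shown only as one hypothetical alternative; `column_bits` and `scale_by_occupancy` are illustrative names, not ggseqlogo parameters, and the gap at site 27 is assumed.

```python
import math
from collections import Counter

DNA = "ACGT"

def column_bits(column, scale_by_occupancy):
    """Information content of one DNA alignment column (hypothetical helper).

    R = log2(4) - H, where H is the Shannon entropy of the non-gap letters.
    No small-sample correction is applied. With scale_by_occupancy=True, R is
    multiplied by the fraction of sequences that are not gaps; this is an
    illustrative alternative, not a ggseqlogo option.
    """
    counts = Counter(c for c in column.upper() if c in DNA)
    n = sum(counts.values())
    if n == 0:  # all-gap column carries no information
        return 0.0
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    bits = math.log2(len(DNA)) - entropy
    if scale_by_occupancy:
        bits *= n / len(column)
    return bits

# Hypothetical columns for 18 sequences (the gap at site 27 is assumed).
for name, col in [("site 26", "A" + "-" * 17), ("site 27", "G" * 17 + "-")]:
    print(name, round(column_bits(col, False), 3), round(column_bits(col, True), 3))
# site 26 2.0 0.111
# site 27 2.0 1.889
```

With gaps excluded, both sites hit the 2-bit maximum, consistent with the full-height letters described in this issue.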

[Screenshot 2023-03-14 at 2:03:10 AM: logo plot and consensus matrix]

moa4020 • Mar 14 '23 06:03

Can you share example data so I can fully reproduce this?

omarwagih • Mar 14 '23 15:03

OUT.zip

I have included the aligned FASTA sequences, my consensus matrix, and the logo plot for reference.

moa4020 • Mar 14 '23 19:03

Hello! Just following up to see if there are any insights into this issue.

moa4020 • Mar 28 '23 17:03