pysamstats
Consensus variation stats for paired end?
Is it currently possible in pysamstats to generate consensus variation stats for paired-end reads? As far as I can tell, if a mismatch is on both reads for a given position, it gets counted twice.
Hi Andrew, I'm not sure I understand what you mean. Could you elaborate?
On Wednesday, January 18, 2017, Andrew Stewart [email protected] wrote:
Is it currently possible in pysamstats to generate consensus variation stats for paired end reads? As far as I can tell if a mismatch is on both reads for a given position, it gets counted twice.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/alimanfoo/pysamstats/issues/60, or mute the thread https://github.com/notifications/unsubscribe-auth/AAq8Qlk8DNEWuPvAZ1xJsOutKaQzhNmaks5rToGrgaJpZM4LnZ1E .
-- Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: [email protected] Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721
Sure. As far as I can tell, it looks like pysamstats considers the same insertion/deletion/mismatch observed on both forward and reverse reads at the same position as 2 separate observations. For example, in the following alignment...
fwd: ====C====
rev: ====G====
ref: ACGTATGCA
...pysamstats would report:
- mismatches: 2
- mismatches_fwd: 1
- mismatches_rev: 1
Biologically, one might want to interpret this as a single event. I'm just wondering if there exists an option in pysamstats to do that.
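To make the distinction concrete, here is a minimal sketch in plain Python. It is not pysamstats code: the `Obs` record and both functions are hypothetical names made up for illustration. It assumes the two mates are stored in reference orientation, as in a BAM file, so both show `C` against the reference `A`, and that a fragment counts as a mismatch if either mate mismatches.

```python
from collections import namedtuple

# One aligned base covering the position of interest.
# Hypothetical record for illustration; not the pysamstats data model.
Obs = namedtuple("Obs", ["fragment", "is_reverse", "base"])


def per_read_counts(observations, ref_base):
    """Count every mismatching read separately, split by strand."""
    fwd = sum(1 for o in observations
              if o.base != ref_base and not o.is_reverse)
    rev = sum(1 for o in observations
              if o.base != ref_base and o.is_reverse)
    return {"mismatches": fwd + rev,
            "mismatches_fwd": fwd,
            "mismatches_rev": rev}


def per_fragment_mismatches(observations, ref_base):
    """Count each fragment (read pair) at most once.

    Assumption: a fragment is counted if at least one of its mates
    mismatches; mates that disagree are not treated specially.
    """
    fragments = {o.fragment for o in observations if o.base != ref_base}
    return len(fragments)


# Both mates of one pair overlap the reference 'A' and both carry 'C'.
pair = [
    Obs(fragment="pair1", is_reverse=False, base="C"),
    Obs(fragment="pair1", is_reverse=True, base="C"),
]

read_level = per_read_counts(pair, "A")
fragment_level = per_fragment_mismatches(pair, "A")
print(read_level)
print(fragment_level)
```

The per-read function reproduces the three counts listed above, while the per-fragment function gives the single event I have in mind.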
Hi Andrew, no, there is currently no option to generate a consensus sequence from variation stats. The variation stats count the total number of reads supporting a mismatch, indel, etc. at each base. Is it just a consensus sequence you'd like to generate?
Not the consensus sequence, Alistair, just the consensus alignment stats. If you want to point to the relevant lines of code where something like that might best fit, I can try to take a stab at adding it as a new output (variation_consensus?).
Hi Andrew,
Apologies, I am being dim and still not getting it. Could you walk me through a complete example? E.g., say a site has reference nucleotide "A", in the pileup there are 2 reads supporting the reference nucleotide, 3 reads supporting a mismatching nucleotide "C", 4 reads supporting a 2 bp insertion, and 5 reads supporting a 3 bp deletion (a bit crazy but not impossible). What would be the output from variation_consensus at this site?
The implementation of variation statistics starts from here: https://github.com/alimanfoo/pysamstats/blob/master/pysamstats.pyx#L684. Please feel free to submit a PR, and shout if you need any info on compiling or running the tests.
Cheers, Alistair