MosaicForecast

Need help for quality filtering of indels

Open jekim2022 opened this issue 1 year ago • 2 comments

In the article, I found the variant calling methods for SNVs and indels described separately. SNVs were called using SAMtools mpileup with mapping quality >20 and base quality >20, but I couldn't find any mapping quality or base quality filter conditions in the calling method for indels.

So I am asking whether indel calling doesn't need mapping quality or base quality filters and, if so, why it doesn't need them, unlike SNV calling.

Thank you in advance

jekim2022, Jun 30 '23 09:06

Hi @jekim2022 ,

Thanks for your question, and sorry for the delayed response. When calling indels (in "ReadLevel_Features_extraction.py"), I also used baseQ-related and mapQ-related features (for example, "mapq_p", "baseq_p", "mapq_difference", "ref_baseq1b_p", etc.).

In the pre-scan stage, for SNVs, thresholds of baseQ<20 or mapQ<20 are typically used to filter out low-quality reads, but for indels it is not straightforward to calculate the baseQs of alternative alleles (e.g., for deletions, you cannot read the "baseQ" of the mutant allele directly from the BAM file). You can definitely pre-filter reads with low mapQ if you want to.
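To make the point about deletions concrete, here is a minimal sketch in plain Python. It is not MosaicForecast code: the function names `passes_mapq` and `indel_base_quals` are hypothetical, the input is assumed to be tab-separated SAM alignment lines (for example, the text output of `samtools view`), and base qualities are assumed to be Phred+33 encoded. It shows that inserted bases have quality values in the read, while a deletion consumes no read bases and therefore has none to report.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that consume bases of the read (per the SAM specification).
CONSUMES_QUERY = set("MIS=X")


def passes_mapq(sam_line, min_mapq=20):
    """Return True if the alignment's MAPQ (column 5) is >= min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    return int(fields[4]) >= min_mapq


def indel_base_quals(sam_line):
    """Return (op, length, phred_quals) for each indel in one alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar, qual = fields[5], fields[10]
    has_qual = qual != "*"  # "*" means qualities were not stored
    read_pos = 0
    indels = []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "I":
            # Inserted bases are present in the read, so they carry baseQs.
            segment = qual[read_pos:read_pos + length] if has_qual else ""
            indels.append((op, length, [ord(c) - 33 for c in segment]))
        elif op == "D":
            # Deleted bases are absent from the read: no baseQ to look up.
            indels.append((op, length, []))
        if op in CONSUMES_QUERY:
            read_pos += length
    return indels


# Two made-up alignment lines, for illustration only.
ins_read = "\t".join(
    ["r1", "0", "chr1", "100", "60", "3M2I3M", "*", "0", "0", "ACGTTACG", "III##III"]
)
del_read = "\t".join(
    ["r2", "0", "chr1", "100", "10", "4M2D4M", "*", "0", "0", "ACGTACGT", "IIIIIIII"]
)

for line in (ins_read, del_read):
    print(passes_mapq(line), indel_base_quals(line))
```

A mapQ pre-filter like `passes_mapq` works the same way for SNV and indel reads, which is why it can be applied up front; any baseQ-based criterion for deletions has to rely on something else, such as the qualities of bases flanking the deletion.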

Best wishes,

Y.

douym, Aug 25 '23 07:08

Dear Y, I express my gratitude for your response, and your insights have indeed illuminated the path forward. Your assistance is deeply appreciated. I hope everything you do goes well.

Best, Jieun Kim

jekim2022, Aug 25 '23 13:08