Pisces icon indicating copy to clipboard operation
Pisces copied to clipboard

Option to allow non-anchored indels?

Open hydriniumh2 opened this issue 6 years ago • 8 comments

Hi Tamsen, I've been testing validation datasets using Pisces and the results are very promising, however I have run into an issue where Pisces will complain about indels which have no anchor, stemming from this check here . I believe these reads are valid as they are supported by other reads which do have anchors and mirror the non-anchored reads. Would there be any way short of removing these reads or manually rewriting them to have an anchor for Pisces to accept them as valid? Thank you, Deric

hydriniumh2 avatar Aug 06 '18 20:08 hydriniumh2

Hi there,

Yes, I talked to the team and I think we can do this for you with our next release, but we need to test it internally first. We had a few questions: How often does this happen in your data? Is it frequent? We wanted to know if logging a warning rather than throwing would be better? Downside to a warning: If this happens a lot in your data, your log will be huge. Is it possible to send us some sample data? How did you align these unanchored indel reads in the first place?

thanks! Tamsen

tamsen avatar Aug 13 '18 13:08 tamsen

Hi Tamsen, Thats great, thanks for the fast response! With regard to your questions:

  • So far its only happened to one sample but I haven't tried running Pisces on that many samples so I can't really speak to the frequency.
  • Logging a warning would be preferable to completely halting the program.
  • The nonanchored indels with cigar 151I only occurred 6 times in the entire alignment, so I don't think spamming the log will be an issue.
  • The sample data is publicly available here: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/use_cases/mixtures/UMCUTRECHT_NA12878_NA24385_mixture_10052016/
  • The samples were aligned using bwa-mem. I'm not entirely certain how it decided the alignments of unanchored reads but it might have something to do with the reads being paired end.

Thank you, Deric

hydriniumh2 avatar Aug 13 '18 14:08 hydriniumh2

Hi Deric,

I see fastq on the public data. For completeness, can you pls send me the exact bwa-mem cmd line used to make the bams you have? Testing the issue now

Thank you

tamsen avatar Oct 10 '18 20:10 tamsen

Hi Tamsen, No problem, here is the command I used. I should mention that I used the sentieon implementation of bwa mem so the parameters may be slightly differently named. This was aligned to reference hs37d5.

${SENTIEON_BIN}/bwa mem \
                -t 30 \
                -M \
                -R "@RG\\tID:NA12878_NA24385\\tPL:ILLUMINA\\tSM:NA12878_NA24385" \
                genome.fa \
                NA12878_NA24385_R1.fastq \
                NA12878_NA24385_R2.fastq |  \
            \${SENTIEON_BIN}/sentieon util sort \
                -o NA12878_NA24385.bam \
                -t 30 \
                --sam2bam \
                -i -

hydriniumh2 avatar Oct 11 '18 13:10 hydriniumh2

thank you!

tamsen avatar Oct 11 '18 16:10 tamsen

Update, this is checked in now to our internal dev branch. In our next release (5.2.10) we will allow unanchored indels. We will just warn and continue processing.

tamsen avatar Oct 29 '18 21:10 tamsen

Thanks, I really appreciate you taking the time to look into this and add this change!

hydriniumh2 avatar Oct 30 '18 13:10 hydriniumh2

hi! this issue should now be resolved in 5.2.10. Let us know if you still have issues.

tamsen avatar Feb 22 '19 16:02 tamsen