
FilterConsensusReads --min-reads 1 0 0 vs --min-reads 0 0 0: why do they differ?

Status: Open. dennis-serum opened this issue 2 years ago • 4 comments

I ran this tool with --min-reads 1 0 0 and with --min-reads 0 0 0, and got slightly more reads with the second setting. I'm curious how these two settings could give different results.

dennis-serum, Apr 26 '22

@dstephensSD that's a bit surprising, since a consensus read should always have a depth of at least 1. Can you show a read that is kept with the second setting?

nh13, Apr 26 '22

You might try setting --max-no-call-fraction to 1. I suspect there is an odd interaction: even requiring a minimum of 1 read may occasionally cause more bases in the consensus reads to be masked, which in turn causes those reads to fail the separate filter on how many Ns (no-calls) are in the read. As @nh13 says, a BAM file with one or two example reads that pass with 0 0 0 but are filtered out with 1 0 0 would be very helpful for diagnosing this.
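The suspected interaction can be made concrete with a toy model. The sketch below is a minimal illustration, not fgbio's implementation. It assumes a simplified filter: each consensus read carries a per-base depth list, a base whose depth is below min_reads is masked to N, and the read is dropped if its fraction of N bases exceeds max_no_call_fraction. It models a single depth threshold, loosely analogous to the first --min-reads value, and ignores every other filter FilterConsensusReads applies. The function name filter_read, the example read, and the 0.2 threshold are hypothetical.

```python
from typing import List, Tuple


def filter_read(bases: str, depths: List[int], min_reads: int,
                max_no_call_fraction: float) -> Tuple[str, bool]:
    """Toy model (hypothetical, not fgbio code): mask low-depth bases,
    then decide whether the read survives a no-call-fraction filter.

    Returns the (possibly masked) bases and whether the read is kept.
    """
    if not bases or len(bases) != len(depths):
        raise ValueError("bases and depths must be non-empty and the same length")

    # Read-level check (assumed to use the maximum per-base depth): a consensus
    # read always has depth >= 1 somewhere, so min_reads of 0 or 1 never
    # removes a read at this stage.
    if max(depths) < min_reads:
        return bases, False

    # Base-level masking: positions supported by fewer than min_reads reads
    # become N. With min_reads=0 nothing is masked; with min_reads=1, any
    # depth-0 position is masked.
    masked = "".join("N" if d < min_reads else b for b, d in zip(bases, depths))

    # Read-level no-call filter (assumed: drop if the N fraction exceeds the cap).
    no_call_fraction = masked.count("N") / len(masked)
    return masked, no_call_fraction <= max_no_call_fraction


if __name__ == "__main__":
    # Hypothetical read whose last three positions have a per-base depth of 0.
    bases = "ACGTACGTAC"
    depths = [3] * 7 + [0] * 3
    for min_reads in (1, 0):
        for max_nc in (0.2, 1.0):
            masked, kept = filter_read(bases, depths, min_reads, max_nc)
            print(f"min_reads={min_reads} max_no_call_fraction={max_nc}: {masked} kept={kept}")
```

In this toy model only the combination min_reads=1 with a 0.2 threshold drops the read, because 30% of its bases become N; raising the threshold to 1.0, or using min_reads=0, keeps it. Rerunning the real tool with --max-no-call-fraction 1 is the analogous check on actual data.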

tfenne, Apr 27 '22

Sorry for the late reply. I'm unable to share any reads because of privacy protections. I ran FilterConsensusReads with a variety of settings; here are some more results:

[Image: results of FilterConsensusReads runs with various settings]

If I can get some additional samples that aren't protected, I will share some reads with you to help troubleshoot.

Thanks

dennis-serum, May 12 '22

@dstephensSD you could probably just replace all the bases with As and change the read names to READ<i>; then no privacy-sensitive information would be left. You could also set all mappings to chr1, position 1.
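The anonymization described above could be scripted. Below is a minimal sketch, assuming the BAM is first converted to SAM text (for example with `samtools view -h`) and that the SAM header already declares a chr1 sequence. The function name anonymize_sam and the script name are hypothetical. It renames reads consistently so that mates keep a shared name, replaces every base with A, and moves records that have a reference name (and their mate fields) to chr1, position 1. Header lines and optional tags (for example MD) are passed through unchanged, so they should be checked separately before sharing.

```python
import sys
from typing import Dict, Iterable, Iterator


def anonymize_sam(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: yield SAM lines with bases set to 'A', read names
    replaced by READ<i>, and placed records moved to chr1:1.

    Header lines and optional tags are passed through unchanged (assumption:
    the header already contains an @SQ line for chr1).
    """
    new_names: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            yield line
            continue
        fields = line.split("\t")
        # Mates (and any supplementary records) share a QNAME, so map each
        # original name to one new name and reuse it.
        qname = fields[0]
        if qname not in new_names:
            new_names[qname] = f"READ{len(new_names) + 1}"
        fields[0] = new_names[qname]
        if fields[2] != "*":       # RNAME set: move to chr1, POS 1
            fields[2] = "chr1"
            fields[3] = "1"
        if fields[6] != "*":       # RNEXT set: point the mate at chr1, PNEXT 1
            fields[6] = "chr1"
            fields[7] = "1"
        if fields[9] != "*":       # SEQ present: replace every base with A
            fields[9] = "A" * len(fields[9])
        yield "\t".join(fields)


if __name__ == "__main__":
    for out_line in anonymize_sam(sys.stdin):
        print(out_line)
```

For example, it could be run as `samtools view -h in.bam | python anonymize_sam.py | samtools view -b -o anon.bam -`. The per-base consensus tags are left intact, since those are what would be needed to diagnose the filtering difference.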

nh13, Jul 20 '22