
Question about the CRISPRessoWGS read filtering step

Open alechaka opened this issue 3 months ago • 2 comments

Hello! I have a question about the CRISPRessoWGS read filtering step.

The input BAM file contains ~9 million reads in the analyzed region, but the output files include statistics for only ~90 thousand reads. I suspect that ~99% of the reads are being filtered out during the fastp step. I tried to skip this filtering by setting the options --fastp_command "" and --fastp_options_string "", but this didn’t help.

Could you please advise how I can skip the read filtering step?

alechaka avatar Nov 18 '25 10:11 alechaka

Hi @alechaka,

Thanks for using CRISPResso and sorry for the delay in responding! I just have a few questions about your setup:

  • Are you able to provide the command you used to run CRISPRessoWGS and your region file?
  • Are you using single end or paired end reads?
  • Have you tried decreasing the --min_reads_to_use_region parameter? The default is 10, but depending on how many regions you have, many reads may be filtered out because of this.

Thanks, Cole

Colelyman avatar Dec 02 '25 16:12 Colelyman

Hi @alechaka,

In CRISPRessoWGS, reads are only counted if they fully span the specified region. Any read that starts or ends inside the region (rather than extending beyond both boundaries) will be excluded from quantification. This often explains large drops in read counts.

In addition to Cole’s suggestion, you can try shrinking your quantification window (e.g., to ~10 bp) so that more reads fully span the region. This usually increases the number of reads that pass the filtering criteria.
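
To illustrate why a smaller window helps, here is a minimal sketch of the spanning criterion. It is not CRISPRessoWGS's actual implementation: the function names, the 0-based half-open coordinates, the inclusive boundary handling, and the tiled 150 bp reads are all assumptions made for the example.

```python
# Illustrative only: hypothetical helpers, not part of CRISPResso.

def fully_spans(read_start, read_end, region_start, region_end):
    """Return True if the read covers every position of the region.

    Coordinates are assumed to be 0-based and half-open: [start, end).
    """
    return read_start <= region_start and read_end >= region_end


def count_spanning(reads, region_start, region_end):
    """Count reads that fully span the region."""
    return sum(fully_spans(s, e, region_start, region_end) for s, e in reads)


# Hypothetical data: 150 bp reads starting every 50 bp along 1 kb.
reads = [(start, start + 150) for start in range(0, 1000, 50)]

# A 200 bp region is longer than any read, so no read can span it.
wide = count_spanning(reads, 400, 600)

# A 10 bp window around the same center is spanned by several reads.
narrow = count_spanning(reads, 495, 505)

print(f"200 bp region: {wide} spanning reads")
print(f"10 bp window:  {narrow} spanning reads")
```

In this toy data the wide region keeps no reads while the narrow window keeps some, which is the same kind of drop described above when the region is large relative to the read length.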

kclem avatar Dec 04 '25 20:12 kclem