Question about the CRISPRessoWGS read filtering step
Hello! I have a question about the CRISPRessoWGS read filtering step.
The input BAM file contains ~9 million reads in the analyzed region, but the output files include statistics for only ~90 thousand reads. I suspect that ~99% of the reads are being filtered out during the fastp step. I tried to skip this filtering by setting the options --fastp_command "" and --fastp_options_string "", but this didn’t help.
Could you please advise how I can skip the read filtering step?
Hi @alechaka,
Thanks for using CRISPResso and sorry for the delay in responding! I just have a few questions about your setup:
- Are you able to provide the command you used to run CRISPRessoWGS and your region file?
- Are you using single end or paired end reads?
- Have you tried decreasing the --min_reads_to_use_region parameter? The default is 10, but depending on how many regions you have, many reads may be filtered out because of this.
Thanks, Cole
Hi @alechaka,
In CRISPRessoWGS, reads are only counted if they fully span the specified region. Any read that starts or ends inside the region (rather than extending beyond both boundaries) will be excluded from quantification. This often explains large drops in read counts.
In addition to Cole’s suggestion about --min_reads_to_use_region, you can try shrinking your quantification window (e.g., to ~10 bp) so that more reads fully span the region. This usually increases the number of reads that are counted.
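To make the spanning rule concrete, here is a minimal Python sketch. It is not CRISPResso's own code, only an illustration of the criterion described above. The helper names (count_spanning, shrink_region), the 0-based end-exclusive coordinates, the made-up reads, and the choice to count reads that end exactly on a region boundary are all assumptions for the example.

```python
from typing import Iterable, Tuple

# (start, end) of an alignment or region; 0-based, end-exclusive (assumed).
Interval = Tuple[int, int]


def count_spanning(reads: Iterable[Interval], region: Interval) -> int:
    """Count reads whose alignment covers the whole region."""
    region_start, region_end = region
    return sum(
        1
        for start, end in reads
        if start <= region_start and end >= region_end
    )


def shrink_region(region: Interval, width: int) -> Interval:
    """Return a window of at most `width` bp centred on the region midpoint."""
    region_start, region_end = region
    if region_end - region_start <= width:
        return region
    mid = (region_start + region_end) // 2
    new_start = mid - width // 2
    return (new_start, new_start + width)


# Hypothetical data: 100 bp reads starting every 10 bp from 0 to 300.
reads = [(s, s + 100) for s in range(0, 301, 10)]

wide = (100, 300)                 # 200 bp region, longer than any read
narrow = shrink_region(wide, 10)  # 10 bp window around the midpoint

wide_count = count_spanning(reads, wide)
narrow_count = count_spanning(reads, narrow)
print(wide_count, narrow_count)
```

With these made-up reads, none spans the 200 bp region, while nine span the 10 bp window. On real data you could get the (start, end) pairs from your BAM file and run the same check to estimate how many reads a given region definition keeps.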