Incorrect reference sequences for alignment
Hello,
I have been using the web server version of CRISPResso2 to analyze targeting at each gRNA site (3 in total) in a ~450 bp amplicon. In my latest results, I observe that the reference sequence for some alignments is incorrect, leading to WT reads being falsely labelled as targeted. In the output file attached here, when opened with Microsoft Excel, I see that the reference sequences in cells B2, B7, and B26 are different from the others. This leads to the corresponding reads being labelled as edited when they may be WT. Here are the parameters used for the run:
allele_plot_pcts_only_for_assigned_reference: False aln_seed_count: 5 aln_seed_len: 10 aln_seed_min: 2 amplicon_min_alignment_score: amplicon_name: Reference amplicon_seq: TCCAGTGTGAGTTCGAGGGCTGTGACCGGCGCTTCGCCAACAGCAGCGACAGGAAGAAGCACATGCATGTCCACACCTCAGATAAGCCCTATCTCTGCAAGATGTGTGACAAGTCCTACACGCATCCCAGCTCGTTGCGGAAGCACATGAAGGTACCACTGCAGTAGCCGGGAGGGCTAGGCCGACCTGGAGCATCAGCTAGCTCCCAGCGGGCCTGGGAGGGTCCCCAGAGGTCGAGGGACGCTCTTGGGGTGCCCTCGGCTCGGGGACCCGGCCTCACAGCAGCTGCACTCACACCCAGTCCCCTCTGGTCCCCACTCCCGGCTTTTGTCTTCCAGGTCCATGAGTCCTCCCCTCAGGGCTCCGAGTCCTCCCCGGCTGCCAGCTCTGGCTACGAGTCGTCCACACCCCCGGGGTTGGTGTCCCCCAGCGCAGAGCCACAAAGC annotate_wildtype_allele: assign_ambiguous_alignments_to_first_reference: False auto: False bam_chr_loc: bam_input: bam_output: False base_editor_output: False bowtie2_index: coding_seq: conversion_nuc_from: C conversion_nuc_to: T crispresso1_mode: False debug: False default_min_aln_score: 60 discard_guide_positions_overhanging_amplicon_edge: False discard_indel_reads: False dsODN: dump: False exclude_bp_from_left: 15 exclude_bp_from_right: 15 expand_allele_plots_by_quantification: False expand_ambiguous_alignments: False expected_hdr_amplicon_seq: fastq_output: False fastq_r1: CRISPResso_Input_Reads_3ed9b37b-3fc5-4d0a-b10a-3a91652aadf0.gz fastq_r2: CRISPResso_Input_Reads_292e7772-00ca-42dc-a832-40d5001f9142.gz file_prefix: flash_command: flash flexiguide_homology: 80 flexiguide_name: flexiguide_seq: None force_merge_pairs: False guide_name: guide_seq: GCTTCGCCAACAGCAGCGAC,ACACGCATCCCAGCTCGTTG,ACGAGTCGTCCACACCCCCG ignore_deletions: False ignore_insertions: False ignore_substitutions: True keep_intermediate: False max_paired_end_reads_overlap: 100 max_rows_alleles_around_cut_to_plot: 50 min_average_read_quality: 0 min_bp_quality_or_N: 0 min_frequency_alleles_around_cut_to_plot: 0.2 min_paired_end_reads_overlap: 10 min_single_bp_quality: 0 n_processes: 1 name: doog3o3v_sample_name_Zic2_whole_amplicon needleman_wunsch_aln_matrix_loc: EDNAFULL 
needleman_wunsch_gap_extend: -2 needleman_wunsch_gap_incentive: 1 needleman_wunsch_gap_open: -20 no_rerun: False output_folder: CRISPRessoRundoog3o3v_sample_name_Zic2_whole_amplicon place_report_in_output_folder: True plot_histogram_outliers: False plot_window_size: 20 prime_editing_nicking_guide_seq: prime_editing_override_prime_edited_ref_seq: prime_editing_override_sequence_checks: False prime_editing_pegRNA_extension_quantification_window_size: 5 prime_editing_pegRNA_extension_seq: prime_editing_pegRNA_scaffold_min_match_length: 1 prime_editing_pegRNA_scaffold_seq: prime_editing_pegRNA_spacer_seq: quantification_window_center: -3 quantification_window_coordinates: None quantification_window_size: 6 save_also_png: False split_interleaved_input: False stringent_flash_merging: False suppress_plots: False suppress_report: False trim_sequences: False trimmomatic_command: trimmomatic trimmomatic_options_string: use_legacy_insertion_quantification: False verbosity: 3 write_cleaned_report: True write_detailed_allele_table: False zip_output: False
I am also attaching the running log file, in case it helps.
Alleles_frequency_table_around_sgRNA_ACACGCATCCCAGCTCGTTG.txt
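For anyone who wants to repeat the check described above without Excel, here is a minimal sketch that lists the rows whose reference string differs from the most common one. It assumes the table is tab-separated with a `Reference_Sequence` column. The helper name `odd_reference_rows` and the sample rows are made up for illustration. It also reports whether the differing reference contains a `-` character, on the assumption (not confirmed in this thread) that alignment gap padding is one possible reason for a reference string to look different.

```python
import csv
import io
from collections import Counter


def odd_reference_rows(table_text):
    """Return (spreadsheet_row, reference, has_gap) for rows whose
    Reference_Sequence differs from the most common reference string."""
    rows = list(csv.DictReader(io.StringIO(table_text), delimiter="\t"))
    counts = Counter(row["Reference_Sequence"] for row in rows)
    most_common_ref, _ = counts.most_common(1)[0]
    flagged = []
    # Spreadsheet row 1 is the header, so data rows start at 2.
    for spreadsheet_row, row in enumerate(rows, start=2):
        ref = row["Reference_Sequence"]
        if ref != most_common_ref:
            flagged.append((spreadsheet_row, ref, "-" in ref))
    return flagged


# Made-up sample data, only to show the shape of the input.
sample = (
    "Aligned_Sequence\tReference_Sequence\tUnedited\t#Reads\n"
    "ACGTACGT\tACGTACGT\tTrue\t90\n"
    "ACGTTACG\tACGT-ACG\tFalse\t6\n"
    "ACG-ACGT\tACGTACGT\tFalse\t4\n"
)

for row_number, ref, has_gap in odd_reference_rows(sample):
    print(row_number, ref, "contains gap" if has_gap else "no gap")
```

To run it on a real file, read the file contents into a string and pass that string to `odd_reference_rows`.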
Hi @Jay-Mehul-Panji,
Thanks for using CRISPResso and thank you for reaching out! This is an excellent question, and there are a few ways that you can address this.
- Not sure if you have tried this, but you can supply multiple amplicon reference sequences (i.e. "Control" and "Treated") so that CRISPResso will assign the reads to the sequence that it best aligns to. You do this by providing sequences in the "Amplicon" box and separating them with a comma. Furthermore, if you click on "Optional Parameters" you can specify names for each amplicon that will be shown later in the report. This option also works great if you are editing a heterogeneous locus and want to specify multiple reference alleles. This would be my recommended approach.
- Something else to consider is the placement of the quantification windows. By default, the quantification window is centered at position -3 relative to the 3' end of your sgRNA (where Cas9 cuts) and takes into account 1 bp on the left and 1 bp on the right of that position. A read with any modification inside the quantification window is considered "modified" (i.e. genome edited), while a read whose modifications all fall outside the window is considered "unmodified." You mentioned that you have 3 sgRNAs, so you would want to ensure that the quantification windows only include areas where you want to measure editing.
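To make the window placement in the second point concrete, here is a rough sketch of the arithmetic. It is an illustration of the description above, not CRISPResso's own code: the helper name `quantification_window` is hypothetical, it only handles a guide that matches the forward strand of the amplicon, and the toy amplicon is made up. Indices are 0-based.

```python
def quantification_window(amplicon, guide, center=-3, size=1):
    """Return (cut, start, end) as 0-based inclusive indices.

    `cut` is the index of the base immediately left of the predicted cut,
    found by moving `center` bases from the last base of the guide.
    The window spans `size` bases on each side of the cut.
    """
    guide_start = amplicon.find(guide)
    if guide_start == -1:
        raise ValueError("guide not found on the forward strand of the amplicon")
    guide_end = guide_start + len(guide) - 1
    cut = guide_end + center
    start = max(0, cut - size + 1)
    end = min(len(amplicon) - 1, cut + size)
    return cut, start, end


# Toy amplicon (made up): 5 bp of padding on each side of one guide.
guide = "GCTTCGCCAACAGCAGCGAC"
amplicon = "AAAAA" + guide + "TTTTT"

print(quantification_window(amplicon, guide))          # default: 1 bp each side
print(quantification_window(amplicon, guide, size=6))  # size 6: 6 bp each side
```

With the default size of 1 the window covers 2 bases in total, and with a size of 6 it covers 12 bases, so a larger size counts modifications further from the cut as edits.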
Please let us know if this makes sense and if you have any additional questions!
Thanks, Cole
Hi @Colelyman,
Thanks for the suggestions.
- I tried supplying multiple amplicon reference sequences but I get a message saying that there is an error in one of the parameter settings.
- My sgRNAs are spaced throughout the amplicon (at least 30-40 bp between each sgRNA), and there is no overlap among any of the target sites. Regarding the quantification window, I use a quantification window size of 6 to account for the three bases proximal to the PAM site, where Cas9 is most likely to induce a break (as per my understanding of how CRISPResso and Cas9 work).
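The spacing argument in the second point can be checked with a short sketch like the one below. It assumes a window size of N means N bases on each side of the cut, that the cut sits 3 bases in from the 3' end of each guide, and that all guides match the forward strand. The helper names and the toy amplicon (the three guides from this run joined by made-up 35 bp spacers) are for illustration only.

```python
def guide_windows(amplicon, guides, center=-3, size=6):
    """Return one (start, end) 0-based inclusive window per guide."""
    windows = []
    for guide in guides:
        guide_start = amplicon.find(guide)
        if guide_start == -1:
            raise ValueError(f"guide {guide} not found on the forward strand")
        cut = guide_start + len(guide) - 1 + center
        start = max(0, cut - size + 1)
        end = min(len(amplicon) - 1, cut + size)
        windows.append((start, end))
    return windows


def any_overlap(windows):
    """True if any two windows share at least one position."""
    ordered = sorted(windows)
    return any(
        prev_end >= next_start
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    )


guides = [
    "GCTTCGCCAACAGCAGCGAC",
    "ACACGCATCCCAGCTCGTTG",
    "ACGAGTCGTCCACACCCCCG",
]
# Toy amplicon (made up): guides separated by 35 bp spacers.
spacer = "T" * 35
amplicon = spacer.join(guides) + spacer

windows = guide_windows(amplicon, guides, size=6)
print(windows, "overlap:", any_overlap(windows))
```

Replacing the toy amplicon with the real amplicon sequence from the run would show where each window actually falls.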
Thanks, Jay Mehul Panji
Hi Jay,
Regarding the error message, do you have any additional information that you could share? If you are running the web version, you can share the URL or any other error messages you see.
Thanks, Cole
Hello @Colelyman,
Here's the error message I get.