CRISPRessoWGS unsupported operand error
Describe the bug
When I run CRISPRessoWGS I get the following error: "TypeError: unsupported operand type(s) for -: 'str' and 'int'". I have double-checked the region file formatting but cannot resolve the error, which I think occurs when the file is read in and/or processed by get_region_from_fa.
Expected behavior
I am trying to run CRISPRessoWGS to analyze base editing outcomes across a 1 kb amplicon that was tagmented and then sequenced. I am using a strategy suggested previously (https://github.com/lucapinello/CRISPResso/issues/2). I aligned my sequencing reads with STAR to generate a sorted BAM file and created a region file containing a single 50 bp region to test the approach.
To reproduce
CRISPRessoWGS -b CBE_cDNA_CRISPRESSO_test.bam -f Region_file.txt -r GRCh38_chr12.fa --base_editor_output --name CRISPR_WGS_Test --debug
Debug output
INFO @ Thu, 20 Feb 2025 15:50:30 (0.0% done): Creating Folder CRISPRessoWGS_on_CRISPR_WGS_Test
WARNING @ Thu, 20 Feb 2025 15:50:30 (0.0% done): Folder CRISPRessoWGS_on_CRISPR_WGS_Test already exists.
INFO @ Thu, 20 Feb 2025 15:50:30 (0.0% done): Checking dependencies...
INFO @ Thu, 20 Feb 2025 15:50:30 (0.0% done):
All the required dependencies are present!
INFO @ Thu, 20 Feb 2025 15:50:30 (0.0% done): Index file for input .bam file exists, skipping generation.
INFO @ Thu, 20 Feb 2025 15:50:31 (0.0% done): The index for the reference fasta file is already present! Skipping generation.
INFO @ Thu, 20 Feb 2025 15:50:31 (0.0% done): Retrieving reference sequences for amplicons and checking for sgRNAs
Traceback (most recent call last):
File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/CRISPResso2/CRISPRessoWGSCORE.py", line 488, in main
df_regions['sequence']=df_regions.apply(lambda row: get_region_from_fa(row.chr_id, row.bpstart, row.bpend, uncompressed_reference), axis=1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/pandas/core/frame.py", line 10374, in apply
return op.apply().finalize(self, method="apply")
^^^^^^^^^^
File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/pandas/core/apply.py", line 916, in apply
return self.apply_standard()
^^^^^^^^^^^^^^^^^^^^^
File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/pandas/core/apply.py", line 1063, in apply_standard
results, res_index = self.apply_series_generator()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/pandas/core/apply.py", line 1081, in apply_series_generator
results[i] = self.func(v, *self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/CRISPResso2/CRISPRessoWGSCORE.py", line 488, in
CRITICAL @ Thu, 20 Feb 2025 15:50:31 (0.0% done):
ERROR: unsupported operand type(s) for -: 'str' and 'int'
I am getting a similar error when running CRISPRessoBatch on Illumina single end reads in HDR mode. Following processing of all samples, the output I obtain is:
INFO @ Sat, 22 Feb 2025 18:36:20 (90.0% done): Completed 104/104 runs
INFO @ Sat, 22 Feb 2025 18:36:20 (90.0% done): Finished all batches
INFO @ Sat, 22 Feb 2025 18:36:21 (90.0% done): Reporting summary for amplicon: "HDR"
INFO @ Sat, 22 Feb 2025 18:36:21 (90.0% done): All guides are equal. Performing comparison of batches for amplicon 'HDR'
CRITICAL @ Sat, 22 Feb 2025 18:36:21 (90.0% done):
ERROR: unsupported operand type(s) for -: 'list' and 'int'
I updated CRISPResso in Docker (docker pull pinellolab/crispresso2) today, and had not seen this error prior to this update.
Which version of CRISPResso did you download?
For your WGS problem, check that all of your bpstart and bpend values in your region file are numbers - it looks like there may be a line without numbers in this column. You can also try deleting lines from your region file until you find the offending line.
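A minimal sketch of that check, assuming the region file is tab-separated with chr_id, bpstart and bpend as its first three fields. The function name `find_bad_region_lines` and the example region are hypothetical; this is not part of CRISPResso.

```python
def find_bad_region_lines(lines):
    """Return (line_number, text) for lines whose bpstart/bpend are not integers.

    Assumes tab-separated fields in the order chr_id, bpstart, bpend, ...
    """
    bad = []
    for line_number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue  # ignore blank lines
        fields = text.split("\t")
        if len(fields) < 3:
            bad.append((line_number, text))  # too few columns
            continue
        try:
            int(fields[1])  # bpstart
            int(fields[2])  # bpend
        except ValueError:
            bad.append((line_number, text))
    return bad


# Hypothetical region file content: a header line followed by one region.
example = [
    "chr_id\tbpstart\tbpend\tname\n",
    "12\t57095150\t57095200\ttest_region\n",
]
for line_number, text in find_bad_region_lines(example):
    print(f"line {line_number} has non-numeric coordinates: {text!r}")
```

To check a real file, pass an open file handle, for example `find_bad_region_lines(open("Region_file.txt"))`.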
For your Batch problem, can you run with --debug to see where this problem is coming from?
Thanks for your responses. Here is the version info:
Name         Version  Build            Channel
crispresso2  2.3.2    py312hfd810cf_0  bioconda
From the beginning I was suspicious about the region file. I have actually been using only one region to test the command, which I have attached. As far as I can tell it is formatted correctly.
Hi @sgarudadri,
Sorry to hear that you are running into this bug, could you also provide the version of pandas that you have installed?
Thanks, Cole
Hi Cole,
Here is the version info: pandas 2.2.3 py312h98e817e_1 conda-forge
Thanks for your help,
Suresh
Hi @sgarudadri, can you do me a favor and try removing the headers from your region.txt file and rerunning CRISPRessoWGS? I suspect that our code is attempting to parse those headers as values and is erroring because of that.
If that works, please let me know and we'll implement a check for the next release!
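A small illustration of why a header row could produce this exact message, followed by a guard that skips it. This is a sketch under assumptions: `end_minus_one` is a hypothetical stand-in for whatever coordinate arithmetic the real code performs, and `drop_header` is not a CRISPResso function.

```python
def end_minus_one(bpend):
    # Hypothetical stand-in for arithmetic on the region end coordinate.
    return bpend - 1


header_fields = "chr_id\tbpstart\tbpend".split("\t")

message = None
try:
    # The header text "bpend" is a string, so subtracting an int fails.
    end_minus_one(header_fields[2])
except TypeError as err:
    message = str(err)
print(message)


def drop_header(lines):
    """Drop the first line if its bpstart/bpend fields are not integers."""
    if not lines:
        return lines
    fields = lines[0].rstrip("\n").split("\t")
    try:
        int(fields[1])
        int(fields[2])
    except (IndexError, ValueError):
        return lines[1:]  # first line looks like a header
    return lines
```

The same guard leaves a header-free file unchanged, so it is safe to apply before handing the region file to the tool.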
Thanks! After removing the headers CRISPRessoWGS indeed ran without errors.
A new problem, however, is that only 119 reads were extracted from the BAM file. I tried to look through the CRISPRessoWGS script but I can't figure out how to investigate this further. My reference FASTA appears to be formatted correctly, and the correct sequence appears to have been extracted based on REPORT_READS_ALIGNED_TO_SELECTED_REGIONS_WGS.txt
REPORT_READS_ALIGNED_TO_SELECTED_REGIONS_WGS.txt
The BAM file appears to be formatted correctly:
LH00416:231:22TKJHLT3:4:2161:24291:16205 147 12 57095150 255 151M = 57095021 -280 GACCTGCTCCTTCTCCCCTCTCCTTCCCCGTTTTTGTGCTTCTGGTTTGTTTCTTTAATTAATTTAACAAGTGCTGCAGTTTGCCCTCCCATTCCCATCTATCCCCCAAGTCCTTTGCAATTTCTTCCCTGCCCTACATAGGGGCGGTGGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NH:i:1 HI:i:1 AS:i:290 nM:i:5 MD:Z:151
LH00416:231:22TKJHLT3:4:1106:45141:20163 99 12 57095151 255 151M = 57095276 276 ACCTGCTCCTTCTCCCCTCTCCTTCCCCGTTTTTGTGCTTCTGGTTTGTTTCTTTAATTAATTTAACAAGTGCTGCAGTTTGCCCTCCCATTCCCATCTATCCCCCAAGTCCTTTGCAATTTCTTCCCTGCCCTACATAGGGGCGGTGGGT 9I9-I9I--9IIIIIIIII9IIIII999999-99I9IIIIIIIIIIIIIIIIIIIIIIII9III9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NH:i:1 HI:i:1 AS:i:300 nM:i:0 MD:Z:151
and when I run samtools flagstat, I get the following output:
23667561 + 0 in total (QC-passed reads + QC-failed reads)
23667561 + 0 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
23667561 + 0 mapped (100.00% : N/A)
23667561 + 0 primary mapped (100.00% : N/A)
23667561 + 0 paired in sequencing
11833796 + 0 read1
11833765 + 0 read2
23667371 + 0 properly paired (100.00% : N/A)
23667371 + 0 with itself and mate mapped
190 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
The original alignment was done with STAR against the whole genome. I then filtered the BAM file for my region of interest on chromosome 12, and I am using the FASTA file for chromosome 12 alone as the reference for CRISPRessoWGS.
Additionally (and unrelated), I wanted to bring to your attention a warning that was printed when I ran CRISPRessoWGS, in case you were unaware of it: "FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead. return bound(*args, **kwds)"
CRISPRessoWGS only considers reads that completely span your region of interest. Try reducing the length of the region of interest, especially if you are using short reads.
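To see why a long region discards most short reads, here is a rough sketch of the "completely spans" condition. It assumes 1-based inclusive coordinates and an ungapped alignment such as the 151M reads shown earlier in the thread; the helper `read_spans_region` and the example regions are hypothetical, and CRISPResso's exact boundary handling may differ.

```python
def read_spans_region(read_start, aligned_length, region_start, region_end):
    """True when the read covers every base of the region.

    Coordinates are treated as 1-based and inclusive (an assumption).
    aligned_length is the reference span of the read, e.g. 151 for CIGAR 151M.
    """
    read_end = read_start + aligned_length - 1
    return read_start <= region_start and read_end >= region_end


# The first read posted above starts at 57095150 with CIGAR 151M,
# so it covers reference positions 57095150 to 57095300.
read_start, aligned_length = 57095150, 151

# Hypothetical 50 bp region inside the read: kept.
print(read_spans_region(read_start, aligned_length, 57095200, 57095249))

# Hypothetical 200 bp region, longer than the read: can never be spanned.
print(read_spans_region(read_start, aligned_length, 57095100, 57095299))
```

Under this model, any region longer than the read length is spanned by no read at all, and the fraction of usable reads falls as the region length approaches the read length.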
Yes, this must have been the issue. I ended up selecting a different region and the alignment rate was as expected. Does CRISPRessoWGS output an Allele Frequency Table plot? Related to this, I notice that the allele frequency table lists some alleles as unmodified even though they contain a modification. Is this because the default quantification window does not extend across the specified region?
Yes, CRISPRessoWGS will produce CRISPResso output for each region analyzed. You should be able to see a .html file in the CRISPRessoWGS output folder that has links to the individual CRISPResso runs.
And yes, only edits that overlap the quantification window are considered. Note that by default any modifications found within 1bp of the predicted cut site are included in the analysis. To change these parameters, use --quantification_window_size (default: 1bp on either side of the cut site) or --quantification_window_center (default: -3bp from the 3' end of the sgRNA).
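As a rough illustration of how those two parameters define the window, here is a simplified model. It is not CRISPResso's actual code: the function names are hypothetical, indices are 0-based amplicon positions, and the guide is assumed to match the forward strand.

```python
def quantification_window(guide_start, guide_length, center=-3, size=1):
    """Return the 0-based amplicon indices inside the quantification window.

    guide_start: index of the first guide base in the amplicon (assumption:
    the guide lies on the forward strand).
    center: offset from the 3' end of the guide to the predicted cut site.
    size: number of bases included on each side of the cut site.
    """
    last_guide_index = guide_start + guide_length - 1
    cut_index = last_guide_index + center  # base immediately 5' of the cut
    return list(range(cut_index - size + 1, cut_index + size + 1))


def overlaps_window(modified_positions, window):
    """True when at least one modified position falls inside the window."""
    return any(position in window for position in modified_positions)


# Hypothetical 20 nt guide starting at amplicon index 10.
window = quantification_window(guide_start=10, guide_length=20)
print(window)

# A substitution far from the cut site does not count toward "modified".
print(overlaps_window([40], window))
print(overlaps_window([27], window))
```

In this model a read whose only change lies outside the window is reported as unmodified, which matches the behavior described above; passing a larger `size` widens the window symmetrically around the cut site.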
Thanks. To clarify, I was able to navigate to the CRISPResso output for the region, but for some reason the Allele Frequency Plot is missing. The table is present, and I am trying to figure out whether this has to do with the command I used. I did not include an sgRNA. Does CRISPResso need one to generate the Allele Frequency plot?
Hi @sgarudadri,
Yes, if you provide the sgRNA, then you should see the Allele Frequency plot show up!
Thanks, Cole
Hi, any chance you figured out what the problem with CRISPRessoBatch is? I'm getting the same error mentioned above (ERROR: unsupported operand type(s) for -: 'list' and 'int'), and with the --debug option the error is raised at line 639 in CRISPRessoBatchCORE.py. I'm using the latest version, 2.3.2.
The error does not occur after reverting to version 2.1.1.
Thanks!