
CRISPRessoWGS unsupported operand error

Open sgarudadri opened this issue 1 year ago • 13 comments

Describe the bug

When I attempt to run CRISPRessoWGS, I get the following error: "TypeError: unsupported operand type(s) for -: 'str' and 'int'". I have double-checked the region file formatting but cannot resolve the error, which I think occurs when the file is read in and/or processed by get_region_from_fa.

Expected behavior

I am trying to run CRISPRessoWGS to analyze base editing outcomes across a 1 kb amplicon that has been tagmented and then sequenced. I am attempting to use a strategy previously suggested (https://github.com/lucapinello/CRISPResso/issues/2). I aligned my sequencing file using STAR to generate a sorted BAM file and created a region file with just one 50 bp region to test the approach.

To reproduce

CRISPRessoWGS -b CBE_cDNA_CRISPRESSO_test.bam -f Region_file.txt -r GRCh38_chr12.fa --base_editor_output --name CRISPR_WGS_Test --debug

Debug output

INFO @ Thu, 20 Feb 2025 15:50:30 (0.0% done): Creating Folder CRISPRessoWGS_on_CRISPR_WGS_Test

WARNING @ Thu, 20 Feb 2025 15:50:30 (0.0% done): Folder CRISPRessoWGS_on_CRISPR_WGS_Test already exists.

INFO @ Thu, 20 Feb 2025 15:50:30 (0.0% done): Checking dependencies...

INFO @ Thu, 20 Feb 2025 15:50:30 (0.0% done):

All the required dependencies are present!

INFO @ Thu, 20 Feb 2025 15:50:30 (0.0% done): Index file for input .bam file exists, skipping generation.

INFO @ Thu, 20 Feb 2025 15:50:31 (0.0% done): The index for the reference fasta file is already present! Skipping generation.

INFO @ Thu, 20 Feb 2025 15:50:31 (0.0% done): Retrieving reference sequences for amplicons and checking for sgRNAs

Traceback (most recent call last):
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/CRISPResso2/CRISPRessoWGSCORE.py", line 488, in main
    df_regions['sequence']=df_regions.apply(lambda row: get_region_from_fa(row.chr_id, row.bpstart, row.bpend, uncompressed_reference), axis=1)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/pandas/core/frame.py", line 10374, in apply
    return op.apply().finalize(self, method="apply")
           ^^^^^^^^^^
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/pandas/core/apply.py", line 916, in apply
    return self.apply_standard()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/pandas/core/apply.py", line 1063, in apply_standard
    results, res_index = self.apply_series_generator()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/pandas/core/apply.py", line 1081, in apply_series_generator
    results[i] = self.func(v, *self.args, **self.kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/CRISPResso2/CRISPRessoWGSCORE.py", line 488, in <lambda>
    df_regions['sequence']=df_regions.apply(lambda row: get_region_from_fa(row.chr_id, row.bpstart, row.bpend, uncompressed_reference), axis=1)
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/CRISPResso2/CRISPRessoWGSCORE.py", line 100, in get_region_from_fa
    region='%s:%d-%d' % (chr_id, bpstart, bpend -1)
                                          ~~~~~~^~
TypeError: unsupported operand type(s) for -: 'str' and 'int'

CRITICAL @ Thu, 20 Feb 2025 15:50:31 (0.0% done): Traceback (most recent call last):
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/CRISPResso2/CRISPRessoWGSCORE.py", line 488, in main
    df_regions['sequence']=df_regions.apply(lambda row: get_region_from_fa(row.chr_id, row.bpstart, row.bpend, uncompressed_reference), axis=1)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/pandas/core/frame.py", line 10374, in apply
    return op.apply().finalize(self, method="apply")
           ^^^^^^^^^^
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/pandas/core/apply.py", line 916, in apply
    return self.apply_standard()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/pandas/core/apply.py", line 1063, in apply_standard
    results, res_index = self.apply_series_generator()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/pandas/core/apply.py", line 1081, in apply_series_generator
    results[i] = self.func(v, *self.args, **self.kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/CRISPResso2/CRISPRessoWGSCORE.py", line 488, in <lambda>
    df_regions['sequence']=df_regions.apply(lambda row: get_region_from_fa(row.chr_id, row.bpstart, row.bpend, uncompressed_reference), axis=1)
                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/sgarudadri/miniconda3/envs/crispresso2_env/lib/python3.12/site-packages/CRISPResso2/CRISPRessoWGSCORE.py", line 100, in get_region_from_fa
    region='%s:%d-%d' % (chr_id, bpstart, bpend -1)
                                          ~~~~~~^~
TypeError: unsupported operand type(s) for -: 'str' and 'int'

CRITICAL @ Thu, 20 Feb 2025 15:50:31 (0.0% done):

ERROR: unsupported operand type(s) for -: 'str' and 'int'

sgarudadri avatar Feb 20 '25 23:02 sgarudadri

I am getting a similar error when running CRISPRessoBatch on Illumina single end reads in HDR mode. Following processing of all samples, the output I obtain is:

INFO @ Sat, 22 Feb 2025 18:36:20 (90.0% done): Completed 104/104 runs

INFO @ Sat, 22 Feb 2025 18:36:20 (90.0% done): Finished all batches

INFO @ Sat, 22 Feb 2025 18:36:21 (90.0% done): Reporting summary for amplicon: "HDR"

INFO @ Sat, 22 Feb 2025 18:36:21 (90.0% done): All guides are equal. Performing comparison of batches for amplicon 'HDR'

CRITICAL @ Sat, 22 Feb 2025 18:36:21 (90.0% done):

ERROR: unsupported operand type(s) for -: 'list' and 'int'

I updated CRISPResso in Docker (docker pull pinellolab/crispresso2) today, and had not seen this error prior to this update.

Which version of CRISPResso did you download?

For your WGS problem, check that all of your bpstart and bpend values in your region file are numbers - it looks like there may be a line without numbers in this column. You can also try deleting lines from your region file until you find the offending line.
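As a rough illustration of the check suggested above, the following sketch scans a region file and reports every line whose start or end column is not an integer. It assumes a tab-separated file with chr_id, bpstart and bpend as the first three columns (the names used in the traceback); the helper name `find_bad_region_rows` and the example coordinates are hypothetical and not part of CRISPResso.

```python
import io


def find_bad_region_rows(lines):
    """Return (line_number, text) pairs whose 2nd or 3rd column is not an integer.

    `lines` is any iterable of text lines, for example an open file handle.
    Assumes tab-separated columns in the order chr_id, bpstart, bpend.
    """
    bad = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        try:
            int(fields[1])  # bpstart
            int(fields[2])  # bpend
        except (IndexError, ValueError):
            bad.append((line_no, line))
    return bad


# Hypothetical region file content; the coordinates are made up for illustration.
example = io.StringIO(
    "chr_id\tbpstart\tbpend\n"
    "12\t57095200\t57095250\n"
)
for line_no, text in find_bad_region_rows(example):
    print(f"line {line_no} has a non-numeric start or end: {text!r}")
```

To check a real file, pass an open handle instead of the in-memory example, for instance `find_bad_region_rows(open("Region_file.txt"))`.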

For your Batch problem, can you run with --debug to see where this problem is coming from?

kclem avatar Feb 24 '25 19:02 kclem

Thanks for your responses. Here is the version info:

Name         Version  Build            Channel
crispresso2  2.3.2    py312hfd810cf_0  bioconda

From the beginning I was suspicious about the region file. I have actually been using only one region to test the command, which I have attached. As far as I can tell it is formatted correctly.

Region_file.txt

sgarudadri avatar Feb 24 '25 20:02 sgarudadri

Hi @sgarudadri,

Sorry to hear that you are running into this bug, could you also provide the version of pandas that you have installed?

Thanks, Cole

Colelyman avatar Feb 25 '25 04:02 Colelyman

Hi Cole,

Here is the version info: pandas 2.2.3 py312h98e817e_1 conda-forge

Thanks for your help,

Suresh

sgarudadri avatar Feb 25 '25 18:02 sgarudadri

Hi @sgarudadri , can you do me a favor and try removing the headers from your region.txt file and rerunning CRISPRessoWGS? I suspect that our code is attempting to parse those headers as values and is erroring because of that.

If that works, please let me know and we'll implement a check for the next release!
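The sketch below illustrates why text values in the coordinate columns lead to this exact error. `format_region` reuses the formatting expression shown at line 100 of the traceback; everything else (`parse_region_line`, the example rows, the tab-separated layout) is a hypothetical stand-in and not CRISPResso's actual parsing code.

```python
def format_region(chr_id, bpstart, bpend):
    # Same expression as the failing line in the traceback.
    return '%s:%d-%d' % (chr_id, bpstart, bpend - 1)


def parse_region_line(line):
    """Return (chr_id, bpstart, bpend) with integer coordinates.

    Returns None for a header-like or malformed row. Hypothetical helper,
    assuming tab-separated columns in the order chr_id, bpstart, bpend.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return None
    chr_id, bpstart, bpend = fields[:3]
    if not (bpstart.isdigit() and bpend.isdigit()):
        return None
    return chr_id, int(bpstart), int(bpend)


header = "chr_id\tbpstart\tbpend"
row = "12\t57095200\t57095250"  # made-up coordinates

# Values that are still text, whether a header or digits read as strings, fail.
for text_fields in (header.split("\t"), row.split("\t")):
    try:
        format_region(*text_fields)
    except TypeError as exc:
        print("fails:", exc)

# Skipping header-like rows and converting to int avoids the error.
regions = [r for r in map(parse_region_line, (header, row)) if r is not None]
print([format_region(*r) for r in regions])
```

Note that a numeric row also fails when its values are strings, which is consistent with a header row causing the whole column to be read as text.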

trevormartinj7 avatar Mar 04 '25 21:03 trevormartinj7

Thanks! After removing the headers CRISPRessoWGS indeed ran without errors.

A new problem, however, is that only 119 reads were extracted from the BAM file. I tried to look through the CRISPRessoWGS script, but I can't figure out how to investigate this further. My reference FASTA appears to be formatted correctly, and the correct sequence appears to have been extracted, based on REPORT_READS_ALIGNED_TO_SELECTED_REGIONS_WGS.txt

REPORT_READS_ALIGNED_TO_SELECTED_REGIONS_WGS.txt

The BAM file appears to be formatted correctly:

LH00416:231:22TKJHLT3:4:2161:24291:16205 147 12 57095150 255 151M = 57095021 -280 GACCTGCTCCTTCTCCCCTCTCCTTCCCCGTTTTTGTGCTTCTGGTTTGTTTCTTTAATTAATTTAACAAGTGCTGCAGTTTGCCCTCCCATTCCCATCTATCCCCCAAGTCCTTTGCAATTTCTTCCCTGCCCTACATAGGGGCGGTGGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NH:i:1 HI:i:1 AS:i:290 nM:i:5 MD:Z:151

LH00416:231:22TKJHLT3:4:1106:45141:20163 99 12 57095151 255 151M = 57095276 276 ACCTGCTCCTTCTCCCCTCTCCTTCCCCGTTTTTGTGCTTCTGGTTTGTTTCTTTAATTAATTTAACAAGTGCTGCAGTTTGCCCTCCCATTCCCATCTATCCCCCAAGTCCTTTGCAATTTCTTCCCTGCCCTACATAGGGGCGGTGGGT 9I9-I9I--9IIIIIIIII9IIIII999999-99I9IIIIIIIIIIIIIIIIIIIIIIII9III9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NH:i:1 HI:i:1 AS:i:300 nM:i:0 MD:Z:151

and when I run samtools flagstat, I get the following output:

23667561 + 0 in total (QC-passed reads + QC-failed reads)
23667561 + 0 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
23667561 + 0 mapped (100.00% : N/A)
23667561 + 0 primary mapped (100.00% : N/A)
23667561 + 0 paired in sequencing
11833796 + 0 read1
11833765 + 0 read2
23667371 + 0 properly paired (100.00% : N/A)
23667371 + 0 with itself and mate mapped
190 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

The original alignment was done with STAR on the whole genome. I then filtered the BAM file for my region of interest on chromosome 12, and I am using the FASTA file for just chromosome 12 as the reference for CRISPRessoWGS.

Additionally (and unrelated), I wanted to bring the following warning, which was output when I ran CRISPRessoWGS, to your attention in case you were unaware of it: "FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead. return bound(*args, **kwds)"

sgarudadri avatar Mar 06 '25 18:03 sgarudadri

CRISPRessoWGS only considers reads that completely span your region of interest. Try reducing the length of the region of interest, especially if you are using short reads.
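To make the spanning requirement concrete, here is a minimal sketch of what "completely spans" means for a single aligned read. It is not the CRISPRessoWGS implementation: the function names are hypothetical, coordinates are treated as 1-based and inclusive for simplicity, and the two example regions are made up. The read position and CIGAR string are taken from the first SAM record posted earlier in the thread.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING_OPS = set("MDN=X")


def reference_length(cigar):
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING_OPS
    )


def read_spans_region(read_start, cigar, region_start, region_end):
    """True if the read covers every base from region_start to region_end."""
    read_end = read_start + reference_length(cigar) - 1
    return read_start <= region_start and read_end >= region_end


# A 151M read starting at 57095150 covers 57095150..57095300.
print(read_spans_region(57095150, "151M", 57095200, 57095250))  # short region
print(read_spans_region(57095150, "151M", 57095000, 57096000))  # ~1 kb region
```

Under these assumptions, a 151 bp read can never span a region longer than 151 bp, so a long region leaves only the few reads (if any) whose alignments happen to cover it end to end.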

kclem avatar Mar 06 '25 20:03 kclem

Yes, this must have been the issue. I ended up selecting a different region and the alignment rate was as expected. Does CRISPRessoWGS output an Allele Frequency Table plot? Related to this, I notice in the allele frequency table that some alleles are listed as unmodified even though they contain a modification. Is this because the default quantification window does not extend across the specified region?

sgarudadri avatar Mar 06 '25 21:03 sgarudadri

Yes, CRISPRessoWGS will produce CRISPResso output for each region analyzed. You should be able to see a .html file in the CRISPRessoWGS output folder that has links to the individual CRISPRessoWGS runs.

And yes, only edits that overlap the quantification window are considered. Note that by default, any modifications found within 1 bp of the predicted cut site are included in the analysis. To change these parameters, use --quantification_window_size (default: 1 bp on either side of the cut site) or --quantification_window_center (default: -3 bp from the 3' end of the sgRNA).
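As a simplified sketch of how these two parameters define the window, the code below computes the window boundaries for a guide on the forward strand of an amplicon. It only illustrates the idea and is not CRISPResso's own code: the function names, the 0-based indexing, and the example sequences are assumptions made here.

```python
def quantification_window(amplicon, guide, window_center=-3, window_size=1):
    """Return (first, last) 0-based inclusive amplicon indices of the window.

    Simplified: the guide must occur on the forward strand of the amplicon.
    """
    guide_start = amplicon.find(guide)
    if guide_start == -1:
        raise ValueError("guide not found on the forward strand of the amplicon")
    guide_end = guide_start + len(guide) - 1  # last base at the guide's 3' end
    cut_idx = guide_end + window_center       # base just 5' of the predicted cut
    first = max(0, cut_idx - window_size + 1)
    last = min(len(amplicon) - 1, cut_idx + window_size)
    return first, last


def overlaps_window(position, window):
    """True if a modified position (0-based) falls inside the window."""
    first, last = window
    return first <= position <= last


# Made-up sequences: a 20 nt guide followed by a TGG PAM, with 10 nt flanks.
guide = "ACGTACGTACGTACGTACGT"
amplicon = "T" * 10 + guide + "TGG" + "C" * 10

default_window = quantification_window(amplicon, guide)
wide_window = quantification_window(amplicon, guide, window_size=10)
print(default_window, wide_window)
print(overlaps_window(20, default_window), overlaps_window(20, wide_window))
```

In this toy example, a modification at index 20 lies outside the default window but inside the wider one, which mirrors how an edited allele can be reported as unmodified when the edit falls outside the quantification window.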

kclem avatar Mar 06 '25 21:03 kclem

Thanks. To clarify, I was able to navigate to the CRISPResso output for the region, but for some reason the Allele Frequency plot is missing. The table is present, and I am trying to figure out whether this has to do with the command I used. I did not include an sgRNA. Does CRISPResso need one to generate the Allele Frequency plot?

sgarudadri avatar Mar 06 '25 22:03 sgarudadri

Hi @sgarudadri,

Yes, if you provide the sgRNA, then you should see the Allele Frequency plot show up!

Thanks, Cole

Colelyman avatar Mar 06 '25 22:03 Colelyman

Hi, any chance you guys figured out what the problem is with CRISPRessoBatch? I'm getting the same error mentioned above (ERROR: unsupported operand type(s) for -: 'list' and 'int'), and with the --debug option the error is raised at line 639 in CRISPRessoBatchCORE.py. I'm using the latest version, 2.3.2.

The error did not occur after reverting to version 2.1.1.

Thanks!

nbruciaferri avatar May 07 '25 03:05 nbruciaferri