CRISPResso2 icon indicating copy to clipboard operation
CRISPResso2 copied to clipboard

CRISPRessoWGS unexpected error

Open misik opened this issue 6 years ago • 1 comments

Hi, I am getting the following error with some of the trimmed fastq files that are produced by CRISPRessoWGS. I ran fastqc on those failed fastq files and found that some of the sequences are empty and CRISPResso fails because of these empty sequences. Can you please update the code to omit those empty fastq reads after trimming?

image

[Execution log]: Aligning sequences... Processing reads; N_TOT_READS: 0 N_COMPUTED_ALN: 0 N_CACHED_ALN: 0 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0 Processing reads; N_TOT_READS: 10000 N_COMPUTED_ALN: 298 N_CACHED_ALN: 9702 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0 Processing reads; N_TOT_READS: 20000 N_COMPUTED_ALN: 422 N_CACHED_ALN: 19578 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0 Processing reads; N_TOT_READS: 30000 N_COMPUTED_ALN: 523 N_CACHED_ALN: 29477 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0 Processing reads; N_TOT_READS: 40000 N_COMPUTED_ALN: 616 N_CACHED_ALN: 39384 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0 Processing reads; N_TOT_READS: 50000 N_COMPUTED_ALN: 708 N_CACHED_ALN: 49292 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0 Processing reads; N_TOT_READS: 60000 N_COMPUTED_ALN: 886 N_CACHED_ALN: 59114 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0 Unexpected error, please check your input.

ERROR: '*'

The part of the CRISPRessoWGSCORE.py that will be updated is below (to be added "if len(seq)>0"):

def write_trimmed_fastq(in_bam_filename,bpstart,bpend,out_fastq_filename): p = sb.Popen( 'samtools view %s | cut -f1,4,6,10,11' % in_bam_filename, stdout = sb.PIPE, stderr = sb.STDOUT, shell=True )

output=p.communicate()[0]
n_reads=0

with gzip.open(out_fastq_filename,'w+') as outfile:

    for line in output.split('\n'):
        if line:
            (name,pos,cigar,seq,qual)=line.split()
            #print name,pos,cigar,seq
            pos=int(pos)
            positions=get_reference_positions(pos,cigar)

            if bpstart in positions and bpend in positions:# and positions[0]<=bpstart and  positions[-1]>=bpend:

                st=positions.index(bpstart)
                en=find_last(positions,bpend)
                #print st,en,seq,seq[st:en]
                n_reads+=1
                #print '>%s\n%s\n+\n%s\n' %(name,seq[st:en],qual[st:en])
                outfile.write('@%s_%d\n%s\n+\n%s\n' %(name,n_reads,seq[st:en],qual[st:en]))
return n_reads

misik avatar Sep 05 '19 15:09 misik

This is very strange. Do you know which entries in your bam are causing this error? Does your bam contain reads of length 0?

kclem avatar Sep 06 '19 18:09 kclem

We're closing this issue because it hasn't been updated recently. If this issue still exists, please reopen this issue and we'll look into it.

kclem avatar Apr 13 '23 20:04 kclem