shorah icon indicating copy to clipboard operation
shorah copied to clipboard

dec.py: "local variable 'prop' referenced before assignment"

Open derrickhlin opened this issue 5 years ago • 4 comments

Hello Dr. Zagordi,

I apologize if this is an incredibly basic question - I just started working with command line programs and NGS data this summer (since June 2018) and was hoping you might be able to help me in my exploration of analysis methods.

I currently have a data set containing 51 bp length Illumina reads of the NS5B gene (HCV). My reference is the h77 strain, and the full length of the genome in question is 1776 bp in length. I am working with Mac OSX. I am currently struggling with an error message of

local variable 'prop' referenced before assignment

As a summary of what I have done so far:

  1. Installed ShoRAH via miniconda3 conda install -c biopython shorah

  2. Generated input files (fastq > sam > bam > _sorted.bam) #local alignment using bowtie2 bowtie2-build **reference**.fasta h77 bowtie2 --local -x h77 -U **rawdata**.fastq -S **samfile**.sam -- running these 2 commands yields this output: 3522968 reads; of these: 3522968 (100.00%) were unpaired; of these: 509358 (14.46%) aligned 0 times 3013537 (85.54%) aligned exactly 1 time 73 (0.00%) aligned >1 times 85.54% overall alignment rate -- [I have also tried running an end-to-end alignment] -- #conversion to binary samtools view -b -T **reference**.fasta -o **bamfile**.bam **samfile**.sam -- #sorting .bam file samtools sort **bamfile**.bam -o **bamfile_sorted**.bam -- #generation of an indexing file samtools index **bamfile_sorted**.bam **bamfile_sorted**.bam.bai

  3. Used the reference.fasta file as a genome and loaded bamfile_sorted.bam into IGV which yields this mapping of the reads: image

  4. Checked coverage with Qualimap: image image image

  5. Attempted to run dec.py dec.py -b **bamfile_sorted**.bam -f **reference**.fasta -w 48 -r AF009606:200-1400 -- I selected a window of 48 because I had read in a previous issue post that the window should be slightly smaller than the read length. -- I chose to run from 200-1400 to avoid any issues of coverage with the ends.

All of my attempts to run the local analysis yield something similar to the following in the terminal:

Traceback (most recent call last): File "/Users/Derrick/miniconda3/bin/dec.py", line 78, in args.region, args.max_coverage, args.alpha, args.keep_files, args.seed) File "/Users/Derrick/miniconda3/lib/python3.6/site-packages/shorah_dec.py", line 449, in main proposed[beg] = (get_prop(dbg_file), j) File "/Users/Derrick/miniconda3/lib/python3.6/site-packages/shorah_dec.py", line 261, in get_prop return prop UnboundLocalError: local variable 'prop' referenced before assignment

in the dec.log file, the following lines are at the end of the log:

INFO 2018-08-03 15:39:41,522 main 442 reading windows for start position 152 WARNING 2018-08-03 15:39:44,129 correct_reads 234 No reads in window 152? INFO 2018-08-03 15:39:44,129 main 446 this is window w-AF009606-152-199

[I have tried adjusting the window size, the positions, and the alpha value without success.]

Ultimately, I suppose my question is whether this is a user error or a systematic error (i.e. the data are not suitable for running this analysis). While I believe that the data have sufficient coverage (based on the Qualimap output), I am a little unclear on how to determine what is sufficient. I am also unsure if the data have sufficient diversity to run through dec.py - and admittedly I am unclear on how to determine this as well.

Again, I apologize if I am missing something basic (and for the formatting of this post) - I am trying my best to learn on my own, but I thought it would be easiest to just ask!

Thanks in advance for your help!

Best, Derrick

derrickhlin avatar Aug 03 '18 21:08 derrickhlin