
BiopythonParserWarning

Open deearahman opened this issue 3 years ago • 6 comments

Hi,

I'm having issues running the pipeline. The first warning message is BiopythonParserWarning. The next is "No way to run job"... Upon checking the bam and vcf folders, I found that no files had been generated.

Similarly, in the temp folder, the individual folders (callRepSNPs, deriveRepAlleleMartix, deriveRepStats, getVCFStats, q30VarFilter) were generated but contain no files.

[common@t7920 RedDog_v0.4.8]$ rubra RedDog --config k1locus_config_massive.py --style run > run.txt
/usr/lib64/python2.7/site-packages/Bio/GenBank/Scanner.py:1147: BiopythonParserWarning: Premature end of file in sequence data
  BiopythonParserWarning)
/usr/lib64/python2.7/site-packages/Bio/GenBank/__init__.py:1306: BiopythonParserWarning: Expected sequence length 5248520, found 1855828 (AP006725.1).
  BiopythonParserWarning)
Traceback (most recent call last):
  File "/usr/bin/rubra", line 11, in <module>
    load_entry_point('Rubra==0.1.5', 'console_scripts', 'rubra')()
  File "build/bdist.linux-x86_64/egg/rubra/rubra.py", line 66, in main
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2671, in pipeline_run
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2406, in fill_queue_with_job_parameters
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2345, in parameter_generator
ruffus.ruffus_exceptions.RethrownJobError:

Exceptions generating parameters for

'def RedDog.checkBam(...):'

Original exception:

Exception #1
ruffus.ruffus_exceptions.MissingInputFileError(    
    
    
    No way to run job: Input file ['/data3/Analysis/K1_Locus_KP/SNP/RedDog_Output/temp/ERR025468/ERR025468.bam'] does not exist):
for RedDog.checkBam.

Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2279, in parameter_generator
    check_input_files_exist (*param)
  File "build/bdist.linux-x86_64/egg/ruffus/file_name_parameters.py", line 191, in check_input_files_exist
    "Input file ['%s'] does not exist" % f)
MissingInputFileError:     
    
    
    No way to run job: Input file ['/data3/Analysis/K1_Locus_KP/SNP/RedDog_Output/temp/ERR025468/ERR025468.bam'] does not exist

The run.txt file

RedDog V1beta.11 - phylogeny run

Copyright (c) 2016 David Edwards, Bernie Pope, Kat Holt
All rights reserved. (see README.txt for more details)

Mapping: Bowtie2 V2.2.9
Preset Option: --sensitive-local
1 replicon(s) in GenBank reference AP006725.1
1 replicon(s) to be reported
25 sequence pair(s) to be mapped

Output folder: /data3/Analysis/K1_Locus_KP/SNP/RedDog_Output/

Starting pipeline...
444 jobs to be executed in total
414 jobs left to execute

Any ideas?

Thanks

deearahman, Nov 27 '20 08:11

Hi, It sounds like there might be a problem with your input reference sequence. Biopython is reporting that the sequence is shorter than the length stated in the record header. You can check whether the reference is complete by opening the file in a text editor and scrolling down to the end. Kelly

kelwyres, Nov 30 '20 05:11
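
The completeness check Kelly describes can also be scripted. The following is a minimal sketch, not part of RedDog or Biopython: it assumes a standard GenBank flat file in which the LOCUS line declares the length as "N bp", the bases follow an ORIGIN line, and each record ends with "//". The function name `check_genbank_length` is hypothetical.

```python
import re


def check_genbank_length(text):
    """Compare the length declared on each LOCUS line with the bases present.

    Returns one (name, declared, found, terminated) tuple per record, where
    `terminated` is True only if the record ends with the "//" marker.
    """
    results = []
    name, declared, found, in_origin = None, None, 0, False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            # A new LOCUS line before "//" means the previous record was cut off.
            if name is not None:
                results.append((name, declared, found, False))
            fields = line.split()
            name = fields[1] if len(fields) > 1 else "?"
            match = re.search(r"(\d+)\s+bp", line)
            declared = int(match.group(1)) if match else None
            found, in_origin = 0, False
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            if name is not None:
                results.append((name, declared, found, True))
            name, declared, found, in_origin = None, None, 0, False
        elif in_origin:
            # Sequence lines look like "       61 acgtacgtac gtacgtacgt ...":
            # skip the leading position number and count the remaining bases.
            found += sum(len(chunk) for chunk in line.split()[1:])
    if name is not None:
        # File ended without a "//" terminator.
        results.append((name, declared, found, False))
    return results
```

Passing the contents of the reference file to `check_genbank_length` and looking for records where `declared != found` or `terminated` is False would flag the kind of truncation reported in the warning above (5248520 expected, 1855828 found).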

Hi Kelly,

Thanks for your reply.

I re-downloaded the GenBank file and that solved the Biopython issue. But I am still getting this error:

[common@t7920 RedDog_v0.4.8]$ rubra RedDog --config k1locus_config_massive.py --style run > run.txt
Traceback (most recent call last):
  File "/usr/bin/rubra", line 11, in <module>
    load_entry_point('Rubra==0.1.5', 'console_scripts', 'rubra')()
  File "build/bdist.linux-x86_64/egg/rubra/rubra.py", line 66, in main
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2671, in pipeline_run
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2406, in fill_queue_with_job_parameters
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2345, in parameter_generator
ruffus.ruffus_exceptions.RethrownJobError:

Exceptions generating parameters for

'def RedDog.checkBam(...):'

Original exception:

Exception #1
ruffus.ruffus_exceptions.MissingInputFileError(    
    
    
    No way to run job: Input file ['/data3/Analysis/K1_Locus_KP/SNP/RedDog_Output/temp/ERR025673/ERR025673.bam'] does not exist):
for RedDog.checkBam.

Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/ruffus/task.py", line 2279, in parameter_generator
    check_input_files_exist (*param)
  File "build/bdist.linux-x86_64/egg/ruffus/file_name_parameters.py", line 191, in check_input_files_exist
    "Input file ['%s'] does not exist" % f)
MissingInputFileError:     
    
    
    No way to run job: Input file ['/data3/Analysis/K1_Locus_KP/SNP/RedDog_Output/temp/ERR025673/ERR025673.bam'] does not exist

Not exactly sure what went wrong.

Dyana

deearahman, Dec 10 '20 05:12
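
The traceback only names the first missing input, so it can help to see how many samples are affected. Below is a rough sketch that assumes the layout visible in the error message (`<output>/temp/<sample>/<sample>.bam`). The function name `samples_missing_bam` is hypothetical, and the list of non-sample step folders is taken from the first post and may be incomplete.

```python
import os

# Step folders mentioned in the first post; they sit next to the per-sample
# folders under temp/ and are not expected to contain a BAM of their own.
STEP_FOLDERS = {
    "callRepSNPs",
    "deriveRepAlleleMartix",
    "deriveRepStats",
    "getVCFStats",
    "q30VarFilter",
}


def samples_missing_bam(temp_dir, skip=STEP_FOLDERS):
    """Return the sample folders under temp_dir that lack <sample>/<sample>.bam."""
    missing = []
    for entry in sorted(os.listdir(temp_dir)):
        sample_dir = os.path.join(temp_dir, entry)
        if entry in skip or not os.path.isdir(sample_dir):
            continue
        if not os.path.isfile(os.path.join(sample_dir, entry + ".bam")):
            missing.append(entry)
    return missing
```

If every sample is listed, the mapping step as a whole is probably failing (for example, because of a tool or environment problem) rather than one read set being bad.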

Hi Dyana, It looks like the pipeline is trying to check the BAM mapping output but can't find the file, which indicates that something went wrong with the mapping step. What sort of system are you running on? Depending on how you are running the pipeline, do you have a directory called 'log' inside your main RedDog directory and, if so, are there any files in it? We can look inside the files to get a clue. Otherwise, the first step would be to make sure that your selected mapping program (bwa or bowtie2) is installed and available in your path. Kelly

kelwyres, Dec 11 '20 09:12
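
One quick way to run the "is it in your path" check is sketched below. This is an illustrative helper, not part of RedDog: the name `find_tools` is hypothetical, and it uses `shutil.which`, which requires Python 3.3 or later, so it would be run separately from the Python 2.7 environment shown in the tracebacks. It also only reflects the PATH of the shell it is run from, which may differ from the environment in which the pipeline's jobs execute.

```python
import shutil


def find_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


# samtools is included because the BAM files depend on it as well as the mapper.
for tool, path in find_tools(["bowtie2", "bwa", "samtools"]).items():
    print("{:10s} {}".format(tool, path if path else "NOT FOUND on PATH"))
```

A tool being found does not guarantee that its version is compatible with the pipeline, so a hit here narrows the problem down rather than ruling the tool out.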

Sorry for the late reply. Here's a log file; there are more log files. I have tried bowtie2 and bwa, and both work fine.

pipeline.log

I re-installed RedDog_v1b11, but I keep getting this error.

[common@localhost RedDog_v0.4.8]$ rubra RedDog --config k1locus_config_massive.py --style run
Traceback (most recent call last):
  File "/usr/bin/rubra", line 9, in <module>
    load_entry_point('Rubra==0.1.5', 'console_scripts', 'rubra')()
  File "build/bdist.linux-x86_64/egg/rubra/rubra.py", line 35, in main
  File "RedDog.py", line 55, in <module>
    from pipe_utils import (isGenbank, isFasta, chromInfoFasta, chromInfoGenbank, getValue,
  File "pipe_utils.py", line 13, in <module>
    from Bio import SeqIO
ImportError: No module named Bio

I've checked that the Bio module imports fine. Need help!

Update: I have solved the "ImportError: No module named Bio" issue.

deearahman, Jan 07 '21 01:01

Hi Kelly,

Just to let you know, I have rectified the problem: it was due to a different version of samtools that didn't allow the bam files to be generated. It is all good now. However, I noticed that only files named like ERR123456.fastq.gz are analysed. Files named like MDB104_S2_L001_R1_001.fastq.gz do not get processed. Is there any way around this? If not, I will have to rename them.

Thanks, Dyana

deearahman, Jan 09 '21 00:01

Hi, Great that you've managed to fix the SAMtools problem. Unfortunately, I don't think there is any way around the file name convention, so you'll need to rename them, or perhaps try symlinks? Kelly

kelwyres, Jan 11 '21 04:01
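
The symlink suggestion could look something like the sketch below. It is only an illustration under stated assumptions: the source names follow the Illumina pattern seen in the thread (`MDB104_S2_L001_R1_001.fastq.gz`), and the target pattern `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` is an assumption about what the pipeline accepts, not something the thread confirms, so it should be adjusted to whatever the RedDog instructions specify. The function name `link_reads` is hypothetical.

```python
import os
import re

# Illumina-style name, e.g. MDB104_S2_L001_R1_001.fastq.gz
ILLUMINA = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_R(?P<read>[12])_001\.fastq\.gz$"
)

# ASSUMPTION: the target naming scheme; change it to match the pipeline's rules.
TARGET = "{sample}_{read}.fastq.gz"


def link_reads(src_dir, dest_dir):
    """Symlink Illumina-named reads into dest_dir under the TARGET pattern.

    The original files are left untouched. Existing links are never
    overwritten, so a sample split across several lanes keeps only its
    first lane. Returns the list of link names created.
    """
    os.makedirs(dest_dir, exist_ok=True)
    created = []
    for filename in sorted(os.listdir(src_dir)):
        match = ILLUMINA.match(filename)
        if not match:
            continue  # leave files that do not follow the Illumina pattern alone
        link_name = TARGET.format(**match.groupdict())
        link_path = os.path.join(dest_dir, link_name)
        if not os.path.lexists(link_path):
            source = os.path.abspath(os.path.join(src_dir, filename))
            os.symlink(source, link_path)
            created.append(link_name)
    return created
```

Pointing the pipeline's read path at `dest_dir` would then avoid renaming or copying the original files; the links can simply be deleted afterwards.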