MicroExonator
ME_filter1.py is not working
Hello,
I am trying to run MicroExonator on paired-end RNA-seq data, but I can't resolve an issue with ME_filter1.py.
I did everything as described in the manual. My config.yaml file:
Genome_fasta : /data/resources/mouse/genome/GRCm38.p6.genome.fa
Gene_anontation_bed12 : /data/resources/mouse/genome/mm10_UCSC_knownGene.bed
GT_AG_U2_5 : /data/MicroExonator/PWM/Mouse/mm10_GT_AG_U2_5.good.matrix
GT_AG_U2_3 : /data/MicroExonator/PWM/Mouse/mm10_GT_AG_U2_3.good.matrix
conservation_bigwig : /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw
working_directory : /data/MicroExonator
ME_len : 30
Optimize_hard_drive : T
min_number_files_detected : 3
paired_samples : /data/MicroExonator/paired_samples.txt
Then I started with
snakemake -s MicroExonator.skm --use-conda -k -j 32
Which led to more or less the same error for every single input file:
Error in rule Round1_filter:
    jobid: 159
RuleException:
CalledProcessError in line 56 of /data/MicroExonator/rules/Round1_post_processing.skm:
Command 'source activate /data/MicroExonator/.snakemake/conda/f2d123d5; set -euo pipefail; python3 src/ME_filter1.py /data/resources/mouse/genome/GRCm38.p6.genome.fa Round1/Scnn_3_R2.sam.row_ME Round1/Scnn_3_R2.sam.row_ME.Genome.Aligned.out.sam data/GT_AG_U2_5.pwm data/GT_AG_U2_3.pwm /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw 30 > Round1/Scnn_3_R2.sam.row_ME.filter1 ' returned non-zero exit status 1.
  File "/data/MicroExonator/rules/Round1_post_processing.skm", line 56, in __rule_Round1_filter
  File "/home/stephan/anaconda3/envs/snakemake_env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    output: Round1/Scnn_1_R2.sam.row_ME.filter1
Removing output files of failed job Round1_filter since they might be corrupted:
Round1/Scnn_3_R2.sam.row_ME.filter1
    conda-env: /data/MicroExonator/.snakemake/conda/f2d123d5
Job failed, going on with independent jobs.
I figured out that I had not installed all dependencies, since they weren't listed in the installation instructions. I installed them manually, following the import statements in ME_filter1.py. Since Biopython is no longer supported for Python 2, I switched everything to Python 3 and removed the print statements that did not work from the script.
Now I can get a few lines of output by calling ME_filter1.py manually, but at some point the script crashes:
python3 src/ME_filter1.py /data/resources/mouse/genome/GRCm38.p6.genome.fa Round1/SST_1_R1.sam.row_ME Round1/SST_1_R1.sam.row_ME.Genome.Aligned.out.sam data/GT_AG_U2_5.pwm data/GT_AG_U2_3.pwm /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw 30
Working output:
D00535:22:CBLNNANXX:2:2202:18520:19601 CGCCAGCCAGAGCAGGCCCGCCGGCCCCTCAGTGTTGCCACAGACAACATGATGCTGGAGTTTTACAAGAAGGATGGCCTTAGGAAAATCCAAAGCATGGG GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBBB@? chr11:65012098-65021940|uc007jku.1|100_100|76M18I7M 101 chr11|-|65012906|6S18M9016N77M 95 True 18 CCTTAGGAAAATCCAAAG 1 78.9604235156872 1.0 chr11_-_65012906_65012924 78.9604235156872 1.0 [...]
and then the error:
Traceback (most recent call last):
  File "src/ME_filter1.py", line 372, in <module>
    main(sys.argv[2], sys.argv[3], sys.argv[4], sys.argv[5], sys.argv[6], int(sys.argv[7]))
  File "src/ME_filter1.py", line 364, in main
    print(read, seq, qual, tag_alingment, t_score, genome_alingment, g_score, same_ME, len(DR_corrected_micro_exon_seq_found), DR_corrected_micro_exon_seq_found, len(micro_exons), max(U2_scores), max(TOTAL_mean_conservation), micro_exons_coords, ",".join(map(str, U2_scores)), ",".join(map(str, TOTAL_mean_conservation)))
TypeError: '>' not supported between instances of 'NoneType' and 'float'
How can I fix this? Can you please provide a full list of needed dependencies?
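For context on the traceback: on Python 3, max() raises exactly this TypeError when a list mixes None with floats, whereas Python 2 ordered None below every number and carried on silently. The following is only a minimal sketch of a guard, assuming the None entries come from scores that could not be computed in U2_scores or TOTAL_mean_conservation. The helper name max_ignoring_none and the example values are hypothetical and not part of MicroExonator.

```python
def max_ignoring_none(values, default=None):
    """Return the largest non-None value, or `default` if none remain."""
    present = [v for v in values if v is not None]
    return max(present) if present else default


# Hypothetical scores; None stands for a value that could not be computed.
scores = [78.96, None, 1.0]
print(max_ignoring_none(scores))
```

Whether skipping None is the right behaviour depends on what a missing score means in the pipeline, so this is a workaround for inspection rather than a confirmed fix.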
Hello,
Dependencies are resolved automatically by snakemake when you run it with --use-conda, which is why I do not list them in the documentation. With this flag, snakemake creates the conda environments dynamically from the YAML files located in MicroExonator/envs/. ME_filter1.py in particular uses pybedtools.yaml; take a look at this file to see which dependencies it needs.
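As a rough illustration of reading the dependency list out of such an environment file, here is a small line-based sketch. It assumes the file uses the usual flat `dependencies:` list of conda environment files. The function name list_dependencies and the package names in EXAMPLE_ENV are placeholders, not the actual contents of pybedtools.yaml.

```python
def list_dependencies(yaml_text):
    """Collect the entries under the top-level `dependencies:` key.

    Line-based on purpose so it needs no YAML library; it only handles the
    simple list layout commonly used in conda environment files.
    """
    deps, in_deps = [], False
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not line.startswith((" ", "\t", "-")):
            # A new top-level key starts or ends the dependencies section.
            in_deps = stripped.startswith("dependencies:")
            continue
        if in_deps and stripped.startswith("- "):
            deps.append(stripped[2:].strip())
    return deps


# Placeholder content, NOT the real pybedtools.yaml.
EXAMPLE_ENV = """\
channels:
  - some-channel
dependencies:
  - some-package=1.0
  - another-package
"""

print(list_dependencies(EXAMPLE_ENV))
# For the real file: list_dependencies(open("envs/pybedtools.yaml").read())
```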
Looking at:
'source activate /data/MicroExonator/.snakemake/conda/f2d123d5; set -euo pipefail; python3 src/ME_filter1.py /data/resources/mouse/genome/GRCm38.p6.genome.fa Round1/Scnn_3_R2.sam.row_ME Round1/Scnn_3_R2.sam.row_ME.Genome.Aligned.out.sam data/GT_AG_U2_5.pwm data/GT_AG_U2_3.pwm /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw 30 > Round1/Scnn_3_R2.sam.row_ME.filter1 '
I can see that snakemake successfully created the environment from pybedtools.yaml. You can try to activate this environment with:
conda activate /data/MicroExonator/.snakemake/conda/f2d123d5
This should have all the dependencies installed.
What really caught my attention is that snakemake then tries to run the script with python3:
python3 src/ME_filter1.py /data/resources/mouse/genome/GRCm38.p6.genome.fa Round1/Scnn_3_R2.sam.row_ME Round1/Scnn_3_R2.sam.row_ME.Genome.Aligned.out.sam data/GT_AG_U2_5.pwm data/GT_AG_U2_3.pwm /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw 30 > Round1/Scnn_3_R2.sam.row_ME.filter1
This should be run with python2; I am not sure why python2 is not being called correctly in your run. I will update conda on my machine and see if I also get this error. If this keeps causing problems, I can update all the code to python3, but it will take me a while. Could you perform a snakemake dry run with -np and check whether the command generated for this step uses python3 instead of python2?
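One way to do that check without reading the whole dry-run output by eye is sketched below. It assumes snakemake is on the PATH and that the dry run prints the shell commands as plain text. The helper names interpreters_for and snakemake_dry_run are made up for this example.

```python
import re
import subprocess


def interpreters_for(script, dry_run_output):
    """Return the set of python executables used to launch `script`."""
    pattern = re.compile(r"\b(python[0-9.]*)\s+" + re.escape(script))
    return set(pattern.findall(dry_run_output))


def snakemake_dry_run(snakefile="MicroExonator.skm"):
    """Run `snakemake -s <snakefile> -np` and return everything it printed."""
    result = subprocess.run(
        ["snakemake", "-s", snakefile, "-np"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


# Command text shortened from the error message above.
sample = "python3 src/ME_filter1.py genome.fa Round1/Scnn_3_R2.sam.row_ME 30"
print(interpreters_for("src/ME_filter1.py", sample))
```

On a real run, `interpreters_for("src/ME_filter1.py", snakemake_dry_run())` would show whether the generated commands call python2, python3, or both.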
Thanks for reporting this, Guillermo
Hey,
thanks for the reply. I just reinstalled everything and removed Anaconda, which was also installed, but I still get the following error messages:
Error in rule hisat2_Genome_index:
rule download_fastq:
    input: download/VIP_W4_R2.download.sh
    output: FASTQ/VIP_W4_R2.fastq
    jobid: 800
    wildcards: sample=VIP_W4_R2
    priority: -10
    resources: get_data=1
/usr/bin/bash: activate: No such file or directory
    jobid: 591
and later:
/usr/bin/bash: activate: No such file or directory
RuleException:
CalledProcessError in line 25 of /data/MicroExonator/rules/Round1_post_processing.skm:
Command 'source activate /data/MicroExonator/.snakemake/conda/95f9e5c7; set -euo pipefail; hisat2-build /data/resources/mouse/genome/GRCm38.p6.genome.fa data/Genome ' returned non-zero exit status 127.
  File "/data/MicroExonator/rules/Round1_post_processing.skm", line 25, in __rule_hisat2_Genome_index
  File "/home/stephan/programs/miniconda3/envs/snakemake_env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
RuleException:
CalledProcessError in line 31 of /data/MicroExonator/rules/Round2_post_processing.skm:
Command 'source activate /data/MicroExonator/.snakemake/conda/95f9e5c7; set -euo pipefail; bowtie-build data/Genome data/Genome ' returned non-zero exit status 127.
  File "/data/MicroExonator/rules/Round2_post_processing.skm", line 31, in __rule_bowtie_Genome_index
  File "/home/stephan/programs/miniconda3/envs/snakemake_env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
I used the same config file as before and ran the command:
snakemake -s MicroExonator.skm --use-conda -k -j 32
Everything is installed according to the provided manual. When I run the test command with the -np flag added, everything works without an error. Is there a way to fix this?
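The message "/usr/bin/bash: activate: No such file or directory" suggests that `source activate` could not find a file named activate in any PATH directory of the shell that snakemake spawned. A dry run with -np would not reveal this, because it prints the commands without executing them. A minimal sketch for checking where (or whether) activate is visible, with find_on_path being a hypothetical helper name:

```python
import os


def find_on_path(name, path=None):
    """Return every PATH entry that contains a regular file called `name`.

    Unlike shutil.which, this does not require the file to be executable,
    because bash's `source` only needs to read it.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits


print(find_on_path("activate") or "activate is not on PATH")
```

Running this from the same shell that launches snakemake shows whether the conda installation's bin directory is actually on the PATH there.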
Best, Stephan
Sorry for the delay,
I think this is a bug in the Optimize_hard_drive : T feature. Now that we have finally published MicroExonator in Genome Biology, I am going to address the issues more actively. Please let me know whether switching this to Optimize_hard_drive : F solves the issue for now (deleting this line may also work).
I will close this issue once I manage to fix Optimize_hard_drive : T. I have since made a lot of changes that optimise disk usage in other ways, so this feature is not that relevant anymore, but I will still try to fix this soon.
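If it helps, the flag can also be flipped programmatically. This is just a sketch that assumes the `key : value` layout of the config.yaml posted above; set_config_flag is a hypothetical helper, not something shipped with MicroExonator.

```python
import re


def set_config_flag(config_text, key, value):
    """Rewrite the `key : value` line for `key`, leaving other lines untouched."""
    pattern = re.compile(r"^(\s*" + re.escape(key) + r"\s*:\s*).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + value, config_text)
    if count == 0:
        raise KeyError(key)
    return new_text


# Three lines taken from the config.yaml posted earlier in this thread.
config = "ME_len : 30\nOptimize_hard_drive : T\nmin_number_files_detected : 3\n"
print(set_config_flag(config, "Optimize_hard_drive", "F"))
```

Editing the line by hand in config.yaml is of course equivalent.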
Best, Guillermo