MicroExonator

ME_filter1.py is not working

Open s-weissbach opened this issue 4 years ago • 3 comments

Hello, I am trying to run MicroExonator on paired-end RNA-seq data, and I can't resolve an issue with ME_filter1.py. I did everything as described in the manual. My config.yaml file:

Genome_fasta : /data/resources/mouse/genome/GRCm38.p6.genome.fa
Gene_anontation_bed12 : /data/resources/mouse/genome/mm10_UCSC_knownGene.bed
GT_AG_U2_5 : /data/MicroExonator/PWM/Mouse/mm10_GT_AG_U2_5.good.matrix
GT_AG_U2_3 : /data/MicroExonator/PWM/Mouse/mm10_GT_AG_U2_3.good.matrix
conservation_bigwig : /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw
working_directory : /data/MicroExonator
ME_len : 30
Optimize_hard_drive : T
min_number_files_detected : 3
paired_samples : /data/MicroExonator/paired_samples.txt

Then I started the run with:

snakemake -s MicroExonator.skm --use-conda -k -j 32

This led to more or less the same error for every single input file:

Error in rule Round1_filter:
jobid: 159
RuleException:
CalledProcessError in line 56 of /data/MicroExonator/rules/Round1_post_processing.skm:
Command 'source activate /data/MicroExonator/.snakemake/conda/f2d123d5; set -euo pipefail; python3 src/ME_filter1.py /data/resources/mouse/genome/GRCm38.p6.genome.fa Round1/Scnn_3_R2.sam.row_ME Round1/Scnn_3_R2.sam.row_ME.Genome.Aligned.out.sam data/GT_AG_U2_5.pwm data/GT_AG_U2_3.pwm /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw 30 > Round1/Scnn_3_R2.sam.row_ME.filter1 ' returned non-zero exit status 1.
File "/data/MicroExonator/rules/Round1_post_processing.skm", line 56, in __rule_Round1_filter
File "/home/stephan/anaconda3/envs/snakemake_env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
output: Round1/Scnn_1_R2.sam.row_ME.filter1
Removing output files of failed job Round1_filter since they might be corrupted:
Round1/Scnn_3_R2.sam.row_ME.filter1
conda-env: /data/MicroExonator/.snakemake/conda/f2d123d5
Job failed, going on with independent jobs.

I figured out that I had not installed all dependencies, since they weren't listed in the installation instructions. I installed them manually, following the import statements in ME_filter1.py. Since Biopython is no longer supported on Python 2, I switched everything to Python 3 and removed the non-working print statements from the script. Now I can get a few lines of output by calling ME_filter1.py manually, but at some point the script crashes:

python3 src/ME_filter1.py /data/resources/mouse/genome/GRCm38.p6.genome.fa Round1/SST_1_R1.sam.row_ME Round1/SST_1_R1.sam.row_ME.Genome.Aligned.out.sam data/GT_AG_U2_5.pwm data/GT_AG_U2_3.pwm /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw 30

Working output:

D00535:22:CBLNNANXX:2:2202:18520:19601 CGCCAGCCAGAGCAGGCCCGCCGGCCCCTCAGTGTTGCCACAGACAACATGATGCTGGAGTTTTACAAGAAGGATGGCCTTAGGAAAATCCAAAGCATGGG GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGBBB@? chr11:65012098-65021940|uc007jku.1|100_100|76M18I7M 101 chr11|-|65012906|6S18M9016N77M 95 True 18 CCTTAGGAAAATCCAAAG 1 78.9604235156872 1.0 chr11_-_65012906_65012924 78.9604235156872 1.0
[...]

and then the error:

Traceback (most recent call last):
File "src/ME_filter1.py", line 372, in <module>
main(sys.argv[2], sys.argv[3], sys.argv[4], sys.argv[5], sys.argv[6], int(sys.argv[7]))
File "src/ME_filter1.py", line 364, in main
print(read, seq, qual, tag_alingment, t_score, genome_alingment, g_score, same_ME, len(DR_corrected_micro_exon_seq_found), DR_corrected_micro_exon_seq_found, len(micro_exons), max(U2_scores), max(TOTAL_mean_conservation), micro_exons_coords, ",".join(map(str, U2_scores)), ",".join(map(str, TOTAL_mean_conservation)))
TypeError: '>' not supported between instances of 'NoneType' and 'float'

How can I fix this? Can you please provide a full list of the required dependencies?
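For readers hitting the same TypeError: it is what Python 3 raises when max() has to compare None with a number, whereas Python 2 silently ordered None below every number. The traceback suggests that one of the lists passed to max() in the print call contains None, although which list is not visible from the log. The sketch below is not the author's fix; safe_max is a hypothetical helper and the sample values are invented. It ignores None entries, which reproduces the Python 2 result whenever at least one number is present.

```python
def safe_max(values, default=None):
    """Return the largest non-None value, or `default` if every entry is None.

    Hypothetical helper: it mimics how Python 2's max() behaved on lists
    that mix numbers and None, where None sorted below any number.
    """
    present = [v for v in values if v is not None]
    return max(present) if present else default


# Invented sample values standing in for a list such as TOTAL_mean_conservation.
scores = [0.42, None, 1.0]

# On Python 3, max(scores) raises:
#   TypeError: '>' not supported between instances of 'NoneType' and 'float'
print(safe_max(scores))
```

Whether a missing score should be ignored or treated as an error is a decision for the pipeline author; this only shows why the port from Python 2 to Python 3 changes the behaviour.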

s-weissbach, Aug 31 '20 10:08

Hello,

Dependencies are resolved automatically by snakemake when you run it with --use-conda, which is why I do not list them in the documentation. With this flag, snakemake creates the conda environments dynamically from the YAML files located in MicroExonator/envs/. ME_filter1.py in particular uses pybedtools.yaml; take a look at this file to see which dependencies it needs.

Looking at:

'source activate /data/MicroExonator/.snakemake/conda/f2d123d5; set -euo pipefail; python3 src/ME_filter1.py /data/resources/mouse/genome/GRCm38.p6.genome.fa Round1/Scnn_3_R2.sam.row_ME Round1/Scnn_3_R2.sam.row_ME.Genome.Aligned.out.sam data/GT_AG_U2_5.pwm data/GT_AG_U2_3.pwm /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw 30 > Round1/Scnn_3_R2.sam.row_ME.filter1 '

I can see that snakemake successfully created the environment from pybedtools.yaml. You can try to activate this environment with:

conda activate /data/MicroExonator/.snakemake/conda/f2d123d5

This should have all the dependencies installed.
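Once the environment is active, a quick check can show which interpreter it provides and whether the modules the script needs are importable. The following is a minimal sketch and not part of MicroExonator: missing_modules is a hypothetical helper, and the module names in the example are assumptions (Biopython is mentioned in this thread, the other two are guessed from the environment name and the bigWig input). It is written to run on both Python 2 and Python 3.

```python
import sys


def missing_modules(names):
    """Return the module names that cannot be imported by this interpreter."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed module names; replace them with the imports at the top of ME_filter1.py.
    wanted = ["Bio", "pybedtools", "pyBigWig"]
    print("Python %s" % sys.version.split()[0])
    print("Missing: %s" % ", ".join(missing_modules(wanted)))
```

Running this with the interpreter inside the activated environment also reveals whether that environment ships Python 2 or Python 3.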

What really caught my attention is that snakemake then tries to run the script with python3:

python3 src/ME_filter1.py /data/resources/mouse/genome/GRCm38.p6.genome.fa Round1/Scnn_3_R2.sam.row_ME Round1/Scnn_3_R2.sam.row_ME.Genome.Aligned.out.sam data/GT_AG_U2_5.pwm data/GT_AG_U2_3.pwm /data/resources/mouse/PhastCons/mm10.60way.phastCons.bw 30 > Round1/Scnn_3_R2.sam.row_ME.filter1

This should be run with python2, and I am not sure why python2 is not being called correctly in your run. I will update conda on my machine and see if I also get this error. If this keeps causing problems, I can update all the code to python3, but it will take me a while. Could you perform a snakemake dry-run with -np and check whether the command generated for this step uses python3 instead of python2?
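The dry-run check can be scripted so that only the relevant commands are shown. This is a rough sketch under a few assumptions: snakemake is on PATH, the command is run from the MicroExonator directory, and the function names are made up for illustration. The -n (dry-run) and -p (print shell commands) flags are standard snakemake options.

```python
import subprocess


def commands_mentioning(dry_run_text, script="ME_filter1.py"):
    """Return the lines of dry-run output that mention `script`, stripped."""
    return [line.strip() for line in dry_run_text.splitlines() if script in line]


def dry_run(snakefile="MicroExonator.skm"):
    """Run `snakemake -np` and return its combined output as text.

    Assumes snakemake is installed and the working directory is the
    MicroExonator checkout.
    """
    result = subprocess.run(
        ["snakemake", "-s", snakefile, "--use-conda", "-np"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

Calling commands_mentioning(dry_run()) and printing each returned line shows whether the generated command starts with python2 or python3.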

Thanks for reporting this, Guillermo

geparada, Sep 06 '20 11:09

Hey,

thanks for the reply. I just reinstalled everything and removed Anaconda, which was also installed, but I still get the following error messages:

Error in rule hisat2_Genome_index:
rule download_fastq:
input: download/VIP_W4_R2.download.sh
output: FASTQ/VIP_W4_R2.fastq
jobid: 800
wildcards: sample=VIP_W4_R2
priority: -10
resources: get_data=1
/usr/bin/bash: activate: No such file or directory
jobid: 591

and later:

/usr/bin/bash: activate: No such file or directory
RuleException:
CalledProcessError in line 25 of /data/MicroExonator/rules/Round1_post_processing.skm:
Command 'source activate /data/MicroExonator/.snakemake/conda/95f9e5c7; set -euo pipefail; hisat2-build /data/resources/mouse/genome/GRCm38.p6.genome.fa data/Genome ' returned non-zero exit status 127.
File "/data/MicroExonator/rules/Round1_post_processing.skm", line 25, in __rule_hisat2_Genome_index
File "/home/stephan/programs/miniconda3/envs/snakemake_env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
RuleException:
CalledProcessError in line 31 of /data/MicroExonator/rules/Round2_post_processing.skm:
Command 'source activate /data/MicroExonator/.snakemake/conda/95f9e5c7; set -euo pipefail; bowtie-build data/Genome data/Genome ' returned non-zero exit status 127.
File "/data/MicroExonator/rules/Round2_post_processing.skm", line 31, in __rule_bowtie_Genome_index
File "/home/stephan/programs/miniconda3/envs/snakemake_env/lib/python3.6/concurrent/futures/thread.py", line 56, in run

I used the same config file as before and ran the command: snakemake -s MicroExonator.skm --use-conda -k -j 32

Everything is installed according to the provided manual. When I run the test command by adding the -np flag everything works without an error. Is there a way to fix this?
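One thing that may be worth checking here: the message "activate: No such file or directory" together with exit status 127 means that bash could not find a file named activate on PATH when it executed the `source activate ...` command shown in the log. The sketch below is a diagnostic only, with a hypothetical helper name; it searches PATH the way `source` does, for a plain file rather than an executable. It assumes that the activate script lives in the bin directory of the base conda installation, which is the usual layout but has not been verified for this setup.

```python
import os


def find_sourceable(name, path=None):
    """Return the first file called `name` in the PATH directories, or None.

    Mirrors how bash's `source` looks up a bare file name: the file only
    has to exist, it does not need to be executable.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if directory and os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    for name in ("conda", "activate"):
        print("%s -> %s" % (name, find_sourceable(name)))
```

If activate resolves to None, adding the bin directory of the conda installation to PATH before starting snakemake might be enough, although this does not rule out the other causes discussed in this thread.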

Best, Stephan

s-weissbach, Sep 09 '20 08:09

Sorry for the delay,

I think this is a bug in the Optimize_hard_drive : T feature. Now that we have finally published MicroExonator in Genome Biology, I am going to address the issues more actively. Please let me know if switching this to Optimize_hard_drive : F solves the issue for now (deleting this line may also work).
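For anyone who prefers to script the change, the option can be switched without touching the rest of config.yaml. This is a small sketch with a hypothetical helper, set_option, that rewrites a single `key : value` line in the text of the file; it is not part of MicroExonator and assumes the flat `key : value` layout shown at the top of this thread.

```python
import re


def set_option(config_text, key, value):
    """Return config_text with the value on the `key : value` line replaced."""
    pattern = re.compile(
        r"^([ \t]*%s[ \t]*:[ \t]*).*$" % re.escape(key), flags=re.MULTILINE
    )
    # Keep the key, colon and spacing as written; swap only the value.
    return pattern.sub(lambda match: match.group(1) + value, config_text)


example = "ME_len : 30\nOptimize_hard_drive : T\nmin_number_files_detected : 3\n"
print(set_option(example, "Optimize_hard_drive", "F"))
```

To apply it, read config.yaml into a string, pass it through set_option, and write the result back; keeping a backup copy of the original file is advisable.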

I will close this issue once I manage to fix Optimize_hard_drive : T. I have since made a lot of changes to optimise disk usage in other ways, so this feature is not that relevant anymore, but I will still try to fix it soon.

Best, Guillermo

geparada, Jan 28 '21 03:01