EukDetect
eukdetect test fails at aln stage
Hello - I followed the conda install instructions and edited the yml config file. The test fails at the aln stage, and the aln folder in the output directory is empty.
Here is the yml file:
#Config file for testing eukdetect. Edit your paths where specified
eukdetect_dir: "/storage/home/epb5360/scratch/EukDetect"
output_dir: "/storage/home/epb5360/scratch/EukDetect/results" #directory where output should be written
paired_end: true #true or false
fwd_suffix: "_R1.fastq.gz" #filename excluding sample name. no need to edit if paired_end = false
rev_suffix: "_R2.fastq.gz" #filename excluding sample name. no need to edit if paired_end = false
se_suffix: ".fastq.gz" #file name excluding sample name. no need to edit if paired_end = true
readlen: 125 #targeted length of your reads. pre-trimming reads not recommended
fq_dir: "/storage/home/epb5360/scratch/EukDetect/tests" #full path to directory with raw fastq files
database_dir: "/storage/home/epb5360/scratch/EukDetect/eukdb" #full path to folder with all eukdetect_db files and taxa.sqlite files
database_prefix: "ncbi_eukprot_met_arch_markers.fna" #database prefix
samples: #list sample names here. fastqs must correspond to {samplename}{se_suffix} for SE reads or {samplename}{fwd_suffix} and {samplename}{rev_suffix} for PE
  test:
Here is the snakemake error log:
$ cat snakemake_1616682088.517935.log
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    1    bam2fastq
    1    countreads
    1    find_low_complexity
    1    fixmate
    1    index
    1    markdup
    1    remove_low_complexity
    1    rmsort
    1    runall
    1    runaln
    1    taxonomize
    11

[Thu Mar 25 10:21:28 2021]
rule runaln:
    input: /storage/home/epb5360/scratch/EukDetect/eukdb/ncbi_eukprot_met_arch_markers.fna, /storage/home/epb5360/scratch/EukDetect/tests/test_R1.fastq.gz, /storage/home/epb5360/scratch/EukDetect/tests/test_R2.fastq.gz
    output: /storage/home/epb5360/scratch/EukDetect/results/aln/test_aln_q30_lenfilter.sorted.bam
    jobid: 1
    wildcards: output_dir=/storage/home/epb5360/scratch/EukDetect/results, sample=test

Job counts:
    count    jobs
    1    runaln
    1
open: No such file or directory
[bam_sort_core] fail to open file /storage/home/epb5360/scratch/EukDetect/results/aln/test_aln_q30_lenfilter.sorted.bam
[samopen] SAM header is present: 521824 sequences.
[Thu Mar 25 10:21:35 2021]
Error in rule runaln:
    jobid: 0
    output: /storage/home/epb5360/scratch/EukDetect/results/aln/test_aln_q30_lenfilter.sorted.bam

RuleException:
CalledProcessError in line 78 of /storage/home/epb5360/scratch/EukDetect/rules/eukdetect.rules:
Command ' set -euo pipefail; bowtie2 --quiet --omit-sec-seq --no-discordant --no-unal -x /storage/home/epb5360/scratch/EukDetect/eukdb/ncbi_eukprot_met_arch_markers.fna -1 /storage/home/epb5360/scratch/EukDetect/tests/test_R1.fastq.gz -2 /storage/home/epb5360/scratch/EukDetect/tests/test_R2.fastq.gz | perl -lane '$l = 0; $F[5] =~ s/(\d+)[MX=DN]/$l+=$1/eg; print if $l > 100.0 or /^@/' | samtools view -q 30 -bS - | samtools sort -o /storage/home/epb5360/scratch/EukDetect/results/aln/test_aln_q30_lenfilter.sorted.bam - ' returned non-zero exit status 141.
  File "/storage/home/epb5360/scratch/EukDetect/rules/eukdetect.rules", line 78, in __rule_runaln
  File "/storage/work/epb5360/miniconda3/envs/eukdetect/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/epb5360/EukDetect/.snakemake/log/2021-03-25T102128.731736.snakemake.log
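For context on the "returned non-zero exit status 141" in the log: by shell convention, a status above 128 means the process was killed by signal number (status - 128), and signal 13 is SIGPIPE. With "set -euo pipefail", that usually indicates a later stage of the pipeline exited early (here plausibly samtools sort, which printed "fail to open file") while an earlier stage was still writing into the pipe. A small sketch for decoding such a status; decode_status is a hypothetical helper name, not part of EukDetect or snakemake.

```shell
#!/bin/sh
# Hypothetical helper: translate a shell exit status into a readable message.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        signum=$((status - 128))
        # `kill -l N` prints the signal name without the SIG prefix.
        echo "killed by SIG$(kill -l "$signum") (signal $signum)"
    else
        echo "exited with code $status"
    fi
}

decode_status 141   # the status reported for rule runaln
decode_status 1     # an ordinary failure, for comparison
```

The SIGPIPE itself is a symptom of the stage that stopped reading, so the messages printed just before it in the log are the more useful lead.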
Thanks for reaching out. Can you try running just the command itself, that is:
bowtie2 --quiet --omit-sec-seq --no-discordant --no-unal -x /storage/home/epb5360/scratch/EukDetect/eukdb/ncbi_eukprot_met_arch_markers.fna -1 /storage/home/epb5360/scratch/EukDetect/tests/test_R1.fastq.gz -2 /storage/home/epb5360/scratch/EukDetect/tests/test_R2.fastq.gz | perl -lane '$l = 0; $F[5] =~ s/(\d+)[MX=DN]/$l+=$1/eg; print if $l > 100.0 or /^@/' | samtools view -q 30 -bS - | samtools sort -o /storage/home/epb5360/scratch/EukDetect/results/aln/test_aln_q30_lenfilter.sorted.bam -
and see whether that causes any errors? If so, please post them here. Thank you.
Thanks for the quick reply! That ran with no errors, and in the 'results' folder there is this file: test_aln_q30_lenfilter.sorted.bam
Great, thanks. Could you share your config file?
I should mention that I'm working on an HPC cluster. I'm not sure whether that is part of the install problem, as I've had issues with bowtie2 when installing metaphlan and humann.
$ cat configfile_for_tests.yml
#Config file for testing eukdetect. Edit your paths where specified
eukdetect_dir: "/storage/home/epb5360/scratch/EukDetect"
output_dir: "/storage/home/epb5360/scratch/EukDetect/results" #directory where output should be written
paired_end: true #true or false
fwd_suffix: "_R1.fastq.gz" #filename excluding sample name. no need to edit if paired_end = false
rev_suffix: "_R2.fastq.gz" #filename excluding sample name. no need to edit if paired_end = false
se_suffix: ".fastq.gz" #file name excluding sample name. no need to edit if paired_end = true
readlen: 125 #targeted length of your reads. pre-trimming reads not recommended
fq_dir: "/storage/home/epb5360/scratch/EukDetect/tests" #full path to directory with raw fastq files
database_dir: "/storage/home/epb5360/scratch/EukDetect/eukdb" #full path to folder with all eukdetect_db files and taxa.sqlite files
database_prefix: "ncbi_eukprot_met_arch_markers.fna" #database prefix
samples: #list sample names here. fastqs must correspond to {samplename}{se_suffix} for SE reads or {samplename}{fwd_suffix} and {samplename}{rev_suffix} for PE
  test:
I see, okay. It sounds like this problem might not be coming from eukdetect itself. Are you running this as a snakemake pipeline and submitting that as a job to the cluster? When you run your job on the cluster do you have a step where you run "conda activate eukdetect" (or whatever you've named the environment)?
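One way to check this is to see where the shell that launches snakemake resolves the pipeline's tools. The "[samopen]" message in the log resembles output from older samtools releases, which would be consistent with a system-wide samtools being picked up instead of the samtools 1.10 pinned in the environment, though that is a guess. The sketch below assumes conda has set CONDA_PREFIX (it does so on "conda activate"); tool_in_env is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report whether a tool on PATH lives under a given
# environment prefix (for example the directory conda stores in CONDA_PREFIX).
tool_in_env() {
    tool_name=$1
    env_prefix=$2
    resolved=$(command -v "$tool_name") || {
        echo "$tool_name: not found on PATH"
        return 1
    }
    case "$resolved" in
        "$env_prefix"/*) echo "$tool_name: $resolved (inside env)" ;;
        *) echo "$tool_name: $resolved (outside env)"; return 1 ;;
    esac
}

# Tools used by the runaln rule, plus snakemake itself.
for t in bowtie2 samtools perl snakemake; do
    tool_in_env "$t" "${CONDA_PREFIX:-/nonexistent}" || true
done
```

Running this both in the interactive shell and inside the shell that snakemake starts (for example from a job script) should show whether the two see the same binaries.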
Sorry for the delayed reply. I revisited this after fixing the bowtie issue in other software, but I'm still having issues here. I'm running it locally right now in the eukdetect conda environment.
In other news, I tried it on my local machine and am running into issues with the conda environment:
$ conda env update --name eukdetect -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed
ResolvePackageNotFound:
- xorg-libxrender==0.9.10=h516909a_1002
- pillow==7.2.0=py36h8328e55_1
- libgomp==9.2.0=h24d8f2e_2
- xz==5.2.5=h516909a_1
- graphite2==1.3.13=he1b5a44_1001
- openssl==1.1.1g=h516909a_0
- lcms2==2.11=hbd6801e_0
- libstdcxx-ng==9.2.0=hdf63c60_2
- libtool==2.4.6=h14c3975_1002
- expat==2.2.9=he1b5a44_2
- pyqt==5.12.3=py36haa643ae_3
- libxslt==1.1.33=h31b3aaa_0
- pygraphviz==1.5=py36h8c4c3a4_1002
- pysam==0.16.0.1=py36h4c34d4e_1
- pynacl==1.3.0=py36h516909a_1001
- pango==1.40.14=he7ab937_1005
- htslib==1.10.2=hd3b49d5_1
- libpng==1.6.37=hed695b0_1
- pyrsistent==0.16.0=py36h8c4c3a4_0
- dbus==1.13.6=he372182_0
- reportlab==3.5.44=py36hcce1d1f_0
- protobuf==3.12.3=py36h831f99a_0
- lxml==4.5.2=py36h17c4326_0
- libedit==3.1.20191231=h46ee950_1
- libtiff==4.1.0=hc7e4089_6
- pandas==0.24.2=py36hf484d3e_0
- cairo==1.16.0=hcf35c78_1003
- xorg-libsm==1.2.3=h84519dc_1000
- xorg-libice==1.0.10=h516909a_0
- xorg-kbproto==1.0.7=h14c3975_1002
- gstreamer==1.14.5=h36ae1b5_2
- yarl==1.4.2=py36h516909a_0
- krb5==1.17.1=hfafb76e_1
- multidict==4.7.5=py36h8c4c3a4_1
- bzip2==1.0.8=h516909a_2
- libprotobuf==3.12.3=h8b12597_1
- xorg-libxau==1.0.9=h14c3975_0
- libuuid==2.32.1=h14c3975_1000
- bcrypt==3.1.7=py36h8c4c3a4_1
- xorg-xproto==7.0.31=h14c3975_1007
- libxcb==1.13=h14c3975_1002
- markupsafe==1.1.1=py36h8c4c3a4_1
- aiohttp==3.6.2=py36h516909a_0
- libffi==3.2.1=he1b5a44_1007
- _openmp_mutex==4.5=0_gnu
- libiconv==1.15=h516909a_1006
- libwebp-base==1.1.0=h516909a_3
- libgfortran-ng=7.5.0
- ncurses==6.1=hf484d3e_1002
- cffi==1.14.0=py36hd463f26_0
- msgpack-python==1.0.0=py36hdb11119_1
- gst-plugins-base==1.14.5=h0935bb2_2
- _libgcc_mutex==0.1=conda_forge
- harfbuzz==2.4.0=h9f30f68_3
- nss==3.47=he751ad9_0
- freetype==2.10.2=he06d7ca_0
- xorg-libxt==1.1.5=h516909a_1003
- ld_impl_linux-64==2.34=h53a641e_7
- xorg-libx11==1.6.9=h516909a_0
- pcre==8.44=he1b5a44_0
- wrapt==1.12.1=py36h8c4c3a4_1
- icu==64.2=he1b5a44_1
- fontconfig==2.13.1=h86ecdb6_1001
- readline==8.0=h46ee950_1
- libcurl==7.71.1=hcdd3856_1
- libxml2==2.9.10=hee79883_0
- xorg-libxdmcp==1.1.3=h516909a_0
- openblas==0.3.3=h9ac9557_1001
- bowtie2==2.4.1=py36h7f0b59b_2
- tk==8.6.10=hed695b0_0
- zlib==1.2.11=h516909a_1006
- qt==5.12.5=hd8c4c69_1
- glib==2.65.0=h6f030ca_0
- tbb==2020.1=hc9558a2_0
- libxkbcommon==0.10.0=he1b5a44_0
- libgcc==7.2.0=h69d50b8_2
- samtools==1.10=h9402c20_2
- brotlipy==0.7.0=py36h8c4c3a4_1000
- libclang==9.0.1=default_hde54327_0
- libssh2==1.9.0=hab1572f_3
- pixman==0.38.0=h516909a_1003
- jpeg==9d=h516909a_0
- graphviz==2.38.0=hf68f40c_1011
- scipy==1.2.1=py36_blas_openblash1522bff_0
- xorg-libxext==1.3.4=h516909a_0
- komplexity==0.3.6=musl
- libdeflate==1.6=h516909a_0
- xorg-libxpm==3.5.13=h516909a_0
- psutil==5.7.0=py36h8c4c3a4_1
- gettext==0.19.8.1=hc5be6a0_1002
- xorg-xextproto==7.3.0=h14c3975_1002
- nspr==4.26=he1b5a44_0
- pyyaml==5.3.1=py36h8c4c3a4_0
- cryptography==2.9.2=py36h45558ae_0
- libgcc-ng==9.2.0=h24d8f2e_2
- numpy==1.12.1=py36_blas_openblash1522bff_1001
- datrie==0.8.2=py36h8c4c3a4_0
- yaml==0.2.5=h516909a_0
- lz4-c==1.9.2=he1b5a44_1
- libllvm9==9.0.1=he513fc3_1
- sqlite==3.32.3=hcee41ef_1
- xorg-renderproto==0.11.1=h14c3975_1002
- zstd==1.4.4=h6597ccf_3
- bedtools==2.29.2=hc088bd4_0
- perl==5.26.2=h516909a_1006
- pthread-stubs==0.4=h14c3975_1001
- python==3.6.10=h8356626_1011_cpython
It's possible this comes from package versions differing between conda's operating-system-specific channels. Is your local machine running an operating system other than Linux? EukDetect has not been tested on OSX or Windows.
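Whether the local machine is Linux can be checked quickly. Several entries in the failed solve (for example ld_impl_linux-64) suggest the environment file pins linux-64 builds, and environment files with pinned build strings are generally platform-specific. A minimal sketch; is_linux is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the given kernel name is Linux.
# `uname -s` prints "Linux" on Linux and "Darwin" on macOS.
is_linux() {
    [ "$1" = "Linux" ]
}

kernel=$(uname -s)
if is_linux "$kernel"; then
    echo "Kernel is Linux; the pinned builds target this platform."
else
    echo "Kernel is $kernel; pinned linux-64 builds are unlikely to resolve here."
fi
```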
I've done more troubleshooting on this: I ran each command in the eukdetect.rules file by hand and they ran successfully, so the problem is somewhere in the snakemake process? Here is the snakemake log; it fails on the first step:
$ snakemake --snakefile rules/eukdetect.rules --configfile config.yml --cores 4 aln
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    1    aln
    1    runaln
    2

[Tue Sep 14 15:20:56 2021]
rule runaln:
    input: /gpfs/group/evk5387/default/emily/EukDetect/eukdb/ncbi_eukprot_met_arch_markers.fna, /gpfs/group/evk5387/default/emily/EukDetect/tests/test_R1.fastq.gz, /gpfs/group/evk5387/default/emily/EukDetect/tests/test_R2.fastq.gz
    output: /gpfs/group/evk5387/default/emily/EukDetect/retest/aln/test_aln_q30_lenfilter.sorted.bam
    jobid: 1
    wildcards: output_dir=/gpfs/group/evk5387/default/emily/EukDetect/retest, sample=test

Job counts:
    count    jobs
    1    runaln
    1
open: No such file or directory
[bam_sort_core] fail to open file /gpfs/group/evk5387/default/emily/EukDetect/retest/aln/test_aln_q30_lenfilter.sorted.bam
[samopen] SAM header is present: 521824 sequences.
[Tue Sep 14 15:21:02 2021]
Error in rule runaln:
    jobid: 0
    output: /gpfs/group/evk5387/default/emily/EukDetect/retest/aln/test_aln_q30_lenfilter.sorted.bam

RuleException:
CalledProcessError in line 78 of /gpfs/group/evk5387/default/emily/EukDetect/rules/eukdetect.rules:
Command ' set -euo pipefail; bowtie2 --quiet --omit-sec-seq --no-discordant --no-unal -x /gpfs/group/evk5387/default/emily/EukDetect/eukdb/ncbi_eukprot_met_arch_markers.fna -1 /gpfs/group/evk5387/default/emily/EukDetect/tests/test_R1.fastq.gz -2 /gpfs/group/evk5387/default/emily/EukDetect/tests/test_R2.fastq.gz | perl -lane '$l = 0; $F[5] =~ s/(\d+)[MX=DN]/$l+=$1/eg; print if $l > 101.0 or /^@/' | samtools view -q 30 -bS - | samtools sort -o /gpfs/group/evk5387/default/emily/EukDetect/retest/aln/test_aln_q30_lenfilter.sorted.bam - ' returned non-zero exit status 141.
  File "/gpfs/group/evk5387/default/emily/EukDetect/rules/eukdetect.rules", line 78, in __rule_runaln
  File "/storage/work/epb5360/miniconda3/envs/eukdetect/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/group/evk5387/default/emily/EukDetect/.snakemake/log/2021-09-14T152056.475674.snakemake.log
Could you post the most recent version of the config file you're using?
Here it is below (I'm using the test samples)
$ cat config.yml
#Default config file for eukdetect. Copy and edit for analysis

#Directory where EukDetect output should be written
output_dir: "/gpfs/group/evk5387/default/emily/EukDetect/retest"

#Indicate whether reads are paired (true) or single (false)
paired_end: true

#filename excluding sample name. no need to edit if paired_end = false
fwd_suffix: "_R1.fastq.gz"

#filename excluding sample name. no need to edit if paired_end = false
rev_suffix: "_R2.fastq.gz"

#file name excluding sample name. no need to edit if paired_end = true
se_suffix: ".fastq.gz"

#length of your reads. pre-trimming reads not recommended
readlen: 126

#full path to directory with raw fastq files
fq_dir: "/gpfs/group/evk5387/default/emily/EukDetect/tests"

#full path to folder with eukdetect database files
database_dir: "/gpfs/group/evk5387/default/emily/EukDetect/eukdb"

#name of database. Default is original genomes only database name
database_prefix: "ncbi_eukprot_met_arch_markers.fna"

#full path to eukdetect installation folder
eukdetect_dir: "/gpfs/group/evk5387/default/emily/EukDetect"

#list sample names here. fastqs must correspond to {samplename}{se_suffix} for SE reads or {samplename}{fwd_suffix} and {samplename}{rev_suffix} for PE
#each sample name should be preceded by 2 spaces and followed by a colon character
samples:
  test:
Good news, I've figured out a workaround and have a better idea of what's happening on the cluster... snakemake is not inheriting the conda environment in the shell I'm executing it from. I was able to get everything running by adding "conda activate eukdetect" to my bashrc file.
Hopefully there is a snakemake command to add to the config yaml to fix this?
Great! Thanks for the update. This is not likely something that will be added to the workflow. Do you have the option of submitting multi-line commands or multi-line job scripts to the cluster, where you can add the line "conda activate eukdetect" before snakemake is called? This is how I work with conda environments on a cluster.
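The job-script approach described above might look roughly like the sketch below. It is not taken from the EukDetect documentation: run_eukdetect is a hypothetical wrapper name, the environment name and snakemake arguments are copied from this thread, and scheduler directives are left out because the cluster's scheduler is not named here. It assumes a conda installation that ships etc/profile.d/conda.sh, which is what makes "conda activate" usable in a non-interactive shell.

```shell
#!/bin/bash
# Hypothetical wrapper: activate a conda environment, then run snakemake
# with the remaining arguments, all inside the same shell.
run_eukdetect() {
    env_name=$1
    shift
    # Load conda's shell functions; non-interactive job shells usually
    # do not read the parts of ~/.bashrc that set this up.
    . "$(conda info --base)/etc/profile.d/conda.sh" || return 1
    conda activate "$env_name" || return 1
    snakemake "$@"
}

# In the job script itself (paths and target as used earlier in this thread):
# run_eukdetect eukdetect --snakefile rules/eukdetect.rules \
#     --configfile config.yml --cores 4 runall
```

Keeping the activation in the job script rather than in the bashrc means other jobs and login shells are not affected by it.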