katuali
error at test after installation of required dependencies
I have installed katuali as per the instructions and I am now trying to get the pipeline running.
I have installed the required dependencies (except for guppy, which was installed via the .deb package) as follows:
- created a conda environment
- installed the dependencies pomoxis, canu, flye and medaka with conda
- copied the content from the conda installation to the directories suggested by katuali (see below)
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ ll ~/git/pomoxis/venv/bin/activate
total 12
drwxr-xr-x 3 bhinckel 17930 4096 Jul 16 15:09 ../
drwxr-xr-x 5 bhinckel 17930 4096 Jul 16 15:15 pomoxis-0.3.4-py_0/
drwxr-xr-x 3 bhinckel 17930 4096 Jul 16 15:15 ./
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ ll /usr/bin/guppy_basecaller
lrwxrwxrwx 1 root root 35 Jul 16 13:49 /usr/bin/guppy_basecaller -> /opt/ont/guppy/bin/guppy_basecaller*
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ ll ~/git/canu-1.8/Linux-amd64/bin/canu
lrwxrwxrwx 1 bhinckel 17930 47 Jul 16 17:17 /home/bhinckel/git/canu-1.8/Linux-amd64/bin/canu -> /home/bhinckel/miniconda3/envs/pomoxis/bin/canu*
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ ll ~/git/Flye/bin/flye
lrwxrwxrwx 1 bhinckel 17930 47 Jul 16 17:21 /home/bhinckel/git/Flye/bin/flye -> /home/bhinckel/miniconda3/envs/pomoxis/bin/flye*
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ ll ~/git/medaka/venv/bin/activate
total 12
drwxr-xr-x 3 bhinckel 17930 4096 Jul 16 17:32 ../
drwxr-xr-x 3 bhinckel 17930 4096 Jul 16 17:33 ./
drwxr-xr-x 5 bhinckel 17930 4096 Jul 16 17:33 medaka-0.11.5-py36h148d290_0/
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$
When I run make test, snakemake launches and I can see that guppy is running, but after some time I get the following error (printed to the screen):
/bin/bash: line 24: source: /home/bhinckel/git/pomoxis/venv/bin/activate: is a directory
[Fri Jul 17 11:26:19 2020]
Error in rule basecall_guppy:
jobid: 0
output: MinIonRun1/guppy/basecalls.fasta, MinIonRun1/guppy/sequencing_summary.txt
log: MinIonRun1/guppy.log (check log file(s) for error message)
shell:
check_files_exist --quiet /usr/bin/guppy_basecaller MinIonRun1/reads /home/bhinckel/git/pomoxis/venv/bin/activate &> MinIonRun1/guppy.log
# snakemake will create the output dir, guppy will fail if it exists..
rm -r MinIonRun1/guppy
echo "GPU status before" >> MinIonRun1/guppy.log
gpustat >> MinIonRun1/guppy.log
sleep $(((RANDOM % 30) + 1 ))
GPU=$(pick_gpu 2>> MinIonRun1/guppy.log)
echo "Runnning on host $HOSTNAME GPU $GPU" >> MinIonRun1/guppy.log
/usr/bin/guppy_basecaller -s MinIonRun1/guppy -r -i MinIonRun1/reads -c dna_r9.4.1_450bps_hac.cfg &>> MinIonRun1/guppy.log
echo "gpustat after" >> MinIonRun1/guppy.log
gpustat >> MinIonRun1/guppy.log
# convert fastq to fasta
sleep 5
echo "Combining the following fastq files into MinIonRun1/guppy/basecalls.fasta" >> MinIonRun1/guppy.log
ls MinIonRun1/guppy/*.fastq >> MinIonRun1/guppy.log
set +u; source /home/bhinckel/git/pomoxis/venv/bin/activate; set -u;
seqkit fq2fa MinIonRun1/guppy/*.fastq > MinIonRun1/guppy/basecalls.fasta
rm MinIonRun1/guppy/*.fastq
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job basecall_guppy since they might be corrupted:
MinIonRun1/guppy/sequencing_summary.txt
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /home/bhinckel/downloads/katuali/test/.snakemake/log/2020-07-17T111026.492659.snakemake.log
Makefile:56: recipe for target 'test_basecall' failed
make: *** [test_basecall] Error 1
What is strange is that the basecalling appears to have finished successfully (the messages below are the last lines of the file /home/bhinckel/downloads/katuali/test/MinIonRun1/guppy/guppy_basecaller_log-2020-07-17_11-10-37.log):
2020-07-17 11:26:12.948457 [guppy/message] Caller time: 929580 ms, Samples called: 14237088, samples/s: 15315.6
2020-07-17 11:26:12.948602 [guppy/message] Finishing up any open output files.
2020-07-17 11:26:14.467721 [guppy/message] Basecalling completed successfully.
In fact the fastq was generated (/home/bhinckel/downloads/katuali/test/MinIonRun1/guppy/fastq_runid_f1d7aa40eb01e7882b06a486c721890952f0f34a_0_0.fastq), so I guess the error occurred when generating the .fasta from the .fastq.
From the error message it appears that this should be accomplished by the command seqkit fq2fa MinIonRun1/guppy/*.fastq > MinIonRun1/guppy/basecalls.fasta. In the master config file (/home/bhinckel/downloads/katuali/build/lib/katuali/data/config.yaml) I do not see where seqkit should be installed, and in fact I noticed that it is not installed on my system.
Is this error indeed caused by the absence of seqkit? If so, where should I install it? If not, what could be the cause of the problem?
Also, is there any plan to launch a conda package for katuali?
Hi @BCArg,
seqkit should be installed as part of pomoxis. If you were able to start the basecall_guppy rule, you must have a functional pomoxis venv, as the venv/bin/activate script is a required input to the basecall_guppy rule. Check whether seqkit installed correctly; otherwise it can be obtained here, and you should copy the executable into the bin directory of your pomoxis virtual env.
There are currently no plans to launch a conda package for katuali. We may at some point look at using Snakemake's conda integration, which would allow snakemake to automatically create rule-specific conda environments using environment files that could be distributed with katuali.
We'd happily accept a pull request if you'd like to look at this.
Thanks for the quick reply.
I manually installed seqkit and copied the executable to /usr/local/bin, according to the instructions.
Also, following your suggestion, I copied seqkit to /home/bhinckel/git/pomoxis/venv/bin/, as shown below:
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ ll /home/bhinckel/git/pomoxis/venv/bin/
total 15236
drwxr-xr-x 3 bhinckel 17930 4096 Jul 16 15:09 ../
drwxr-xr-x 3 bhinckel 17930 4096 Jul 16 15:15 activate/
drwxr-xr-x 3 bhinckel 17930 4096 Jul 17 12:49 ./
-rwxr-xr-x 1 bhinckel 17930 15589376 Jul 17 12:49 seqkit*
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ which seqkit
/usr/local/bin/seqkit
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$
However, I am still getting the same error. I am now sure that the error lies at the seqkit step, as all the previous commands ran well, as shown in the guppy log file (below). I have also restarted the shell, but it did not help.
GPU status before
Brightcore-testsrv Fri Jul 17 12:50:17 2020 390.138
[0] Tesla C2050 | 57'C, 0 % | 373 / 2615 MB | gdm(65M) gdm(65M) colsen(16M) colsen(118M)
[12:50:35 - pick_gpu] SGE_HGR_gpu was not set, setting GPU to 0 based on memory and utilization
Runnning on host Brightcore-testsrv GPU 0
ONT Guppy basecalling software version 4.0.11+f1071ceb, client-server API version 2.1.0
config file: /opt/ont/guppy/data/dna_r9.4.1_450bps_hac.cfg
model file: /opt/ont/guppy/data/template_r9.4.1_450bps_hac.jsn
input path: MinIonRun1/reads
save path: MinIonRun1/guppy
chunk size: 2000
chunks per runner: 512
records per file: 4000
num basecallers: 1
cpu mode: ON
threads per caller: 4
Found 162 fast5 files to process.
Init time: 4591 ms
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 1003176 ms, Samples called: 14237088, samples/s: 14192
Finishing up any open output files.
Basecalling completed successfully.
gpustat after
Brightcore-testsrv Fri Jul 17 13:07:25 2020 390.138
[0] Tesla C2050 | 58'C, 0 % | 372 / 2615 MB | gdm(65M) gdm(65M) colsen(16M) colsen(118M)
Combining the following fastq files into MinIonRun1/guppy/basecalls.fasta
MinIonRun1/guppy/fastq_runid_f1d7aa40eb01e7882b06a486c721890952f0f34a_0_0.fastq
I just called seqkit fq2fa MinIonRun1/guppy/*.fastq from the command line and it works.
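Since seqkit runs fine on its own, it may be worth looking at the command just before it in the rule: set +u; source /home/bhinckel/git/pomoxis/venv/bin/activate; set -u;. The first line of the error output says that this path "is a directory". The sketch below is only an illustrative reproduction, not katuali's code: it builds a throwaway directory named activate (a hypothetical stand-in for the real path) and runs the same source pattern under bash strict mode (set -euo pipefail, as snakemake uses) to show whether the step after source is ever reached.

```shell
#!/usr/bin/env bash
# Illustrative reproduction only: the temporary path below is a hypothetical
# stand-in for ~/git/pomoxis/venv/bin/activate when that path is a directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/venv/bin/activate"

# Run the same 'set +u; source ...; set -u;' pattern in a strict-mode child shell.
# The echo stands in for the seqkit command that follows in the rule.
result=$(bash -c 'set -euo pipefail; set +u; source "$1"; set -u; echo "reached seqkit step"' \
    _ "$tmp/venv/bin/activate" 2>&1) && status=0 || status=$?

echo "exit status: $status"
echo "output: $result"

rm -r "$tmp"
```

If the exit status is non-zero and "reached seqkit step" is never printed, the job is failing at source rather than at seqkit, which would explain why installing seqkit did not change the error.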
I have also moved the directories from ~/git/pomoxis/venv/bin/activate/pomoxis-0.3.4-py_0 directly under ~/git/pomoxis/venv/bin/activate/, i.e.
(katuali) bhinckel@Brightcore-testsrv:~$ ll /home/bhinckel/git/pomoxis/venv/bin/activate/
total 20
drwxr-xr-x 3 bhinckel 17930 4096 Jul 17 12:49 ../
drwxr-xr-x 2 bhinckel 17930 4096 Jul 17 14:56 python-scripts/
drwxr-xr-x 5 bhinckel 17930 4096 Jul 17 14:56 info/
drwxr-xr-x 4 bhinckel 17930 4096 Jul 17 14:56 site-packages/
drwxr-xr-x 5 bhinckel 17930 4096 Jul 17 14:57 ./
Though this had no effect on the error.
Please also note that the GitHub page and the documentation are inconsistent. The former says that one needs scrappie, pomoxis, medaka, and nanopolish, whereas the latter says that one needs suppy (probably a typo for guppy), pomoxis, canu, flye and medaka to run the tests successfully.
Just wondering if any progress was made to figure out this error, as I am having the exact same issues and I am not really sure what can be done at this point.
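For anyone hitting the same message, one thing that can be checked quickly is what kind of filesystem entry the configured activate path actually is. The listings earlier in the thread show activate/ as a directory, while source needs a regular file. The following is a minimal sketch, not part of katuali: describe_activate is a hypothetical helper name, and the default path is simply the one from the error message, so adjust it to whatever your config points at.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a path is a regular file, a directory,
# or missing. 'source' can only read the first of these.
describe_activate() {
    local path="$1"
    if [ -f "$path" ]; then
        echo "file"
    elif [ -d "$path" ]; then
        echo "directory"
    else
        echo "missing"
    fi
}

# Default taken from the error message in this thread; override via ACTIVATE.
ACTIVATE="${ACTIVATE:-$HOME/git/pomoxis/venv/bin/activate}"
echo "$ACTIVATE: $(describe_activate "$ACTIVATE")"
```

If this prints "directory" or "missing", the source step in the basecall_guppy rule cannot succeed, regardless of whether seqkit is on the PATH. The same check can be repeated for the medaka activate path.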