error at test after installation of required dependencies

Open BCArg opened this issue 4 years ago • 4 comments

I have installed katuali as per the instructions and I am now trying to get the pipeline running.

I have installed the required dependencies (except for guppy, which was installed via the .deb package) as follows:

  1. created a conda environment
  2. installed the dependencies pomoxis, canu, flye and medaka with conda
  3. copied the content from the conda installation to the directories suggested by katuali (see below)
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ ll ~/git/pomoxis/venv/bin/activate
total 12
drwxr-xr-x 3 bhinckel 17930 4096 Jul 16 15:09 ../
drwxr-xr-x 5 bhinckel 17930 4096 Jul 16 15:15 pomoxis-0.3.4-py_0/
drwxr-xr-x 3 bhinckel 17930 4096 Jul 16 15:15 ./
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ ll /usr/bin/guppy_basecaller
lrwxrwxrwx 1 root root 35 Jul 16 13:49 /usr/bin/guppy_basecaller -> /opt/ont/guppy/bin/guppy_basecaller*
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ ll ~/git/canu-1.8/Linux-amd64/bin/canu
lrwxrwxrwx 1 bhinckel 17930 47 Jul 16 17:17 /home/bhinckel/git/canu-1.8/Linux-amd64/bin/canu -> /home/bhinckel/miniconda3/envs/pomoxis/bin/canu*
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ ll ~/git/Flye/bin/flye
lrwxrwxrwx 1 bhinckel 17930 47 Jul 16 17:21 /home/bhinckel/git/Flye/bin/flye -> /home/bhinckel/miniconda3/envs/pomoxis/bin/flye*
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ ll ~/git/medaka/venv/bin/activate
total 12
drwxr-xr-x 3 bhinckel 17930 4096 Jul 16 17:32 ../
drwxr-xr-x 3 bhinckel 17930 4096 Jul 16 17:33 ./
drwxr-xr-x 5 bhinckel 17930 4096 Jul 16 17:33 medaka-0.11.5-py36h148d290_0/
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ 
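In the listings above, `ll ~/git/pomoxis/venv/bin/activate` and `ll ~/git/medaka/venv/bin/activate` print `total 12` followed by directory entries, which suggests that each `activate` is a directory rather than a script. Below is a minimal sketch of how that could be checked before running the tests. `describe_path` is a hypothetical helper (not part of katuali), and the demo uses a throwaway directory rather than the real paths:

```shell
# Hypothetical helper, not part of katuali: report what a configured path really is.
describe_path() {
    local path="$1"
    if [ -f "$path" ]; then
        echo "file"          # something `source` can read
    elif [ -d "$path" ]; then
        echo "directory"     # `source` will fail on this
    else
        echo "missing"
    fi
}

# Demo in a throwaway directory (assumed layout, for illustration only).
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/venv_a/bin/activate"          # a directory named "activate"
mkdir -p "$demo_dir/venv_b/bin"
printf '# stub activate script\n' > "$demo_dir/venv_b/bin/activate"

kind_a="$(describe_path "$demo_dir/venv_a/bin/activate")"
kind_b="$(describe_path "$demo_dir/venv_b/bin/activate")"
echo "venv_a/bin/activate: $kind_a"
echo "venv_b/bin/activate: $kind_b"

rm -r "$demo_dir"
```

Running the same check against `~/git/pomoxis/venv/bin/activate` and `~/git/medaka/venv/bin/activate` would show whether they are usable as scripts.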

When I run make test (from ~/downloads/katuali, inside the katuali environment), snakemake is launched and I can see that guppy runs, but after some time I get an error (printed to the screen):

/bin/bash: line 24: source: /home/bhinckel/git/pomoxis/venv/bin/activate: is a directory
[Fri Jul 17 11:26:19 2020]
Error in rule basecall_guppy:
    jobid: 0
    output: MinIonRun1/guppy/basecalls.fasta, MinIonRun1/guppy/sequencing_summary.txt
    log: MinIonRun1/guppy.log (check log file(s) for error message)
    shell:
        
        check_files_exist --quiet /usr/bin/guppy_basecaller MinIonRun1/reads /home/bhinckel/git/pomoxis/venv/bin/activate &> MinIonRun1/guppy.log
        # snakemake will create the output dir, guppy will fail if it exists..
        rm -r MinIonRun1/guppy

        echo "GPU status before" >> MinIonRun1/guppy.log
        gpustat >> MinIonRun1/guppy.log

        sleep $(((RANDOM % 30)  + 1 ))

        GPU=$(pick_gpu 2>> MinIonRun1/guppy.log)

        echo "Runnning on host $HOSTNAME GPU $GPU" >> MinIonRun1/guppy.log

        /usr/bin/guppy_basecaller -s MinIonRun1/guppy -r -i MinIonRun1/reads -c dna_r9.4.1_450bps_hac.cfg &>> MinIonRun1/guppy.log

        echo "gpustat after" >> MinIonRun1/guppy.log
        gpustat >> MinIonRun1/guppy.log

        # convert fastq to fasta
        sleep 5
        echo "Combining the following fastq files into MinIonRun1/guppy/basecalls.fasta" >> MinIonRun1/guppy.log
        ls MinIonRun1/guppy/*.fastq >> MinIonRun1/guppy.log
        set +u; source /home/bhinckel/git/pomoxis/venv/bin/activate; set -u;
        seqkit fq2fa MinIonRun1/guppy/*.fastq > MinIonRun1/guppy/basecalls.fasta
        rm MinIonRun1/guppy/*.fastq
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job basecall_guppy since they might be corrupted:
MinIonRun1/guppy/sequencing_summary.txt
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /home/bhinckel/downloads/katuali/test/.snakemake/log/2020-07-17T111026.492659.snakemake.log
Makefile:56: recipe for target 'test_basecall' failed
make: *** [test_basecall] Error 1
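The first line of the output above (`source: /home/bhinckel/git/pomoxis/venv/bin/activate: is a directory`) points at the `source` command rather than at seqkit. Below is a minimal sketch that reproduces this behaviour under bash strict mode. It assumes only standard bash; the directory layout and marker file names are made up for the demo:

```shell
# Throwaway layout with a directory where an activate script is expected.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/venv/bin/activate"

# Run the same pattern as the rule's shell block in a strict-mode child shell.
status=0
bash -c '
    set -euo pipefail
    touch "$1/step_before_source"
    set +u; source "$1/venv/bin/activate"; set -u
    touch "$1/step_after_source"      # only reached if source succeeded
' _ "$demo_dir" 2> "$demo_dir/stderr.txt" || status=$?

echo "exit status: $status"
cat "$demo_dir/stderr.txt"

if [ -e "$demo_dir/step_after_source" ]; then
    reached_after="yes"
else
    reached_after="no"
fi
echo "reached the step after source: $reached_after"

rm -r "$demo_dir"
```

Because `set -e` is active, the child shell stops at the failing `source`, so nothing after it runs. In the rule above, that would mean the `seqkit fq2fa` line is never reached.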

What is strange is that the basecalling appears to have finished successfully. The lines below are the last lines of the file /home/bhinckel/downloads/katuali/test/MinIonRun1/guppy/guppy_basecaller_log-2020-07-17_11-10-37.log:

2020-07-17 11:26:12.948457 [guppy/message] Caller time: 929580 ms, Samples called: 14237088, samples/s: 15315.6
2020-07-17 11:26:12.948602 [guppy/message] Finishing up any open output files.
2020-07-17 11:26:14.467721 [guppy/message] Basecalling completed successfully.

In fact the fastq was generated (/home/bhinckel/downloads/katuali/test/MinIonRun1/guppy/fastq_runid_f1d7aa40eb01e7882b06a486c721890952f0f34a_0_0.fastq), so I guess the error occurred at the generation of the .fasta from the .fastq.

From the error message it appears that this should be accomplished by the command seqkit fq2fa MinIonRun1/guppy/*.fastq > MinIonRun1/guppy/basecalls.fasta. In the master config file (/home/bhinckel/downloads/katuali/build/lib/katuali/data/config.yaml) I do not see where seqkit should be installed, and in fact I noticed that it is not installed on my system.

Is this error indeed caused by the absence of seqkit? If so, where should I install it? If not, what could be the cause of the problem?

Also, is there any plan to launch a conda package for katuali?

BCArg, Jul 17 '20 09:07

Hi @BCArg,

seqkit should be installed as part of pomoxis. If you were able to start the basecall_guppy rule, you must have a functional pomoxis venv, as the venv/bin/activate script is a required input to the basecall_guppy rule. Check whether seqkit installed OK; otherwise it can be obtained here, and you should copy the executable into your pomoxis virtual env bin directory.

There are currently no plans to launch a conda package for katuali. We may at some point look at using Snakemake's conda integration, which would allow snakemake to automatically create rule-specific conda environments using environment files that could be distributed with katuali.

We'd happily accept a pull request if you'd like to look at this.

mwykes, Jul 17 '20 10:07

Thanks for the quick reply.

I did manually install seqkit and copied the executable to /usr/local/bin, according to the instructions.

Also, following your suggestion, I copied seqkit to /home/bhinckel/git/pomoxis/venv/bin/, as shown below:

(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ ll /home/bhinckel/git/pomoxis/venv/bin/
total 15236
drwxr-xr-x 3 bhinckel 17930     4096 Jul 16 15:09 ../
drwxr-xr-x 3 bhinckel 17930     4096 Jul 16 15:15 activate/
drwxr-xr-x 3 bhinckel 17930     4096 Jul 17 12:49 ./
-rwxr-xr-x 1 bhinckel 17930 15589376 Jul 17 12:49 seqkit*
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ which seqkit
/usr/local/bin/seqkit
(katuali) bhinckel@Brightcore-testsrv:~/downloads/katuali$ 

However, I am still getting the same error. I am now sure that the error lies at the seqkit step, as all the previous commands ran well, as shown in the guppy log file (below). I have also restarted the shell, but it did not help.

GPU status before
Brightcore-testsrv   Fri Jul 17 12:50:17 2020  390.138
[0] Tesla C2050      | 57'C,   0 % |   373 /  2615 MB | gdm(65M) gdm(65M) colsen(16M) colsen(118M)
[12:50:35 - pick_gpu] SGE_HGR_gpu was not set, setting GPU to 0 based on memory and utilization
Runnning on host Brightcore-testsrv GPU 0
ONT Guppy basecalling software version 4.0.11+f1071ceb, client-server API version 2.1.0
config file:        /opt/ont/guppy/data/dna_r9.4.1_450bps_hac.cfg
model file:         /opt/ont/guppy/data/template_r9.4.1_450bps_hac.jsn
input path:         MinIonRun1/reads
save path:          MinIonRun1/guppy
chunk size:         2000
chunks per runner:  512
records per file:   4000
num basecallers:    1
cpu mode:           ON
threads per caller: 4

Found 162 fast5 files to process.
Init time: 4591 ms

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Caller time: 1003176 ms, Samples called: 14237088, samples/s: 14192
Finishing up any open output files.
Basecalling completed successfully.
gpustat after
Brightcore-testsrv   Fri Jul 17 13:07:25 2020  390.138
[0] Tesla C2050      | 58'C,   0 % |   372 /  2615 MB | gdm(65M) gdm(65M) colsen(16M) colsen(118M)
Combining the following fastq files into MinIonRun1/guppy/basecalls.fasta
MinIonRun1/guppy/fastq_runid_f1d7aa40eb01e7882b06a486c721890952f0f34a_0_0.fastq

BCArg, Jul 17 '20 11:07

I just called seqkit fq2fa MinIonRun1/guppy/*.fastq from the command line and it works.

I have also moved the directories from ~/git/pomoxis/venv/bin/activate/pomoxis-0.3.4-py_0 directly into ~/git/pomoxis/venv/bin/activate/, i.e.:

(katuali) bhinckel@Brightcore-testsrv:~$ ll /home/bhinckel/git/pomoxis/venv/bin/activate/
total 20
drwxr-xr-x 3 bhinckel 17930 4096 Jul 17 12:49 ../
drwxr-xr-x 2 bhinckel 17930 4096 Jul 17 14:56 python-scripts/
drwxr-xr-x 5 bhinckel 17930 4096 Jul 17 14:56 info/
drwxr-xr-x 4 bhinckel 17930 4096 Jul 17 14:56 site-packages/
drwxr-xr-x 5 bhinckel 17930 4096 Jul 17 14:57 ./

However, this had no effect on the error.
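One possible workaround, not confirmed anywhere in this thread, would be to make `venv/bin/activate` a regular file that puts the conda environment's bin directory on PATH, since the rule only sources that path. The sketch below assumes the tools live in a single bin directory (as the symlinks into /home/bhinckel/miniconda3/envs/pomoxis/bin suggest). `demo_tool` is a hypothetical stand-in for an executable such as seqkit, and all paths are throwaway:

```shell
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/env/bin" "$demo_dir/venv/bin"

# Stand-in executable (hypothetical), playing the role of a tool in the conda env.
printf '#!/bin/sh\necho "demo_tool ran"\n' > "$demo_dir/env/bin/demo_tool"
chmod +x "$demo_dir/env/bin/demo_tool"

# A regular file named "activate" that prepends the env's bin directory to PATH.
cat > "$demo_dir/venv/bin/activate" <<EOF
export PATH="$demo_dir/env/bin:\$PATH"
EOF

# Source it the same way the rule does, under strict mode, then resolve the tool.
expected="$demo_dir/env/bin/demo_tool"
resolved="$(bash -c '
    set -euo pipefail
    set +u; source "$1"; set -u
    command -v demo_tool
' _ "$demo_dir/venv/bin/activate")"
echo "resolved: $resolved"

rm -r "$demo_dir"
```

With the real setup, the stub would point at the conda environment's bin directory instead of the demo one. Whether the rest of the pipeline works with such a stub is untested here.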

Please also note that the GitHub README and the documentation are inconsistent. The former says that one needs scrappie, pomoxis, medaka, and nanopolish, whereas the latter says that one needs suppy (probably a typo for guppy), pomoxis, canu, flye and medaka to run the tests successfully.

BCArg, Jul 17 '20 13:07

Just wondering if any progress was made to figure out this error, as I am having the exact same issues and I am not really sure what can be done at this point.

kazar4, May 16 '21 17:05