Issues with C3POa.py: cannot execute gonk

Open ckim11 opened this issue 5 years ago • 15 comments

I'm having issues with the C3POa.py script at the gonk stage. Preprocessing worked well, but when I run C3POa.py on one of my fastq files, I get this error from bash:

Traceback (most recent call last):
  File "/Users/ckim/C3POa/C3POa.py", line 663, in <module>
    main()
  File "/Users/ckim/C3POa/C3POa.py", line 656, in main
    analyze_reads(read_list)
  File "/Users/ckim/C3POa/C3POa.py", line 624, in analyze_reads
    scoreList = split_SW(name, seed, seq)
  File "/Users/ckim/C3POa/C3POa.py", line 419, in split_SW
    scoreList = runGonk(seq1, seq)
  File "/Users/ckim/C3POa/C3POa.py", line 388, in runGonk
    scoreList = parse_file(scores)
  File "/Users/ckim/C3POa/C3POa.py", line 373, in parse_file
    for line in open(scores):
FileNotFoundError: [Errno 2] No such file or directory: '/Users/ckim/20200213_0143_20200212_ASD_mCh_R2c2test///SW_PARSE.txt'

When I check the gonk_messages file, this is the error reported:

sh: /Users/ckim/C3POa/gonk/gonk: cannot execute binary file

I made sure to install the Go dependency for gonk. I followed the setup instructions at the beginning, but haven't been able to get any further with the script. Any help is appreciated, and I'm happy to provide more information as needed. Thanks!

ckim11 avatar Feb 14 '20 04:02 ckim11

If you try to execute the gonk binary outside of the script, do you get a message telling you to enter sequences?

rvolden avatar Feb 14 '20 04:02 rvolden

I do not. I still get the "cannot execute binary file" error.

ckim11 avatar Feb 14 '20 04:02 ckim11

Try deleting the binary and building manually using go build src/gonk from the base directory

rvolden avatar Feb 14 '20 04:02 rvolden
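A common cause of "cannot execute binary file" is a binary built for a different operating system or architecture than the one running it. The sketch below is one way to check that before rebuilding: it reads the first four bytes of a file and compares them with well-known executable magic numbers. The function names `binary_format` and `describe_binary` are hypothetical helpers for illustration and are not part of C3POa or gonk.

```python
import platform

# Magic numbers found at the start of common executable formats.
_MAGIC = {
    b"\x7fELF": "ELF",
    b"\xcf\xfa\xed\xfe": "Mach-O 64-bit",
    b"\xce\xfa\xed\xfe": "Mach-O 32-bit",
    b"\xca\xfe\xba\xbe": "Mach-O universal",
    b"MZ": "PE",
}


def binary_format(header: bytes) -> str:
    """Guess the executable format from the leading bytes of a file."""
    for magic, name in _MAGIC.items():
        if header.startswith(magic):
            return name
    return "unknown"


def describe_binary(path: str) -> str:
    """Report the file's format next to the current machine's platform."""
    with open(path, "rb") as handle:
        header = handle.read(4)
    return "{0}: {1} (this machine: {2} {3})".format(
        path, binary_format(header), platform.system(), platform.machine()
    )
```

Calling `describe_binary` on the gonk binary from a Mac and seeing `ELF` rather than a Mach-O format would suggest the file was built on Linux and needs to be rebuilt locally.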

Getting somewhere! That seemed to fix the gonk issue at least. Now I'm getting a different message in gonk_messages, but the same bash error as above.

panic: open /Users/ckim/20200213_0143_20200212_ASD_mCh_R2c2test/consensus/: is a directory

goroutine 1 [running]:
main.check(...)
	/Users/ckim/C3POa/gonk/src/gonk.go:32
main.writeScores(0xc00032ef00, 0x404, 0x404)
	/Users/ckim/C3POa/gonk/src/gonk.go:147 +0x27a
main.main()
	/Users/ckim/C3POa/gonk/src/gonk.go:180 +0x249

The panic makes me think that I'm not assigning the -p flag properly. Not sure if that's the reason for the errors here or something else with any of the inputs going in.

ckim11 avatar Feb 14 '20 06:02 ckim11

This makes me think that it isn't adding on the filename for the output file correctly. gonk should have been given the path from your C3POa command line arguments (--path or -p). Can I see the C3POa command/bash script that you're using to run this?

rvolden avatar Feb 14 '20 06:02 rvolden

In any case, I've updated the gonk source code so that if it's given a directory with no filename (which seems to be what is happening here), it automatically adds the default output filename ("SW_PARSE.txt"). Please run make clean in your gonk directory and git pull, then rebuild using make. The original binary didn't work for you because I mistakenly added the binary compiled on my computer to the repository. Rebuilding should hopefully fix the issue.

rvolden avatar Feb 14 '20 20:02 rvolden
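The fallback described above (append a default filename when only a directory is given) can be sketched in Python as follows. This illustrates the idea only; it is not gonk's actual Go code, and the name `resolve_output` is hypothetical. The default name "SW_PARSE.txt" comes from the thread.

```python
import os

DEFAULT_NAME = "SW_PARSE.txt"  # default output filename mentioned in the thread


def resolve_output(path: str, default_name: str = DEFAULT_NAME) -> str:
    """Return a file path, appending default_name when path is a directory."""
    # Treat a trailing separator or an existing directory as "no filename given".
    if path.endswith(("/", os.sep)) or os.path.isdir(path):
        path = os.path.join(path, default_name)
    # Collapse repeated separators such as "///" for readability.
    return os.path.normpath(path)
```

Note that the repeated slashes in the reported paths are harmless on POSIX systems. The FileNotFoundError arose because gonk exited before writing the file, not because of the `///`.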

Hi,

I'm having the same issue. I've tried your step of updating gonk.

Attempt with no -o set

python3 C3POa.py -r /Users/tim/C3POa_preprocessing/1/Splint/R2C2_raw_reads.fastq -p ~/ -m NUC.4.4.mat -l 1000 -d 500 -c example_config -t -f
/Users/tim/C3POa_preprocessing/1/Splint/R2C2_raw_reads.fastq
gonk took 0.05568385124206543 seconds to run.
Traceback (most recent call last):
  File "C3POa.py", line 663, in <module>
    main()
  File "C3POa.py", line 656, in main
    analyze_reads(read_list)
  File "C3POa.py", line 624, in analyze_reads
    scoreList = split_SW(name, seed, seq)
  File "C3POa.py", line 419, in split_SW
    scoreList = runGonk(seq1, seq)
  File "C3POa.py", line 388, in runGonk
    scoreList = parse_file(scores)
  File "C3POa.py", line 373, in parse_file
    for line in open(scores):
FileNotFoundError: [Errno 2] No such file or directory: '/Users/tim///SW_PARSE.txt'

gonk_messages:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x10ae785]

goroutine 1 [running]:
main.writeScores(0xc00105a000, 0x7ac, 0x7ac)
	/Users/tim/gonk/src/gonk.go:147 +0x55
main.main()
	/Users/tim/gonk/src/gonk.go:184 +0x249

tiwz avatar Feb 18 '20 16:02 tiwz

I've pushed an update to gonk that fixes the error you've been getting. You should now be able to give it either a directory or a filename. As for how you're running C3POa.py, I would put the command into a wrapper bash script (I do this for consistency and parallelization). The example below uses GNU Parallel.

#!/bin/bash

config=$HOME'/config'
c3poa=$HOME'/C3POa/C3POa.py'
matrix=$HOME'/C3POa/NUC.4.4.mat'

# goes to preprocessing directory
path=/path/to/preprocessed/folders/

# what to name the consensus files
cons_out=example_consensus.fasta

# error file
err=$path/err

# number of threads
jobs=32

parallel -j$jobs python3 $c3poa --reads {0}/R2C2_raw_reads.fastq \
                                --path {0} \
                                --matrix $matrix \
                                --config $config \
                                --output {0}/$cons_out \
                                2>$err ::: $path/*/*

How many asterisks you need at the end depends on whether you demultiplexed with multiple splints. If each numbered folder contains another folder called Splint or the like, you need both asterisks. If R2C2_raw_reads.fastq sits directly in the numbered folders, change $path/*/* to $path/*. You may need to replace $path with the actual path.

rvolden avatar Feb 18 '20 19:02 rvolden
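For anyone unsure which glob depth applies to their layout, here is a minimal Python sketch that looks for R2C2_raw_reads.fastq one or two levels below the preprocessing folder and builds the matching command for each hit. The long option names are the ones used in the wrapper script above. The helper names `find_read_dirs` and `build_command` are hypothetical, and nothing is executed; the commands are only assembled.

```python
from pathlib import Path
from typing import List

READS_NAME = "R2C2_raw_reads.fastq"  # reads file name used in the thread


def find_read_dirs(root: str) -> List[Path]:
    """Return directories one or two levels below root that hold READS_NAME."""
    base = Path(root)
    hits = list(base.glob("*/" + READS_NAME)) + list(base.glob("*/*/" + READS_NAME))
    return sorted(hit.parent for hit in hits)


def build_command(read_dir: Path, c3poa: str, matrix: str,
                  config: str, cons_out: str) -> List[str]:
    """Assemble the C3POa.py argument list for one preprocessed directory."""
    return [
        "python3", c3poa,
        "--reads", str(read_dir / READS_NAME),
        "--path", str(read_dir),
        "--matrix", matrix,
        "--config", config,
        "--output", str(read_dir / cons_out),
    ]
```

Printing `find_read_dirs("/path/to/preprocessed/folders/")` first is a cheap way to confirm that the directories found are the ones you expect before launching any jobs.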

Hi,

I would like to use C3POa for my data analysis. In my case, preprocessing goes OK. However, when I try to run C3POa I get the following error:

python3 C3POa.py -t -r /home/oscar/Desktop/C3POa/splint1/preprocessed_reads.fastq -p /home/oscar/Desktop/C3POa/splint1/temp -m /home/oscar/Desktop/C3POa/NUC.4.4.mat -l 1000 -d 500 -c configf.txt
Using gonk from your path, not the config file.
/home/oscar/Desktop/C3POa/splint1/preprocessed_reads.fastq
gonk took 0.0009248256683349609 seconds to run.
Traceback (most recent call last):
  File "C3POa.py", line 708, in <module>
    main()
  File "C3POa.py", line 703, in main
    sys.stderr.write("Consensus reads: {0}\t({1:.2f}%)\n".format(good[0], good[0]/total*100))
ZeroDivisionError: division by zero

I have also excluded the -o. Does this error mean that no consensus sequences were identified?

OscarT32 avatar May 12 '20 18:05 OscarT32

This is a debugging statement that gets run at the very end of the script. Your consensus sequences should be intact. To squelch the message, update C3POa by pulling from the repo.

rvolden avatar May 12 '20 18:05 rvolden
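The traceback shows the summary line dividing by `total`, which fails when no reads were counted. A guarded version of that report might look like the sketch below. This is an illustration only, not necessarily how the upstream fix was written, and `percent` and `report_consensus` are hypothetical names.

```python
import sys


def percent(part: int, total: int) -> float:
    """Return part/total as a percentage, or 0.0 when total is zero."""
    return part / total * 100 if total else 0.0


def report_consensus(good: int, total: int, stream=sys.stderr) -> None:
    """Write the summary line without raising when total is zero."""
    stream.write(
        "Consensus reads: {0}\t({1:.2f}%)\n".format(good, percent(good, total))
    )
```

A total of zero is itself worth investigating: it means no reads reached the summary, which is consistent with the empty output files reported later in this thread.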

After the run, the consensus and subreads files are created but are empty. Additionally, I get three extra files: seq1.fasta, seq2.fasta and gonk_messages. The two seq files seem to be partial sequences, and the gonk_messages file says: sh 1: gonk: not found (I am new to bioinformatics, so I apologize if it's something simple).

OscarT32 avatar May 17 '20 15:05 OscarT32

Can you post your config file? Also did you build gonk? This is almost certainly a path issue since everything else is working but gonk isn't running.

rvolden avatar May 17 '20 17:05 rvolden

Thanks. I have checked the config file, and there was a small mistake in the path. The run is going OK now.

OscarT32 avatar May 19 '20 17:05 OscarT32

Hi, I am also struggling with the C3POa.py script. I was able to run the preprocessing script without an issue, but now I'm having trouble executing gonk. The gonk_messages file contains sh: /gonk/gonk: No such file or directory. I'm running the scripts on a Mac.

rakszewska avatar Jun 22 '20 22:06 rakszewska

There's a good chance you didn't build the package, or the path to it is wrong.

rvolden avatar Jun 22 '20 23:06 rvolden
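Both "gonk: not found" and "/gonk/gonk: No such file or directory" mean the shell could not locate the binary at the path it was given. A small pre-flight check along these lines can tell a missing file apart from a non-executable one before starting a long run. It is a sketch under the assumption that you know the path C3POa will use; `check_gonk` is a hypothetical helper, not part of C3POa.

```python
import os
import shutil
from typing import Optional


def check_gonk(configured_path: Optional[str] = None) -> str:
    """Explain whether a gonk executable can be found and run."""
    if configured_path:
        if not os.path.exists(configured_path):
            return configured_path + ": no such file (wrong path, or gonk was never built)"
        if os.path.isdir(configured_path) or not os.access(configured_path, os.X_OK):
            return configured_path + ": exists but is not an executable file"
        return configured_path + ": looks runnable"
    # No explicit path given, so fall back to searching PATH.
    found = shutil.which("gonk")
    if found is None:
        return "gonk: not found on PATH"
    return found + ": found on PATH"
```

Passing the exact path from your config file to `check_gonk` is the quickest way to catch a typo like the one that resolved the earlier report.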