pharokka
Pharokka stuck at running mmseqs search
- pharokka version: 1.6.0
- Python version: 3.10.8
- Operating System: Ubuntu 20.04
Hello,
Pharokka looks very promising for annotating viral genomes. However, when I try to run it on my data, the program always gets stuck at this step:
"2024-01-19 15:15:21.280 | INFO | external_tools:run:50 - Started running mmseqs search -e 1E-05 /home/quocviet/Work/Extra/pharokka_v1.4.0_databases/phrogs_profile_db pharokka_20240119/target_dir/target_seqs pharokka_20240119/mmseqs/results_mmseqs pharokka_20240119/tmp_dir/ -s 8.5 --threads 1 ..."
The installation seems fine and showed no obvious errors (I installed pharokka via mamba).
The command I used: time pharokka.py -i mydata.fasta -o pharokka_20240119 -d /home/quocviet/Work/Extra/pharokka_v1.4.0_databases -f
Could you please help me with this? Thank you.
Hi @quocviet0908 ,
What version of mmseqs2 is installed? v13.45111? If not, that would be the cause of this issue. Please install it with:
mamba install mmseqs2==13.45111
If yes, then maybe you should give pharokka more threads (not 1), e.g. -t 16 or -t 8.
If you are annotating one or only a few phages, try --fast as well.
George
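A minimal sketch for confirming the installed MMseqs2 release before re-running. It assumes `mmseqs version` prints the release string (e.g. `13.45111`) as its first token; the helper names are hypothetical and not part of pharokka.

```python
import shutil
import subprocess
from typing import Optional

EXPECTED_MMSEQS = "13.45111"  # release recommended in this thread


def parse_mmseqs_version(output: str) -> str:
    """Return the first whitespace-separated token of `mmseqs version` output."""
    tokens = output.split()
    return tokens[0] if tokens else ""


def mmseqs_version_ok(output: str, expected: str = EXPECTED_MMSEQS) -> bool:
    """True if the reported version matches the expected release exactly."""
    return parse_mmseqs_version(output) == expected


def installed_mmseqs_version() -> Optional[str]:
    """Run `mmseqs version` if mmseqs is on PATH; return None if it is not found."""
    if shutil.which("mmseqs") is None:
        return None
    result = subprocess.run(
        ["mmseqs", "version"], capture_output=True, text=True, check=False
    )
    return parse_mmseqs_version(result.stdout)


if __name__ == "__main__":
    version = installed_mmseqs_version()
    if version is None:
        print("mmseqs not found on PATH")
    elif version == EXPECTED_MMSEQS:
        print(f"mmseqs {version} matches the expected release")
    else:
        print(f"mmseqs {version} found; consider: mamba install mmseqs2=={EXPECTED_MMSEQS}")
```

If the version already matches, the remaining suggestions above (more threads, or --fast) still apply.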
Hi George.
Thank you for your quick response. The version of mmseqs2 is v13.45111.
I tried to use --fast option to bypass MMseqs2 and the program ran smoothly, but I will not be able to get CARD or VFDB annotations.
I think the problem is related to mmseqs2 somehow, but I'm not sure.
Maybe upload the log file and I will try and see what the issue is.
George
Hi George,
Please see the attached log (logs.zip), captured while the command was stuck at that step.
Thank you very much.
I think I'm seeing the same issue: pharokka exits after calling mmseqs search.
2024-02-07 11:36:24.996 | INFO | external_tools:run:50 - Started running mmseqs search -e 1E-05 /db/pharokka/phrogs_profile_db pharokka_terL/target_dir/target_seqs pharokka_terL/mmseqs/results_mmseqs pharokka_terL/tmp_dir/ -s 8.5 --threads 24 ...
2024-02-07 11:36:25.041 | ERROR | external_tools:run_tool:94 - Error calling mmseqs search -e 1E-05 /db/pharokka/phrogs_profile_db pharokka_terL/target_dir/target_seqs pharokka_terL/mmseqs/results_mmseqs pharokka_terL/tmp_dir/ -s 8.5 --threads 24 (return code 1)
Pharokka v1.6.1, up-to-date databases, mmseqs2 13.45111. It runs fine with the --fast flag.
Edit: from mmseqs_search_XXXX.err:
Could not create symlink of pharokka_terL/tmp_dir//6144855635082578743!
Command line: mmseqs search -e 1E-05 /db/pharokka/phrogs_profile_db pharokka_terL/target_dir/target_seqs pharokka_terL/mmseqs/results_mmseqs pharokka_terL/tmp_dir/ -s 8.5 --threads 24
Thanks
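When a run exits with return code 1, the MMseqs2 .err log is where the real cause shows up. A small sketch for pulling out the newest one, assuming all `mmseqs_search_*.err` files sit in a single directory (which directory pharokka writes them to is an assumption here; pass it on the command line). Function names are hypothetical.

```python
import glob
import os
import sys
from typing import Optional


def latest_err_log(log_dir: str, pattern: str = "mmseqs_search_*.err") -> Optional[str]:
    """Return the most recently modified file matching `pattern`, or None."""
    paths = glob.glob(os.path.join(log_dir, pattern))
    if not paths:
        return None
    return max(paths, key=os.path.getmtime)


def is_symlink_failure(log_text: str) -> bool:
    """True if MMseqs2 reported that it could not create a symlink in its tmp dir."""
    return "Could not create symlink" in log_text


if __name__ == "__main__":
    # Hypothetical usage: python find_err.py <directory containing the .err logs>
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    path = latest_err_log(log_dir)
    if path is None:
        print(f"no mmseqs_search_*.err files in {log_dir}")
    else:
        with open(path) as handle:
            text = handle.read()
        print(path)
        print(text)
        if is_symlink_failure(text):
            print("MMseqs2 could not create a symlink: check the tmp_dir filesystem")
```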
Hi @iaindhay ,
Interesting. Judging by the MMseqs2 issues (e.g. https://github.com/soedinglab/MMseqs2/issues/171), this looks like it could be a permissions issue on the system you are running on, or a disk-space issue.
George
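A quick pre-flight check for the two causes suggested above, meant to be run against the filesystem that will hold pharokka's output (and therefore its tmp_dir). This is a standalone sketch, not part of pharokka or MMseqs2; the function names are hypothetical.

```python
import os
import shutil
import sys
import tempfile
import uuid


def can_symlink(directory: str) -> bool:
    """Try to create, verify, and remove a throwaway symlink inside `directory`."""
    target = os.path.join(directory, f"probe_target_{uuid.uuid4().hex}")
    link = target + "_link"
    try:
        with open(target, "w") as handle:
            handle.write("probe")
        os.symlink(target, link)
        return os.path.islink(link)
    except OSError:
        # Missing directory, no write permission, or symlinks unsupported.
        return False
    finally:
        # Clean up whatever was created (remove the link before its target).
        for path in (link, target):
            if os.path.lexists(path):
                os.remove(path)


def free_gib(directory: str) -> float:
    """Free space on the filesystem holding `directory`, in GiB."""
    return shutil.disk_usage(directory).free / 1024**3


if __name__ == "__main__":
    for directory in sys.argv[1:] or [tempfile.gettempdir()]:
        if not os.path.isdir(directory):
            print(f"{directory}: not a directory")
            continue
        print(f"{directory}: symlinks ok={can_symlink(directory)}, "
              f"free={free_gib(directory):.1f} GiB")
```

A `False` from `can_symlink` corresponds to the "Could not create symlink" message in the .err log above.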
Thanks George. Yes, I just realized it's an issue with creating symbolic links on that drive. The issue is on my end.
I have one error like #300, where the annotation stops during the post-processing steps with ValueError: Columns must be same length as key. Strangely, this only happens with one of my genomes, and here it occurs during the PHROGs post-processing rather than the VFDB post-processing as in #300.
2024-02-07 12:20:17.410 | INFO | external_tools:run:52 - Done running mmseqs search --min-seq-id 0.8 -c 0.4 /db/pharokka/vfdb /1TB/phage/ar1/VFDB_target_dir/target_seqs /1TB/phage/ar1/VFDB/results_mmseqs /1TB/phage/ar1/VFDB_dir/ -s 8.5 --threads 24
2024-02-07 12:20:17.411 | INFO | external_tools:run:50 - Started running mmseqs createtsv /db/pharokka/vfdb /1TB/phage/ar1/VFDB_target_dir/target_seqs /1TB/phage/ar1/VFDB/results_mmseqs /1TB/phage/ar1/vfdb_results.tsv --full-header --threads 24 ...
2024-02-07 12:20:17.440 | INFO | external_tools:run:52 - Done running mmseqs createtsv /db/pharokka/vfdb /1TB/phage/ar1/VFDB_target_dir/target_seqs /1TB/phage/ar1/VFDB/results_mmseqs /1TB/phage/ar1/vfdb_results.tsv --full-header --threads 24
2024-02-07 12:20:17.440 | INFO | __main__:main:363 - Running PyHMMER on PHROGs.
2024-02-07 12:20:25.627 | INFO | __main__:main:379 - Post Processing Output.
2024-02-07 12:20:25.649 | INFO | post_processing:create_mmseqs_tophits:2104 - Processing MMseqs2 outputs.
2024-02-07 12:20:25.650 | INFO | post_processing:create_mmseqs_tophits:2105 - Processing PHROGs output.
Traceback (most recent call last):
File "/miniconda3/envs/pharokka/bin/pharokka.py", line 499, in <module>
main()
File "/miniconda3/envs/pharokka/bin/pharokka.py", line 418, in main
pharok.process_results()
File "/miniconda3/envs/pharokka/bin/post_processing.py", line 242, in process_results
merged_df[["mmseqs_phrog", "mmseqs_top_hit"]] = merged_df[
File "/miniconda3/envs/pharokka/lib/python3.10/site-packages/pandas/core/frame.py", line 4287, in __setitem__
self._setitem_array(key, value)
File "/miniconda3/envs/pharokka/lib/python3.10/site-packages/pandas/core/frame.py", line 4329, in _setitem_array
check_key_length(self.columns, key, value)
File "/miniconda3/envs/pharokka/lib/python3.10/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
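The traceback cuts off the right-hand side of the assignment at post_processing.py line 242, so the following is not pharokka's code. It is a minimal reproduction of the same pandas error, assuming the right-hand side comes from a `str.split(..., expand=True)` that yields fewer columns than the two keys (for example when no value in the column contains the separator). The `top_hit` column and its values are hypothetical.

```python
import pandas as pd

KEYS = ["mmseqs_phrog", "mmseqs_top_hit"]


def split_naive(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Assign two columns from a space split; raises if the split yields one column."""
    out = df.copy()
    out[KEYS] = out[column].str.split(" ", n=1, expand=True)
    return out


def split_padded(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Same split, but pad the result to exactly two columns (missing parts -> NaN)."""
    out = df.copy()
    parts = out[column].str.split(" ", n=1, expand=True).reindex(columns=[0, 1])
    out[KEYS] = parts
    return out


if __name__ == "__main__":
    # Hypothetical data: no value contains a space, so the split yields one column.
    df = pd.DataFrame({"top_hit": ["phrog_1", "phrog_2"]})
    try:
        split_naive(df, "top_hit")
    except ValueError as err:
        print(f"naive: {err}")
    print(split_padded(df, "top_hit"))
```

If this is the mechanism, it would explain why only one genome is affected: the failure depends on the data reaching that line, not on the installation.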