Pharokka stuck at running mmseqs search

Open quocviet0908 opened this issue 1 year ago • 7 comments

  • pharokka version: 1.6.0
  • Python version: 3.10.8
  • Operating System: Ubuntu 20.04

Hello,

Pharokka is very promising for annotating viral genomes. However, when I try to run it on my data, the program always gets stuck at this step:

"2024-01-19 15:15:21.280 | INFO | external_tools:run:50 - Started running mmseqs search -e 1E-05 /home/quocviet/Work/Extra/pharokka_v1.4.0_databases/phrogs_profile_db pharokka_20240119/target_dir/target_seqs pharokka_20240119/mmseqs/results_mmseqs pharokka_20240119/tmp_dir/ -s 8.5 --threads 1 ..."

The installation seemed fine, with no obvious errors (I installed pharokka via mamba).

The command I used: time pharokka.py -i mydata.fasta -o pharokka_20240119 -d /home/quocviet/Work/Extra/pharokka_v1.4.0_databases -f

Could you please help me with this? Thank you.

quocviet0908 avatar Jan 19 '24 08:01 quocviet0908

Hi @quocviet0908 ,

What version of mmseqs2 is installed? v13.45111? If not, that would be the cause of this issue. Please install it with:

mamba install mmseqs2==13.45111

If yes, then maybe you should give pharokka more threads (not 1) - e.g. -t 16 or -t 8.

If you are annotating one or only a few phages, try --fast as well.

George

gbouras13 avatar Jan 19 '24 08:01 gbouras13

Hi George.

Thank you for your quick response. The version of mmseqs2 is v13.45111.

I tried the --fast option to bypass MMseqs2 and the program ran smoothly, but that way I will not get CARD or VFDB annotations.

I think the problem is somehow related to mmseqs2, but I'm not sure.

quocviet0908 avatar Jan 19 '24 08:01 quocviet0908

Maybe upload the log file and I will try and see what the issue is.

George

gbouras13 avatar Jan 19 '24 08:01 gbouras13

Hi George,

Please see my attachment. This is the log from when the command gets stuck at that step: logs.zip

Thank you very much.

quocviet0908 avatar Jan 19 '24 08:01 quocviet0908

I'm seeing the same (I think): it exits right after calling mmseqs.

2024-02-07 11:36:24.996 | INFO     | external_tools:run:50 - Started running mmseqs search -e 1E-05 /db/pharokka/phrogs_profile_db pharokka_terL/target_dir/target_seqs pharokka_terL/mmseqs/results_mmseqs pharokka_terL/tmp_dir/ -s 8.5 --threads 24 ...
2024-02-07 11:36:25.041 | ERROR    | external_tools:run_tool:94 - Error calling mmseqs search -e 1E-05 /db/pharokka/phrogs_profile_db pharokka_terL/target_dir/target_seqs pharokka_terL/mmseqs/results_mmseqs pharokka_terL/tmp_dir/ -s 8.5 --threads 24 (return code 1)

pharokka v1.6.1; up-to-date db; mmseqs2=13.45111. Runs fine with the --fast flag.

Edit: from mmseqs_search_XXXX.err:

Could not create symlink of pharokka_terL/tmp_dir//6144855635082578743!
Command line: mmseqs search -e 1E-05 /db/pharokka/phrogs_profile_db pharokka_terL/target_dir/target_seqs pharokka_terL/mmseqs/results_mmseqs pharokka_terL/tmp_dir/ -s 8.5 --threads 24

Thanks

iaindhay avatar Feb 06 '24 22:02 iaindhay

Hi @iaindhay ,

Interesting - this looks like it could be a permissions issue on the system you are running on, or an issue with space (going by the MMseqs2 issues, e.g. https://github.com/soedinglab/MMseqs2/issues/171 )

George

gbouras13 avatar Feb 06 '24 23:02 gbouras13

Thanks George. Yes, I just realized it is an issue with creating symbolic links on that drive. Issue on my end.
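
For anyone hitting the same "Could not create symlink" message, a quick way to test a drive is to try creating a symbolic link in the directory that will hold the output. This is only a minimal sketch using the Python standard library; can_symlink is a hypothetical helper name and is not part of pharokka or MMseqs2.

```python
import os
import tempfile


def can_symlink(directory: str) -> bool:
    """Return True if a symbolic link can be created inside `directory`.

    Hypothetical helper for illustration; not part of pharokka or MMseqs2.
    Returns False both when the directory is missing or not writable and
    when the filesystem refuses to create symlinks.
    """
    try:
        # Work in a throwaway subdirectory so nothing is left behind.
        with tempfile.TemporaryDirectory(dir=directory) as scratch:
            target = os.path.join(scratch, "target")
            link = os.path.join(scratch, "link")
            with open(target, "w") as handle:
                handle.write("test")
            os.symlink(target, link)
            return os.path.islink(link)
    except (OSError, NotImplementedError):
        return False


# Example: check the current working directory.
print(can_symlink("."))
```

If this reports False for the output location but True for another drive, pointing the -o output directory at the other drive may be worth trying.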

I also have one error like #300, where the annotation stops during the post-processing steps with "ValueError: Columns must be same length as key". Strangely, this happens with only one of my genomes, and it appears to occur during the PHROGs post-processing rather than the VFDB step as in #300.

2024-02-07 12:20:17.410 | INFO     | external_tools:run:52 - Done running mmseqs search --min-seq-id 0.8 -c 0.4 /db/pharokka/vfdb /1TB/phage/ar1/VFDB_target_dir/target_seqs /1TB/phage/ar1/VFDB/results_mmseqs /1TB/phage/ar1/VFDB_dir/ -s 8.5 --threads 24
2024-02-07 12:20:17.411 | INFO     | external_tools:run:50 - Started running mmseqs createtsv /db/pharokka/vfdb /1TB/phage/ar1/VFDB_target_dir/target_seqs /1TB/phage/ar1/VFDB/results_mmseqs /1TB/phage/ar1/vfdb_results.tsv --full-header --threads 24 ...
2024-02-07 12:20:17.440 | INFO     | external_tools:run:52 - Done running mmseqs createtsv /db/pharokka/vfdb /1TB/phage/ar1/VFDB_target_dir/target_seqs /1TB/phage/ar1/VFDB/results_mmseqs /1TB/phage/ar1/vfdb_results.tsv --full-header --threads 24
2024-02-07 12:20:17.440 | INFO     | __main__:main:363 - Running PyHMMER on PHROGs.
2024-02-07 12:20:25.627 | INFO     | __main__:main:379 - Post Processing Output.
2024-02-07 12:20:25.649 | INFO     | post_processing:create_mmseqs_tophits:2104 - Processing MMseqs2 outputs.
2024-02-07 12:20:25.650 | INFO     | post_processing:create_mmseqs_tophits:2105 - Processing PHROGs output.
Traceback (most recent call last):
  File "/miniconda3/envs/pharokka/bin/pharokka.py", line 499, in <module>
    main()
  File "/miniconda3/envs/pharokka/bin/pharokka.py", line 418, in main
    pharok.process_results()
  File "/miniconda3/envs/pharokka/bin/post_processing.py", line 242, in process_results
    merged_df[["mmseqs_phrog", "mmseqs_top_hit"]] = merged_df[
  File "/miniconda3/envs/pharokka/lib/python3.10/site-packages/pandas/core/frame.py", line 4287, in __setitem__
    self._setitem_array(key, value)
  File "/miniconda3/envs/pharokka/lib/python3.10/site-packages/pandas/core/frame.py", line 4329, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/miniconda3/envs/pharokka/lib/python3.10/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
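
For context, pandas raises this error when a DataFrame is assigned to a list of column names and the number of columns on the right-hand side differs from the number of names. The traceback cuts off the right-hand side of the assignment at post_processing.py line 242, so the sketch below is not pharokka's code: it is a toy reproduction with made-up column names (raw, left, right), assuming one common trigger, a string split that yields fewer columns than expected.

```python
import pandas as pd


def split_pair(frame: pd.DataFrame) -> pd.DataFrame:
    """Split the toy 'raw' column on ';' into two new columns."""
    out = frame.copy()
    parts = out["raw"].str.split(";", expand=True)
    # This assignment fails unless `parts` has exactly two columns.
    out[["left", "right"]] = parts
    return out


# Every row contains one ';', so the split yields two columns.
ok = split_pair(pd.DataFrame({"raw": ["a;1", "b;2"]}))
print(ok.columns.tolist())

# No row contains ';', so the split yields a single column and the
# two-column assignment raises ValueError.
try:
    split_pair(pd.DataFrame({"raw": ["a", "b"]}))
except ValueError as err:
    print(err)
```

Whether something similar happens here would need to be checked against the intermediate MMseqs2 output for the one genome that fails.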

iaindhay avatar Feb 07 '24 00:02 iaindhay