cannot run PhyloPhlAn with supermatrix_nt.cfg

Open: zhangws119 opened this issue 3 years ago • 1 comment

PhyloPhlAn version 3.0.60 (27 November 2020)

Command line: /home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/phylophlan -i 0_selected_genomes/ -d /home/geobig/Users/Liuli/phylophlan_databases/ -f phylophlan/supermatrix_nt.cfg -o 2_phylophlan --nproc 20 --diversity low --accurate --verbose

Automatically setting "database=phylophlan_databases" and "databases_folder=/home/geobig/Users/Liuli"
Automatically setting "input=0_selected_genomes" and "input_folder=/home/geobig/Users/Wensi/nitrospirae_tree"
"low-accurate" preset
Arguments: {'input': '0_selected_genomes', 'clean': None, 'output': '2_phylophlan', 'database': 'phylophlan_databases', 'db_type': None, 'config_file': 'phylophlan/supermatrix_nt.cfg', 'diversity': 'low', 'accurate': True, 'fast': False, 'clean_all': False, 'database_list': False, 'submat': 'pfasum60', 'submat_list': False, 'submod_list': False, 'nproc': 20, 'min_num_proteins': 1, 'min_len_protein': 50, 'min_num_markers': 1, 'trim': 'not_variant', 'gap_perc_threshold': 0.67, 'not_variant_threshold': 0.99, 'subsample': None, 'unknown_fraction': 0.3, 'scoring_function': None, 'sort': False, 'remove_fragmentary_entries': False, 'fragmentary_threshold': 0.85, 'min_num_entries': 4, 'maas': None, 'remove_only_gaps_entries': False, 'mutation_rates': False, 'force_nucleotides': False, 'input_folder': '/home/geobig/Users/Wensi/nitrospirae_tree/0_selected_genomes', 'data_folder': '2_phylophlan/tmp', 'databases_folder': '/home/geobig/Users/Liuli', 'submat_folder': '/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_matrices/', 'submod_folder': '/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_substitution_models/', 'configs_folder': '/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan_configs/', 'output_folder': '', 'genome_extension': '.fna', 'proteome_extension': '.faa', 'update': False, 'verbose': True}
Loading configuration file "phylophlan/supermatrix_nt.cfg"
Checking configuration file
Checking "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/makeblastdb"
Checking "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/blastn"
Checking "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/mafft"
Checking "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/trimal"
Checking "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/FastTreeMP"
Checking "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/raxmlHPC-PTHREADS-SSE3"
Traceback (most recent call last):
  File "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/bin/phylophlan", line 10, in <module>
    sys.exit(phylophlan_main())
  File "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan.py", line 3227, in phylophlan_main
    verbose=args.verbose)
  File "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan.py", line 818, in init_database
    for f in glob.iglob(os.path.join(folder, '*'))
  File "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/phylophlan/phylophlan.py", line 819, in <listcomp>
    for _, seq in SimpleFastaParser(bz2.open(f, 'rt') if f.endswith('.bz2') else open(f))])
  File "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/site-packages/Bio/SeqIO/FastaIO.py", line 47, in SimpleFastaParser
    for line in handle:
  File "/home/geobig/User/Softwares/Anaconda3/envs/phylophlan/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 1035: invalid start byte

Dear developers,

I get the error above when I run this command. Could you help me solve it?

Thanks very much.

zhangws119, May 25 '21 03:05

Hi, and thanks for reporting this. I believe the issue here is with the database parameter:

-d /home/geobig/Users/Liuli/phylophlan_databases/

The -d parameter takes the database name, not the path to the database(s) folder. If you want to use the phylophlan database, you should specify:

-d phylophlan

and you don't need to provide the path, as it should be detected automatically (I'm assuming that's the default database location). If it is not, you can either specify:

-d /home/geobig/Users/Liuli/phylophlan_databases/phylophlan/

or

-d phylophlan --databases_folder /home/geobig/Users/Liuli/phylophlan_databases/
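
The log line `Automatically setting "database=phylophlan_databases" and "databases_folder=/home/geobig/Users/Liuli"` shows how the original -d path was interpreted: the last path component became the database name and the rest became the databases folder. Below is a minimal Python sketch of that split, assuming it behaves like a basename/dirname split. The helper `split_database_arg` is hypothetical and is not PhyloPhlAn's own code.

```python
import posixpath


def split_database_arg(value):
    """Split a -d value into (database name, databases folder).

    Hypothetical helper: it mimics the split reported in the log,
    not PhyloPhlAn's actual implementation.
    """
    path = posixpath.normpath(value)  # drops a trailing slash
    return posixpath.basename(path), posixpath.dirname(path)


# Path from the original command: the folder itself is taken as the database name.
print(split_database_arg("/home/geobig/Users/Liuli/phylophlan_databases/"))

# Corrected path: "phylophlan" is taken as the database name.
print(split_database_arg("/home/geobig/Users/Liuli/phylophlan_databases/phylophlan/"))
```

With the original path, the name comes out as `phylophlan_databases`, which is the folder that holds the databases rather than a database. Appending `phylophlan/` makes the last component the actual database name.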

With the database parameter fixed, I also noticed that you specified:

-f phylophlan/supermatrix_nt.cfg

and this configuration file is for a gene database (nucleotides). The phylophlan database is a collection of 400 universal proteins, so if that is the database you want to use, you should use supermatrix_aa.cfg instead.
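
If it is unclear whether a database holds nucleotide or protein sequences, a rough check of the sequence alphabet can help pick between the two configuration files. The sketch below is a heuristic only: the function `guess_alphabet`, the 0.9 threshold, and the example sequences are assumptions made for illustration and are not part of PhyloPhlAn.

```python
NUCLEOTIDES = set("ACGTUN")
IGNORED = set("-.*")  # gaps and stop symbols carry no alphabet information


def guess_alphabet(sequences, threshold=0.9):
    """Return "nt" if the sequences look like nucleotides, otherwise "aa".

    Heuristic: the fraction of characters in ACGTUN must reach the
    threshold (0.9 is an arbitrary, assumed cut-off).
    """
    total = 0
    nucleotide_like = 0
    for seq in sequences:
        for char in seq.upper():
            if char in IGNORED:
                continue
            total += 1
            if char in NUCLEOTIDES:
                nucleotide_like += 1
    if total == 0:
        raise ValueError("no sequence characters to inspect")
    return "nt" if nucleotide_like / total >= threshold else "aa"


# Configuration file names as mentioned in this thread.
CONFIG_FOR_ALPHABET = {"nt": "supermatrix_nt.cfg", "aa": "supermatrix_aa.cfg"}

# Made-up example sequences: one protein-like, one nucleotide-like.
print(CONFIG_FOR_ALPHABET[guess_alphabet(["MKVLAAGIVGLPNVGKSTLFNALT"])])
print(CONFIG_FOR_ALPHABET[guess_alphabet(["ATGAAAGTTCTGGCCGCAGGT"])])
```

Short or low-complexity protein sequences can fool a check like this, so treat the result as a hint and confirm against the database description.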

Please, let me know if this fixes your problem.

Many thanks, Francesco

fasnicar, May 25 '21 09:05
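
For anyone hitting the same traceback: it shows init_database looping over every path matched by a glob in a folder and reading each one as FASTA text, so a file that is not valid UTF-8 text raises the UnicodeDecodeError. The following Python sketch lists such files in a folder. The function `find_non_utf8_files` is hypothetical, and the folder passed to it is the path from the original command, used here only as an example.

```python
import bz2
import glob
import os


def find_non_utf8_files(folder):
    """Return (path, error message) pairs for files that fail to decode as UTF-8."""
    problems = []
    for path in sorted(glob.glob(os.path.join(folder, "*"))):
        if not os.path.isfile(path):
            continue  # skip sub-folders
        # Same choice as in the traceback: .bz2 files are read through bz2.
        if path.endswith(".bz2"):
            handle = bz2.open(path, "rt", encoding="utf-8")
        else:
            handle = open(path, encoding="utf-8")
        try:
            with handle:
                for _ in handle:
                    pass  # reading each line forces the decoding
        except UnicodeDecodeError as error:
            problems.append((path, str(error)))
    return problems


# Example folder taken from the original command; adjust to your setup.
for path, error in find_non_utf8_files("/home/geobig/Users/Liuli/phylophlan_databases"):
    print(path, error)
```

Any file it reports in the folder given to -d is a sign that the folder is being read as a database when it is not one.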