Unknown error at alignment stage
I am trying to run ete3 to build a species tree. Many of the COGs that I'm using have multiple sequences from the same species. I hope that this is not a problem for ete3.
I have run ete3 with the following command -
xvfb-run ete3 build --cpu 4 -w standard_trimmed_fasttree -m sptree_fasttree_all -o species_tree/output/ --clearall -a species_tree/og_proteomes.fa --cogs og_genes.txt
However, the run is interrupted with the following error message:
ERR - Thread cog_all-alg_concat_default-fasttree_full contains errors:
ERR - ** CogSelectorTask (508 species, MCL-COGs, /cog_all-al...ttree_full)
ERR - -> 2cee1f0f018df8cd453eabef2d829a03
ERR - -> zero-size array to reduction operation minimum which has no identity
ERR - Done with ERRORS
Can you please help me with this?
I have attached the traceback of the error. ete3_traceback.txt
The error was indeed due to the format of my COGs file: only one sequence per organism is permitted. Fixing the file accordingly solved that issue, but the run then failed later, at the alignment stage, with the following traceback:
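For anyone hitting the same first error, here is a minimal sketch of how a COGs file could be checked for lines that contain more than one sequence from the same species. It is not part of ete3; the helper names are hypothetical, and it assumes that the species code is the part of each sequence name before the first underscore, so adjust `sep` if your names are built differently.

```python
from collections import Counter

def species_of(seq_id, sep="_"):
    """Return the species code, ASSUMED to be the text before the first separator."""
    return seq_id.split(sep, 1)[0]

def duplicated_species(cog_lines, sep="_"):
    """Map each 1-based line number to the species that occur more than once in that COG."""
    problems = {}
    for lineno, line in enumerate(cog_lines, start=1):
        ids = line.split()  # tolerant split: any whitespace separates names
        counts = Counter(species_of(seq_id, sep) for seq_id in ids)
        dups = sorted(sp for sp, n in counts.items() if n > 1)
        if dups:
            problems[lineno] = dups
    return problems

# Made-up example: the second COG holds two sequences from species "spA".
example = [
    "spA_g1\tspB_g1\tspC_g1",
    "spA_g2\tspA_g3\tspB_g2",
]
print(duplicated_species(example))
```

To run it on a real file, pass the open file object instead of the example list, e.g. `duplicated_species(open("og_genes.txt"))`.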
ERR - Job error reported: Job (clustalo---threads-1, a4670a)
ERR - Errors found in ConcatAlgTask (508 species, 154732 COGs, ConcatAlg, /cog_all-al...ttree_full)
Traceback (most recent call last):
File "/home/.conda/envs/ete3_new/lib/python3.6/site-packages/ete3-3.1.2-py3.7.egg/ete3/tools/ete_build_lib/scheduler.py", line 257, in schedule
task.status = task.get_status(qstat_jobs)
File "/home/.conda/envs/ete3_new/lib/python3.6/site-packages/ete3-3.1.2-py3.7.egg/ete3/tools/ete_build_lib/master_task.py", line 198, in get_status
self.job_status = self.get_jobs_status(sge_jobs)
File "/home/.conda/envs/ete3_new/lib/python3.6/site-packages/ete3-3.1.2-py3.7.egg/ete3/tools/ete_build_lib/master_task.py", line 285, in get_jobs_status
st = j.get_status(sge_jobs)
File "/home/.conda/envs/ete3_new/lib/python3.6/site-packages/ete3-3.1.2-py3.7.egg/ete3/tools/ete_build_lib/master_task.py", line 198, in get_status
self.job_status = self.get_jobs_status(sge_jobs)
File "/home/.conda/envs/ete3_new/lib/python3.6/site-packages/ete3-3.1.2-py3.7.egg/ete3/tools/ete_build_lib/master_task.py", line 306, in get_jobs_status
raise TaskError(j, "Job execution error %s" %errorpath)
ete3.tools.ete_build_lib.errors.TaskError: Job execution error /species_tree/output/tasks/a4670ada20f868759533477a3ccf6305
INFO - Waiting 2 seconds
ERR - Thread cog_all-alg_concat_default-fasttree_full contains errors:
ERR - ** ConcatAlgTask (508 species, 154732 COGs, ConcatAlg, /cog_all-al...ttree_full)
ERR - -> Job (clustalo---threads-1, a4670a)
ERR - -> /species_tree/output/tasks/a4670ada20f868759533477a3ccf6305
ERR - -> Job execution error /species_tree/output/tasks/a4670ada20f868759533477a3ccf6305
ERR - Done with ERRORS
Can you please help me with this instead?
Hi,
yes, the reason for this error is that you selected cog_all as the cogselector, which means that all COGs from your file will be used. If one of the COGs in your file contains just one sequence from your input sequences, this error is raised, because a single sequence cannot be aligned in the following step.
I would recommend choosing another cogselector and refining the COGs file. More information can be found here: http://etetoolkit.org/documentation/ete-build/
Cheers
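One way to refine the COGs file along these lines is to separate out the COGs that hold fewer than two sequences before running the build. The following is only a sketch with a hypothetical helper name, not an ete3 function; it assumes sequence names contain no whitespace.

```python
def split_by_size(cog_lines, min_size=2):
    """Split COG lines into those with at least min_size sequences and the rest."""
    kept, dropped = [], []
    for line in cog_lines:
        ids = line.split()  # any whitespace separates sequence names
        if not ids:
            continue  # ignore blank lines
        if len(ids) >= min_size:
            kept.append(ids)
        else:
            dropped.append(ids)
    return kept, dropped

# Made-up example: the second COG has a single sequence and cannot be aligned.
example = ["spA_g1\tspB_g1", "spC_g9", ""]
kept, dropped = split_by_size(example)

# Write the surviving COGs back out as TAB-delimited lines.
cleaned = "\n".join("\t".join(ids) for ids in kept)
print(cleaned)
print("dropped:", dropped)
```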
Hi @jhcepas, @dengzq1234, @maystrenk0, and other researchers who have worked with ETE-TOOLS.
With ETE Toolkit v3.1.2, I'm trying to construct a phylogenetic tree from the 40 conserved universal marker genes from the proGenomes v2.1 database (proGenomes2.1_markerGenes.tar.gz, https://progenomes.embl.de/data/), replicating an article from your research group: https://www.nature.com/articles/s41396-020-0600-z.
I created the multi-FASTA file and the COG list following the instructions in http://etetoolkit.org/cookbook/ete_build_supermatrix.ipynb
The software seems to run, but I get the same error as @gauravdiwan89. As the instructions say, each species is represented by only one sequence. I probably have some naming mismatch between the COG file and the multi-FASTA file that causes the error. I read your response, and as far as I can tell each COG (each line) in my file has more than one sequence. I decided to comment on this issue because I think the problem is some error in how I prepared the two files, and I have not been able to find a toy example for this command.
Because the files are large, I attach only a piece of each to better illustrate the scenario.
1) COG file: I have 40 lines (40 COGs) in the following form:
100053_COG0012 1000561_COG0012 1000565_COG0012 1000568_COG0012 1000569_COG0012
100053_COG0016 1000561_COG0016 1000562_COG0016 1000565_COG0016 1000568_COG0016
where the first number is the NCBI TaxonID.
2) Multi-FASTA file: there is one sequence for each of the identifiers indicated above, for example:
>100053_COG0012
MSLNCGIVGLPNVGKSTIFNALTKAGAQMENYPFCTIEPNKGIVEVPDSRLDRLAEIAKPQKVVPAIIEFVDIAGLVKGASQGEGLGNKFLSHIREVDAICHVVRAFEDENVTHVHGKINPVDDAAVVNMELIFADLDSADKQFQRISKNAKNGNKEAQEQTSVLEKILTLLKAGKPARLASLKDEEKKIARSFQLITLKPVMYVANIADKDAAKKDTPLLTQIKQMAKEENAELVILCGRFEEEISGLNRNEQLDFLKEIGETESGLDRMIKTAYKLLGLITFFTAGEMEVRAWTTPWNSTGPKAASVIHSDFEKAFIRAEVMSYEDLDRAGTQTKVKEEGKLRIEGKEYVVQDGDVVYFRINA
>1000561_COG0012
MGFNCGIVGLPNVGKSTLFNALTKSGIAAENFPFCTIEPNSGIVPMPDARLNALAEIVKPERVLPTTMEFVDIAGLVAGASKGEGLGNKFLANIRETDAIAHVVRCFEDDNVIHVSNSVDPKRDIEIIDLELIFADLDSCEKQLQKVARNAKGGDKEALAQKALLEKLIPHFTEGKPARSLLKNLGDEEKRLVRSFHLLTSKPVMYIANVAEDGFENNPHLDVVKAIAEEEGAVVVPVCNKIEAEIAELEDGEEKDMFLESLGLEEPGLNRVIRAGYGLLNLQTYFTAGVKEVRAWTVRVGATAPQAAGVIHTDFEKGFIRAEVVAYDDFIQFKGEQGAKEAGKWRLEGKDYIVKDGDVMHFRFNV
>1000565_COG0012
MSLKCGIVGLPNVGKSTLFNALTKAGIAAENYPFCTIEPNVGIVEVPDPRLAQLSEIVKPQKIQPAIVEFVDIAGLVAGASKGEGLGNQFLANIRETDAIVNVVRCFDDENVVHVNGRVDPIADIETIVTELALADLAAVERTIARDGKKAKSGDKDAQKLVAVLEKLLPHLNEGKPARTLGLSDDDKVIIKPLCLLTIKPAMYVGNVLEDGFENNPYLDRLREFAAKEGAPVVSVCAKIEAELADLEDEDKKAFLADLGLDEPGLNRLIRAGYDLLGLQTYFTAGVKEVRAWTIHKGDTAPQAAGVIHTDFERGFIRAQTIAFEDFIAYKGEQGAKEAGKMRAEGKEYVVRDGDVLNFLFNV
>1000568_COG0012
MSTNLEVGIVGLPNVGKSTLFNAITKAGAEAANYPFCTIEPNVGVVEVPDARLRTLTNMYHPKKTIPAVMRFVDIAGLVAGASKGEGLGNKFLSHIRETDAIAEVVRCFEDDNITHVSGSVDPLRDIDIINTELCLADLETVQRRVDRLAKIAQCGDKAAKAELAVLQKILTALEAGEPVRKVALQDEEKQTVKELNLLTIKPILYIANVAEDEAAQPDANPLVQKLTAFAAAEGAQVVAVSAKIEAEIAELPDDEAAAFLEELGLSESGLTKLIKAGYSLLGLINFFTAGADEVRAWTIVKGTKAQKAAGKIHTDIERGFIRAEIVSYTDLIACGGEQAAKEKGLVRLEGKEYLMQDGDVTYFRFNV
>1000569_COG0012
MSTNLEVGIVGLPNVGKSTLFNAITKAGAEAANYPFCTIEPNVGVVEVPDERIDVLTNMYHPKKTIPAVMRFVDIAGLVAGAASGEGLGNKFLSHIRETDAIAEVVRCFDDANITHVAGSVDPIRDIDIINTELCLADIEVAQRRLDRISKIATCGDKQAKAEASVLTVVLKTLEEGKPARTASLTEDDWQWVKELNLLTAKPIIYIANVAEEEAAHPEDNPYVQRLIEFATHEQAQVVAVSAKIEAEIAELSPEEGTSFLAELGLTESGLDRVIKASYTLLGLINFFTAGADEVRAWTIVNGTKAPKAAGKIHTDIERGFIRAEIVSYEDLIACGSEQAAKEKGLVRLEGKDYIMKDGDVTYFRFNV
>100053_COG0016
MNLSEELDSIYQEAIQKISSSISEEDLDRNKNDFIGKKGKLTAVLKNVASLSIEEKKTVGQKANELSKKLENFVSETKISLKKKFFENQAAFEFFDALRPLTSPSNGSLHPITQIQYEIEDIFASMGFSVMDGPEIETDINNFGALNFTEDHPAREMQDTFYLENGNLLRTHTSAIQVRTLRKLKPPFRIIAPGRVFRYEEVDASHEHTFYQIEGMVVGKDISAANLIDTMQVLLSRIFEKEIKTRLRPGYFPFVEPGFELDINCLVCEGKGCPVCKQSGWLELLPCGLIHPNVLSHAGLDPKEWTGFAFGLGLDRLVMMRYGIHDIRYFQSGNLRFLKQF
>1000561_COG0016
MENLDALVSQALEAVRHTEDVNALEQIRVHYLGKKGELTQVMKTLGDLPAEERPKVGALINVAKEKVQDALNARKTELEGAALAARLAAERIDVTLPGRGQLSGGLHPVTRTLERIEQCFSRIGYEVAEGPEVEDDYHNFEALNIPGHHPARAMHDTFYFNANMLLRTHTSPVQVRTMESQQPPIRIVCPGRVYRCDSDLTHSPMFHQVEGLLVDEGVSFADLKGTIEEFLRAFFEKQLEVRFRPSFFPFTEPSAEVDIQCVICSGNGCRVCKQTGWLEVMGCGMVHPNVLRMSNIDPEKFQGFAFGMGAERLAMLRYGVNDLRLFFDNDLRFLGQFR
>1000562_COG0016
MDLQTQLEDLKTKTLEHLKALTGNHGKELQELRVSVLGKKGSLTELLKGLKDLSNDLRPVVGKQVNEVRDVLTKAFDEQVKVVEAARIQAQLEAESIDVTLPGRQMTLGNRHILTQTSDEIEDIFLGMGFQIVDGFEVERDYYNFERMNLPKDHPARDMQDTFYITEDILLRTHTSPVQARTLDQHDFSKGPLKMISPGRVFRRDTDDATHSHQFHQIEGLVVGKNISMGDLKGTLEMIIKKMFGEERKIRLRPSYFPFTEPSVEVDVSCFKCGGKGCNVCKKTGWIEILGAGMVHPSVLEMSGVDSQEYSGFAFGLGQERVAMLRYGINDIRGFYQGDRRFAEQFN
>1000565_COG0016
MSDLEALVSQAESDFSAAADAASLEQAKARYLGKSGSLTEQLKGLGKLDPEARKEAGAAINVVKQKVEAALEARREALRRAALEARLAEEALDVTLPGRGQLRGGLHPVSRTLERIEQLFRGIGFDVADGPEIETDFHNFTALNTPENHPARSMHDTFYLEGASDVMLRTHTSPIQVRYMQAHVARHGGAEAMPEIRIIAPGRVYRVDSDATHSPMFHQVEGLWVGESVSFADLKGVVSDFLHRFFETDQLDVRFRPSFFPFTEPSAEIDVAFMSGPLAGRWLEIAGCGMVHPNVLGHCGIDAERYTGFAFGFGPDRLTMLRYGINDLRLFYDGDVRFLSQFS
>1000568_COG0016
MIQETIGAMQQAVQERLLHCRTAQDVQAVRVQYLGKKGELTALLKGMKNVPPAERPAFGQLVNAARSALEAKLQERQAEVEEQEMATRLQSETLDITLPSRQPVRGHMHPLHLTRRHMEEAFLRMGFSLVEGPEIETDYFNFQCLNFPPDHPARDMQDSMYLTDSLLLRTHTSPMQARVLQSHKPNEPVKVIVPGKVYRWDYDATHSPVFHQMEGLIVDRHIRFSDLKGMLEDFLREIFGASTKVRFRASYFPFTEPSAEVDISCVMCGGEGCRVCSHTGWLEILGCGMVHPNVLRLNGYDPEQVTGFAFGMGVERIAMLKYGIDDLRLFYENDMRFLTQF
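A naming mismatch between the two files, as suspected above, could be looked for with a small cross-check like the sketch below. The helper names are hypothetical and not part of ete3; the sketch assumes that the sequence name in the FASTA file is the first whitespace-delimited token after `>`.

```python
def fasta_ids(fasta_lines):
    """Collect sequence names: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids

def cog_ids(cog_lines):
    """Collect every sequence name listed in the COG file."""
    ids = set()
    for line in cog_lines:
        ids.update(line.split())
    return ids

def compare(cog_lines, fasta_lines):
    """Return (names in the COG file missing from the FASTA, FASTA names unused by any COG)."""
    in_cogs, in_fasta = cog_ids(cog_lines), fasta_ids(fasta_lines)
    return sorted(in_cogs - in_fasta), sorted(in_fasta - in_cogs)

# Tiny example using identifiers from this thread; the sequences are shortened placeholders.
cogs = ["100053_COG0012\t1000561_COG0012"]
fasta = [">100053_COG0012", "MSLNCGIVGL", ">1000562_COG0016", "MDLQTQLEDL"]
missing, unused = compare(cogs, fasta)
print("missing from FASTA:", missing)
print("not used in any COG:", unused)
```

Any name reported as missing from the FASTA would be a candidate cause for a failure when the sequences of that COG are collected for alignment.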
The command used is:
ete3 build -w clustalo_default-trimal01-none-none -m cog_all-alg_concat_default-fasttree_default -o kaiju_sptree/ \
--clearall -a multifasta.fa --cogs coglist.txt
And the error is:
INFO - Launched 0 jobs. 1(R), 39(W). Cores usage: 1/1
INFO - Updating tasks status: (Tue May 10 12:11:02 2022)
INFO - Thread cog_all-alg_concat_default-fasttree_default: pending tasks: 1 of sizes: 36881
INFO - (R) ConcatAlgTask (36881 species, 40 COGs, ConcatAlg, /cog_all-al...ee_default)
INFO - Waiting 2 seconds
INFO - Updating tasks status: (Tue May 10 12:11:04 2022)
INFO - Thread cog_all-alg_concat_default-fasttree_default: pending tasks: 1 of sizes: 36881
INFO - (R) ConcatAlgTask (36881 species, 40 COGs, ConcatAlg, /cog_all-al...ee_default)
INFO - Waiting 2 seconds
INFO - Launched 0 jobs. 1(R), 39(W). Cores usage: 1/1
INFO - Updating tasks status: (Tue May 10 12:11:06 2022)
INFO - Thread cog_all-alg_concat_default-fasttree_default: pending tasks: 1 of sizes: 36881
INFO - (R) ConcatAlgTask (36881 species, 40 COGs, ConcatAlg, /cog_all-al...ee_default)
INFO - Waiting 2 seconds
INFO - Launched 0 jobs. 1(R), 39(W). Cores usage: 1/1
INFO - Updating tasks status: (Tue May 10 12:11:08 2022)
INFO - Thread cog_all-alg_concat_default-fasttree_default: pending tasks: 1 of sizes: 36881
INFO - (R) ConcatAlgTask (36881 species, 40 COGs, ConcatAlg, /cog_all-al...ee_default)
INFO - Waiting 2 seconds
INFO - Updating tasks status: (Tue May 10 12:11:10 2022)
INFO - Thread cog_all-alg_concat_default-fasttree_default: pending tasks: 1 of sizes: 36881
INFO - (R) ConcatAlgTask (36881 species, 40 COGs, ConcatAlg, /cog_all-al...ee_default)
INFO - Waiting 2 seconds
INFO - Launched 0 jobs. 1(R), 39(W). Cores usage: 1/1
INFO - Updating tasks status: (Tue May 10 12:11:12 2022)
INFO - Thread cog_all-alg_concat_default-fasttree_default: pending tasks: 1 of sizes: 36881
INFO - (R) ConcatAlgTask (36881 species, 40 COGs, ConcatAlg, /cog_all-al...ee_default)
INFO - Waiting 2 seconds
INFO - Launched 1 jobs. 1(R), 38(W). Cores usage: 1/1
INFO - Updating tasks status: (Tue May 10 12:11:14 2022)
INFO - Thread cog_all-alg_concat_default-fasttree_default: pending tasks: 1 of sizes: 36881
INFO - (R) ConcatAlgTask (36881 species, 40 COGs, ConcatAlg, /cog_all-al...ee_default)
ERR - Job error reported: Job (clustalo---threads-1, bd695f)
ERR - Errors found in ConcatAlgTask (36881 species, 40 COGs, ConcatAlg, /cog_all-al...ee_default)
Traceback (most recent call last):
File "/home/vant/miniconda3/envs/ete3/lib/python3.6/site-packages/ete3/tools/ete_build_lib/scheduler.py", line 257, in schedule
task.status = task.get_status(qstat_jobs)
File "/home/vant/miniconda3/envs/ete3/lib/python3.6/site-packages/ete3/tools/ete_build_lib/master_task.py", line 198, in get_status
self.job_status = self.get_jobs_status(sge_jobs)
File "/home/vant/miniconda3/envs/ete3/lib/python3.6/site-packages/ete3/tools/ete_build_lib/master_task.py", line 285, in get_jobs_status
st = j.get_status(sge_jobs)
File "/home/vant/miniconda3/envs/ete3/lib/python3.6/site-packages/ete3/tools/ete_build_lib/master_task.py", line 198, in get_status
self.job_status = self.get_jobs_status(sge_jobs)
File "/home/vant/miniconda3/envs/ete3/lib/python3.6/site-packages/ete3/tools/ete_build_lib/master_task.py", line 306, in get_jobs_status
raise TaskError(j, "Job execution error %s" %errorpath)
ete3.tools.ete_build_lib.errors.TaskError: Job execution error /media/vant/TextesD/ICO/ete_3/ete_almeu/kaiju_sptree/tasks/bd695fd72f11a94cb1b3066a1d4ec123
INFO - Waiting 2 seconds
ERR - Thread cog_all-alg_concat_default-fasttree_default contains errors:
ERR - ** ConcatAlgTask (36881 species, 40 COGs, ConcatAlg, /cog_all-al...ee_default)
ERR - -> Job (clustalo---threads-1, bd695f)
ERR - -> /media/vant/TextesD/ICO/ete_3/ete_almeu/kaiju_sptree/tasks/bd695fd72f11a94cb1b3066a1d4ec123
ERR - -> Job execution error /media/vant/TextesD/ICO/ete_3/ete_almeu/kaiju_sptree/tasks/bd695fd72f11a94cb1b3066a1d4ec123
ERR - Done with ERRORS
Data Error: Errors found in some tasks
Killing 1 running jobs...
Thanks in advance for your time and help,
Kind regards
Magii
Hi @magibc, I have tried the sample data that you showed, and it works well, except that I had to correct the text format of coglist.txt. Please make sure the entries on each COG line are TAB-delimited; otherwise it may cause errors. Cheers, Ziqi
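A quick way to spot and repair lines that are separated by spaces instead of TABs could look like the following sketch. The function names are hypothetical, and it assumes that sequence names never contain whitespace themselves.

```python
def needs_tab_fix(line):
    """True if splitting on TAB gives different fields than splitting on any whitespace."""
    stripped = line.rstrip("\r\n")
    return stripped.split("\t") != stripped.split()

def to_tab_delimited(line):
    """Rejoin the whitespace-separated names of one COG line with single TABs."""
    return "\t".join(line.split())

# The first line is space-delimited (as pasted in the issue), the second is already correct.
lines = [
    "100053_COG0012 1000561_COG0012 1000565_COG0012\n",
    "100053_COG0016\t1000561_COG0016\n",
]
flags = [needs_tab_fix(line) for line in lines]
fixed = [to_tab_delimited(line) for line in lines]
print(flags)
print(fixed)
```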
Thanks Ziqi. Yes, I made a copy-paste error here in the GitHub issue. With my coglist file, the run goes ahead with the MAFFT aligner but not with Clustal Omega (clustalo).
Thanks in advance,
David.