
ERROR! Tree file of gene COX2 not found : trees2/aligned_COX2_pro.zZ.fasta.treefile

Open salvatierra8 opened this issue 1 year ago • 15 comments

Greetings,

I am getting this error when running the tree command, and the process does not seem to complete because of it. I have not been able to determine the cause. I checked the COX2_pro.zZ.fasta file:

zZ7200270472082568702zZ MYFQDSATPNQEEDGQLRLLDTDTSIVAPVDTHIRFIVSAADVIHDFAIPSLGIKIDACPGRLNQVSALIEREGVFYGQCSELCGVAHSAMPIKLEVVSLPEFLE

zZ5232740519337231783zZ MGRESLVSPRRSRAASARRLLPGLSRRVLTSLLLFSRRRSYGDLSGVSPEICNNGLGCGSPLDTSVAPEGMLGVSRPPALVDSPTSSDDPPSVLPAQNISATHFYVGSNVVRNYGIFLQARNIPGQHFAVHTWSHQYTTSLTNEQVVAEIGWTMQILADNNFGLIPAHWRPRELVSSRFFTSHDSDSSPLPSPRPAYGDVDNRVRAIAREVFGLKTITWNPDEYEARVRGSKSPGLIPLEHVTSVEAVDGELIYVLTPPKYLLLTLYPFVNQSSSRPTLSSSPKAGTSPTSFVASPLAFLPFVLLFLADAKFFLSVFSFSLISGQKPGTRTPTPATSRRVSLQESPSSRLASPMAPLLPWLQQEHPLLDTQNGGAFCCGPEFSWELWERGEGGVGGVGCSWEHSRFGHWFRLGRNELDQIVLIFDTLPRHFDSKTPSELSRPLFLLQPPTLSFSSEDAALSIKQPLQADETDLLDNLVSRLPPLPSPPPSMSISLLPQEVLEPILHLAIQPSTTEGASILLVCTLWHNLGREKLYEHVTLSSQAAYDSYFLLGGSKASWRPLAQAQRTLDYQNLRSLHLRFGPLTKLPFSLSSSNPSPPYLPRFRNLKLIHLDLAKGSSYLSCRPKPMARRVAKLMGGFSPETMILARSSSAEISLSAVMPHHLRRTKYLLLASHHVSHLPSLTPQSCPAMIKNVGFQLLTGVLLPPSALAPLSETTKPSSTRSSPSSSAVGGLQPAVLRDSTFAVCHFTASSVCRMAAVVVSSRQDTSLASRFRSARPNVLFSPPRGLKPSSSQLHPHFPPACSPPVSHLSPRPSPPGYVSPPPSFKDTGSKSCRESRKAKPISSHLSHPFPFLFPSSLPQAFSSTPRSAPSAAIVGNSMLAASGAEGRADLSRWIETQPGSLPTTSTTTSTRRTLPSRSSLPPPRRMYLFPSISGRKGRGADLWSELTSSSSFFFLLLPITLRSSTSSNHPSSTPHSPLHPWILHHRIYHQRHHHQLHSLPLFKTPSTNAFSNLGSLTTSLLAHADEVGVSFDSYMVPDNEIADGQPRLLDVDARVVLPIETHTRFILSSTDVIHDWAVPSLGIKMDAMPGRLNQTSTLIERKGLFFGQCSELCGVYHGFMPIVVEAVELPEYLAWLLAQE

zZ7320208565470240394zZ MYFQDSATPNQEEDGQLRLLDTDTSIVAPVDTHIRFIVSAADVIHDFAIPSLGIKIDACPGRLNQVSALIEREGVFYGQCSELCGVAHSAMPIKLEVVSLPEFLE

The only odd thing I can see is that the second sequence is much longer than the other two and shares less identity with them. What could be causing this error?
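A quick way to spot such an oversized hit across many gene files is to compare each sequence length against the median for that file. The following is only a minimal Python sketch: the helper names `read_fasta` and `length_outliers` and the cutoff factor of 3 are illustrative assumptions, not part of ufcg.

```python
from statistics import median


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def length_outliers(records, factor=3.0):
    """Return headers whose ungapped length exceeds factor times the median.

    The factor is an arbitrary illustrative cutoff, not a ufcg setting.
    """
    lengths = [len(seq.replace("-", "")) for _, seq in records]
    if not lengths:
        return []
    cutoff = factor * median(lengths)
    return [header for (header, _), n in zip(records, lengths) if n > cutoff]
```

Calling `length_outliers(read_fasta(open(path).read()))` on each gene file would list the records worth inspecting by hand before building the tree.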

salvatierra8 avatar Aug 02 '23 20:08 salvatierra8

Hello,

This looks like a false positive hit, which can be resolved by lowering the search sensitivity once I implement that feature (#19).

For now, could you please try to use different tree inference methods (FastTree or RAxML) and see if the issue persists? This will specify which step is failing, between alignment and tree inference.

endixk avatar Aug 07 '23 01:08 endixk

Hello,

I forgot to update the topic: I did use FastTree and it worked. I will also try the other solution and hopefully remember to comment on it here. Thank you very much!

salvatierra8 avatar Aug 07 '23 15:08 salvatierra8

So, I just tested the new feature, but so far the default tree option is not working with any of the sensitivity options, at least for my data. I reran the tree step using RAxML without any problem, with both the default and the lowest sensitivity options.

salvatierra8 avatar Aug 14 '23 22:08 salvatierra8

Could you check whether the same very long COX2 sequence was found in the profile generated with the lowest sensitivity option? If so, I will look into the reference gene database for these mitochondrial genes.

endixk avatar Aug 16 '23 00:08 endixk

Yes, it still happened, though this time with TUB1 rather than COX.

salvatierra8 avatar Aug 21 '23 16:08 salvatierra8

Hi, I'm getting the same error for another protein:

ERROR! Tree file of gene HEM12 not found : tree/aligned_HEM12_pro.zZ.fasta.treefile

I had a look in the fasta file and there is no false positive hit as in @salvatierra8's case. My sequences all line up nicely. Ran again with raxml and fasttree and it finished without problems.

JWDebler avatar May 30 '24 03:05 JWDebler

I recently stumbled into this error using a smaller dataset and found the exact reason why IQ-TREE suffers.

IQ-TREE deduplicates the input MSA, so if the given MSA contains three or fewer unique alignment rows, no tree is produced, which in turn results in this "tree file not found" error.
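To check whether a given alignment falls into this case, one can count its distinct rows. Below is a minimal Python sketch; the function names and the threshold of four (taken from the "three or fewer" observation above) are illustrative assumptions and do not reproduce IQ-TREE's own deduplication logic.

```python
def unique_rows(fasta_text):
    """Return the set of distinct sequences found in FASTA-formatted text."""
    rows, current = [], None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # Start collecting sequence lines for a new record.
            current = []
            rows.append(current)
        elif line and current is not None:
            current.append(line.upper())
    return {"".join(chunks) for chunks in rows}


def too_few_unique(fasta_text, minimum=4):
    """True if the alignment has fewer than `minimum` distinct rows.

    minimum=4 is an assumption derived from the thread, not an IQ-TREE constant.
    """
    return len(unique_rows(fasta_text)) < minimum
```

Note that the COX2 file quoted at the top of this thread has three records, two of which appear to be identical, so it would be flagged by such a check.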

My recent commit rectifies this issue, and will be included in the next stable release. I suppose a binary compiled with the most recent version won't suffer from this issue anymore.

I would appreciate it if anyone could test this on their dataset to see whether the issue is fixed.

endixk avatar Jun 03 '24 05:06 endixk

I just ran tree with your recent commit version and got this error:

image

The command used:

ufcg tree -i output_lentis -l label -a nucleotide -t 16 -o output_lentis_tree_nucleotide

I'm not sure where the -T it is complaining about comes from. The error still happens if I remove the -t 16.

JWDebler avatar Jul 03 '24 06:07 JWDebler

The -T option is passed internally to set the number of threads for the iqtree binary. This error should not happen unless the binary is either not properly installed or has been updated to a version with this argument removed (which is unlikely).

Please check your iqtree installation and try again, and if the error persists, please provide the resulting messages with -dev option given.

endixk avatar Jul 05 '24 04:07 endixk

Looks like it was due to an old version of iqtree installed via apt.
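As far as I know, -T was introduced with IQ-TREE 2, while 1.x releases use -nt for threads, so checking the major version before running tree can catch this early. A minimal Python sketch follows; the function names are made up for illustration, and calling the binary with `--version` is an assumption that may not hold for every build.

```python
import re
import subprocess


def parse_iqtree_major(banner):
    """Extract the major version from an IQ-TREE banner line, or None."""
    match = re.search(r"version\s+(\d+)\.(\d+)", banner)
    return int(match.group(1)) if match else None


def installed_iqtree_major(binary="iqtree"):
    """Return the major version of the iqtree binary on PATH, or None.

    Assumption: the binary prints its version banner when called
    with --version; adjust the flag if your build behaves differently.
    """
    try:
        result = subprocess.run(
            [binary, "--version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    return parse_iqtree_major(result.stdout + result.stderr)
```

If `installed_iqtree_major()` returns 1 or None, installing a current IQ-TREE release (for example via conda rather than apt) is probably the first thing to try.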

JWDebler avatar Jul 08 '24 00:07 JWDebler

OK, next problem :-) The tree-building step finished correctly, but the final 'cleanup' didn't happen. All the files in the output directory have 'zZ' in their filenames, and the 'label' tag from the metadata file used during profile has not been applied; the files instead contain strings such as zZ2641650705628771812zZ. Previous successful runs cleaned up the directory and moved files into subfolders. Can I run the respective commands manually somehow? I just had a look at the prune module, but the run did not produce a .trm file. Maybe that is the problem? Cheers

JWDebler avatar Jul 08 '24 07:07 JWDebler

This seems to be a problem with the current git version. The version installed via conda (without the iqtree fix) properly processes all the files after the tree building step.

Git version: image

Conda version: image

JWDebler avatar Jul 09 '24 06:07 JWDebler

@JWDebler I looked into this and found that the Maven-compiled binary doesn't properly include the GSI calculation package as a dependency. The precompiled JAR (including the conda release) doesn't suffer from this. The confusing part is that the process finishes without raising any error.

Since I do not have the source code for this package, I need to find a way to properly include it in the pom.xml configuration. Until I find a solution, please use the -G option added in the recent commit, which turns off the GSI analysis and avoids the problem. If you need a GSI-annotated tree output, please use the stable conda version.

endixk avatar Jul 09 '24 23:07 endixk

@endixk Thanks, yep, renaming works with this commit. The folder doesn't get cleaned up though: all the files end up in the same folder, whereas the previous conda version (1.0.5) organised everything neatly like this: image

I just ran the current conda version (1.0.6) and it also didn't clean up the results folder.

JWDebler avatar Jul 10 '24 09:07 JWDebler

The cleanup script is included in the config payload, which is removed by a version update. Cleanup should work again after re-downloading the payload with ufcg download -t config.

endixk avatar Jul 10 '24 15:07 endixk