
model info error

Open RachelDanie opened this issue 2 years ago • 2 comments

Hello, I am having trouble running epa-ng with the following error:

INFO Selected: Output dir: ./epa_tree/
INFO Selected: Query file: query.fasta
INFO Selected: Tree file: T3.raxml.bestTree
INFO Selected: Reference MSA: reference.fasta
INFO Selected: Automatic switching of use of per rate scalers
INFO Selected: Preserving the root of the input tree
INFO Selected: Specified model file: RAxML_info.info
what(): Model string in provided file seems wrong.
XXXX.sh: line 20: 26465 Aborted (core dumped) epa-ng --tree T3.raxml.bestTree --ref-msa reference.fasta --query query.fasta --outdir $OUT --model RAxML_info.info

I am attempting to align 806 amplicon sequences to 1121 nifH reference sequences. I started by running raxml-ng to build a reference tree on muscle-aligned ref seqs with the following command:

raxml-ng --msa T2.raxml.rba --model GTR+G --prefix T3 --threads 8 --seed 8273

I then used papara to align the query seqs, and raxml-ng --split to separate the aligned seqs.
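A quick sanity check at this stage, sketched under the assumption that epa-ng expects the reference and query alignments to have the same number of columns after splitting. The function names and the inline FASTA snippets are hypothetical and only illustrate the check; they are not part of papara, raxml-ng, or epa-ng.

```python
def fasta_lengths(text):
    """Return {sequence name: total sequence length} for FASTA-formatted text."""
    lengths = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            lengths[name] = 0
        elif name is not None:
            # Sequences may wrap over several lines, so accumulate.
            lengths[name] += len(line)
    return lengths


def same_width(ref_text, query_text):
    """True if every sequence in both alignments has the same length."""
    widths = set(fasta_lengths(ref_text).values())
    widths |= set(fasta_lengths(query_text).values())
    return len(widths) == 1


# Hypothetical toy alignments, not taken from the issue.
REF = ">ref1\nACGT--AC\n>ref2\nACGTTTAC\n"
QUERY_OK = ">q1\n--GTTT--\n"
QUERY_BAD = ">q1\nGTTT\n"

print(same_width(REF, QUERY_OK))
print(same_width(REF, QUERY_BAD))
```

In practice the two strings would be read from reference.fasta and query.fasta.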

In my first go running epa-ng, I provided the example model parameters suggested in the full stack tutorial to define the model: GTR{0.7/1.8/1.2/0.6/3.0/1.0}+FU{0.25/0.23/0.30/0.22}+G4{0.47}

But I got the following error:

ERR When using epa-ng like this, a model has to be explicitly specified! You may specify it generically (GTR+G), however parameters will not be optimized. Instead we reccommend to use RAxML to re-evaluate the parameters and then pass the resulting RAxML_info file to the epa-ng --model argument. epa-ng will then auto-parse the parameters. ( raxmlHPC -f e -s -t -n info -m GTRGAMMAX )

So I ran the example command above. I did get an error that led me to change the -m option to GTRGAMMA (the only other possible input is GTRGAMMAI), and with that change it executed fine. But using the resulting RAxML_info file as the --model input for epa-ng produced the "Model string in provided file seems wrong" error shown at the top.

Is there some other way to get around this? If it helps, below are the contents of the RAxML_info file:

This is RAxML version 7.3.0 released by Alexandros Stamatakis in June 2011.

With greatly appreciated code contributions by:
Andre Aberer (HITS)
Simon Berger (HITS)
Nick Pattengale (Sandia)
Wayne Pfeiffer (SDSC)
Akifumi S. Tanabe (Univ. Tsukuba)

Alignment has 4167 distinct alignment patterns

Proportion of gaps and completely undetermined characters in this alignment: 93.41%

RAxML Model Optimization up to an accuracy of 0.100000 log likelihood units

Using 1 distinct models/data partitions with joint branch length optimization

All free model parameters will be estimated by RAxML
GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter

GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units

Partition: 0
Alignment Patterns: 4167
Name: No Name Provided
DataType: DNA
Substitution Matrix: GTR

RAxML was called as follows:

raxmlHPC -f e -s ref.clus.phyi -t T3.raxml.bestTree -n info -m GTRGAMMA

Testing which likelihood implementation to use
Standard Implementation full tree traversal time: 2.301094
Subtree Equality Vectors for gap columns full tree traversal time: 0.809563
... using SEV-based implementation

Model parameters (binary file format) written to: /home/rodrigues-lab/msa_red/epa_ng/RAxML_binaryModelParameters.info

Overall Time for Tree Evaluation 419.071737
Final GAMMA likelihood: -186925.416854
Number of free parameters for AIC-TEST(BR-LEN): 2248
Number of free parameters for AIC-TEST(NO-BR-LEN): 9

Model Parameters of Partition 0, Name: No Name Provided, Type of Data: DNA
alpha: 1.029898
Tree-Length: 201.377284
rate A <-> C: 1.154527
rate A <-> G: 2.645042
rate A <-> T: 1.360458
rate C <-> G: 1.626075
rate C <-> T: 3.503977
rate G <-> T: 1.000000

freq pi(A): 0.240682
freq pi(C): 0.260669
freq pi(G): 0.267798
freq pi(T): 0.230851
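One possible workaround, sketched here rather than confirmed: the parameters reported in the RAxML_info file can be rewritten by hand into the same model-string shape as the tutorial example (GTR{...}+FU{...}+G4{...}). The sketch assumes the rates go in the order AC/AG/AT/CG/CT/GT and the frequencies in the order A/C/G/T. The function name is hypothetical, and whether epa-ng accepts the resulting string for this dataset is untested.

```python
import re


def model_string_from_raxml_info(text):
    """Build a GTR+FU+G4 model string from RAxML_info text (DNA, GTR+GAMMA only)."""
    number = r"([0-9]+\.?[0-9]*)"

    def grab(pattern):
        match = re.search(pattern, text)
        if match is None:
            raise ValueError("pattern not found: " + pattern)
        return match.group(1)

    # Assumed rate order: AC/AG/AT/CG/CT/GT.
    pairs = ["A <-> C", "A <-> G", "A <-> T", "C <-> G", "C <-> T", "G <-> T"]
    rates = [grab(r"rate " + re.escape(p) + r":\s*" + number) for p in pairs]
    # Assumed frequency order: A/C/G/T.
    freqs = [grab(r"freq pi\(" + b + r"\):\s*" + number) for b in "ACGT"]
    # "alpha:" (with colon) avoids matching the "alpha-parameter" log line.
    alpha = grab(r"alpha:\s*" + number)
    return "GTR{" + "/".join(rates) + "}+FU{" + "/".join(freqs) + "}+G4{" + alpha + "}"


# Parameter lines copied from the RAxML_info output in this issue.
SAMPLE = """
GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter
alpha: 1.029898
Tree-Length: 201.377284
rate A <-> C: 1.154527
rate A <-> G: 2.645042
rate A <-> T: 1.360458
rate C <-> G: 1.626075
rate C <-> T: 3.503977
rate G <-> T: 1.000000
freq pi(A): 0.240682
freq pi(C): 0.260669
freq pi(G): 0.267798
freq pi(T): 0.230851
"""

print(model_string_from_raxml_info(SAMPLE))
```

The values are kept as the literal strings from the file, so no rounding is introduced when the model string is assembled.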

RachelDanie, May 27 '22 23:05

I should also add that simply running

epa-ng --tree T3.raxml.bestTree --ref-msa reference.fasta --query query.fasta --outdir $OUT --model GTR+G

did not throw errors, but did not place the query sequences in the tree (the resulting .jplace file contained only reference sequences).
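One way to check what the .jplace file actually contains: in the jplace format, the "tree" field holds the reference tree, while placed queries are listed separately under "placements", with names in "n" (or name/multiplicity pairs in "nm"). The sketch below counts those entries. The function name and the inline sample are hypothetical and not taken from this run.

```python
import json


def summarize_jplace(jplace_text):
    """Count placement entries and collect query names from jplace JSON text."""
    data = json.loads(jplace_text)
    placements = data.get("placements", [])
    names = []
    for placement in placements:
        # "n" holds plain names; "nm" holds [name, multiplicity] pairs.
        names.extend(placement.get("n", []))
        names.extend(pair[0] for pair in placement.get("nm", []))
    return {"num_entries": len(placements), "query_names": names}


# Hypothetical minimal jplace document for illustration.
SAMPLE_JPLACE = """
{
  "tree": "((A:0.1{0},B:0.2{1}):0.05{2},C:0.3{3}){4};",
  "placements": [
    {"p": [[1, -1234.5, 0.9, 0.01, 0.02]], "n": ["query_1"]},
    {"p": [[3, -1250.0, 0.8, 0.02, 0.03]], "nm": [["query_2", 2]]}
  ],
  "fields": ["edge_num", "likelihood", "like_weight_ratio",
             "distal_length", "pendant_length"],
  "version": 3,
  "metadata": {}
}
"""

print(summarize_jplace(SAMPLE_JPLACE))
```

For a real run, the text would be read from the .jplace file in the output directory. An empty "placements" list would indicate that no queries were placed.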

RachelDanie, May 27 '22 23:05

Hi @RachelDanie !

It's pretty surprising that the first way you tried it (with raxml-ng) didn't work... did you supply that model string on the command line?

Can you instead try --model <raxml-ng best.model file>? In principle that's just a file containing that string, followed by a partition name and range. It should be one of the outputs of raxml-ng. If that doesn't work, then the issue is probably a bigger one...
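To illustrate the file layout described above (a model string, followed by a partition name and range), here is a minimal sketch that splits such a line into its two parts. It assumes the separator is the first comma outside curly braces. The partition name and range in the sample line are made up, and the function name is hypothetical.

```python
def split_model_line(line):
    """Split 'MODEL, partition = range' at the first comma outside braces."""
    depth = 0
    for i, ch in enumerate(line):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == "," and depth == 0:
            return line[:i].strip(), line[i + 1:].strip()
    # No top-level comma: treat the whole line as the model string.
    return line.strip(), ""


# Model string from the tutorial example quoted in this issue;
# the partition name and range are hypothetical.
SAMPLE_LINE = (
    "GTR{0.7/1.8/1.2/0.6/3.0/1.0}+FU{0.25/0.23/0.30/0.22}+G4{0.47}, "
    "noname = 1-1000"
)

model, partition = split_model_line(SAMPLE_LINE)
print(model)
print(partition)
```

Scanning for brace depth rather than splitting on the last comma keeps the split correct if the partition range itself contains commas.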

Let me know how it works.
Pierre

pierrebarbera, May 30 '22 09:05