PP::ProfileReadError while using `fastBlockSearch` and `augustus --proteinprofile`

Status: Open. endixk opened this issue 2 years ago • 3 comments

Hi, I encountered an error when I provided a protein profile to the program.

Running `fastBlockSearch <seq> <prfl>` gives this message:

```
terminate called after throwing an instance of 'PP::ProfileReadError'
  what():  std::exception
Aborted
```

Running `augustus --species=<species> --proteinprofile=<prfl> <seq>` gives this:

```
augustus: ERROR
	PP::Profile: Error parsing pattern file"foo.prfl", line 8.
```

At the corresponding line, I found this kind of information block:

```
[dist]
# distance from previous block
# <min> <max>
0       57
```

After removing all of these [dist] blocks from the profile, the program ran without error. Nevertheless, I would like to keep this information, since it may matter in some cases.

I did not experience this problem with previous builds of Augustus; e.g., conda build 3.4.0 pl5262h5a9fe7b_2 runs without error on the same input files.

I would appreciate it if you could take a quick look and, hopefully, fix the issue soon.

Thanks!

Daniel

endixk, Jun 29 '22 06:06

I got the same error as the funannotate developer. It seems to be related to newer GCC/Ubuntu versions: it does not work on Ubuntu 22.04, at least when compiled from source. I noticed it while using BUSCO.

@KatharinaHoff @MarioStanke May I ask you to take a look? I would assume a required library changed slightly.

OnlineArts, Jul 18 '22 13:07

I just tried the following with the current version of Augustus, and it worked:

```
cd Augustus/docs/tutorial/data
msa2prfl.pl --prefix_from_seqnames --max_entropy=0.75  --blockscorefile=PF00225_seed.blocks.txt PF00225_seed.txt > PF00225_seed.prfl
fastBlockSearch --cutoff=1.1 chr4.103M.fa PF00225_seed.prfl
```

I need more information to reproduce and then fix the problem: please also make available the files and command lines that produced the input.

MarioStanke, Jul 19 '22 14:07

I have the same problem with the conda installation of Augustus.

My command is `augustus --codingseq=1 --proteinprofile=28538at7147.prfl --predictionStart=18091799 --predictionEnd=18101912 --species=fly NT_033777.3.temp`

and the error is

```
augustus: ERROR
	PP::Profile: Error parsing pattern file"28538at7147.prfl", line 8.
```

As in the case above (https://github.com/Gaius-Augustus/Augustus/issues/346#issue-1288236984), this was the line following a [dist] block. Once I removed that block, a new error pointed to the next [dist] block. After removing all the [dist] sections from the profile file, the command worked. I attach both file versions, 28538at7147_problem.prfl and 28538at7147_ok.prfl.

Additional info (may or may not be helpful): I tried multiple conda builds of v3.4.0 and also v3.3.3, and got the same error each time. Curiously, I had installed build augustus-3.4.0-pl5321h877ab46_5 back in March, and that installation worked fine. When I reinstalled the same build in a new environment today, it failed.

Also of interest is this issue: https://github.com/nextgenusfs/funannotate/issues/724. It describes a very similar problem and was only reported in May.

augustus_problem_files.zip

berkelem, Jul 28 '22 11:07

The error above is also being reported by BUSCO users:

https://gitlab.com/ezlab/busco/-/issues/584

berkelem, Aug 15 '22 08:08

@LarsGab It looks like this exception is thrown in Profile::parse_stream. Can you please take this up?

MarioStanke avatar Aug 17 '22 06:08 MarioStanke

Hi,

I tried to reproduce this error with the latest version of Augustus from GitHub and the data provided by @berkelem. I ran Augustus on two different machines with different versions of Ubuntu and gcc; it worked fine in both cases. Have you tried running it with Augustus from GitHub? Otherwise, it might be a problem with the Augustus version uploaded to Bioconda. Best, Lars

LarsGab, Aug 17 '22 11:08

I used the GitHub version; to be more precise:

```
git clone https://github.com/Gaius-Augustus/Augustus.git /opt/mosga/tools/augustus
cd /opt/mosga/tools/augustus/
git checkout b69e6bccfd46b4c7452407aafb2d6a6077e60ab8
```

The problem no longer affects me, since BUSCO 5 switched from Augustus to MetaEuk. Unfortunately, that is why I cannot provide more information to reproduce the issue; it also appeared during an intermediate development step. Regular Augustus runs work fine.

OnlineArts, Aug 17 '22 11:08

Yes, the GitHub version seems to be fine, but the Bioconda version is causing problems for BUSCO. Most users use either the Conda or Docker distributions of BUSCO, and both rely on the Augustus version on Bioconda for the Augustus pipeline. Can you reproduce the error with conda?

berkelem, Aug 17 '22 12:08

In my case, I had the issue WITH the GitHub versions of BUSCO and Augustus, without any conda environment. Install BUSCO 4 and the Augustus GitHub commit mentioned above on an Ubuntu 22.04 system, getting all required libraries from apt and CPAN. That should reproduce the problem.

@berkelem

> Most users use either the Conda or Docker distributions of BUSCO and both rely on the Augustus version on Bioconda for the Augustus pipeline.

Is there any evidence for that, given that multiple people have run into the issue?

OnlineArts, Aug 17 '22 13:08

I encountered a similar problem running Augustus with BUSCO, apparently caused by a change in the behavior of `std::ws` in newer versions of libstdc++: `std::ws` now seems to set the failbit if the eofbit is already set.
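To make the reported change concrete, here is a small sketch (not Augustus code) that mimics the original check on a `[dist]` data line such as `0       57`. Two plain `int`s stand in for Augustus' `DistanceType`, and `probeDistLine` is a hypothetical name. It records the stream's eof/fail bits after reading the two bounds and again after `>> std::ws`.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Snapshot of the two stream state bits we care about.
struct StreamState {
    bool eof;
    bool fail;
};

// Mimics `lstrm >> addDist >> ws` from Profile::parse_stream.
// Assumption: two ints are a stand-in for Augustus' DistanceType.
struct DistProbe {
    int lo = 0;
    int hi = 0;
    StreamState afterValues{};
    StreamState afterWs{};
};

DistProbe probeDistLine(const std::string& text) {
    std::istringstream in(text);
    DistProbe p;
    in >> p.lo >> p.hi;
    // If the text ends right after the last digit, eofbit is already set here.
    p.afterValues = {in.eof(), in.fail()};
    // The call whose behavior at EOF reportedly differs between libstdc++ versions.
    in >> std::ws;
    p.afterWs = {in.eof(), in.fail()};
    return p;
}
```

For `"0       57"` without trailing whitespace (an assumption about what `readAndConcatPart` returns), both bounds are read and `eofbit` is already set before `std::ws` runs; according to the report above, `afterWs.fail` is then false with older libstdc++ and true with newer ones, which makes the original `!(lstrm >> addDist >> ws && lstrm.eof())` condition throw. With a trailing newline, `std::ws` still has a character to skip, so both behaviors should agree.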

I was using Augustus 3.2.3, but the master branch still appears to expect the old behavior. I was able to fix the problem with a patch like this:

```diff
diff --git a/src/pp_profile.cc b/src/pp_profile.cc
index ce9613f1..f0f60610 100644
--- a/src/pp_profile.cc
+++ b/src/pp_profile.cc
@@ -672,8 +672,10 @@ void Profile::parse_stream(istream & strm) {
             // read in the allowed distance range
             istringstream lstrm(readAndConcatPart(strm, type, lineno));
             DistanceType addDist;
-            if(!(lstrm >> addDist >> ws && lstrm.eof()))
-                throw ProfileParseError(lineno - newlinesFromPos(lstrm.str(), lstrm.tellg()) -1);
+            lstrm >> addDist;
+            if (!(lstrm.eof() || lstrm >> ws)) {
+              throw ProfileParseError(lineno - newlinesFromPos(lstrm.str(), lstrm.tellg()) -1);
+            }
             finalDist += addDist;
             } else // if dist is not specified, assume arbitrary distance
                 finalDist.setInfMax();
```

I think the logic here should work with either behavior of `std::ws`, but admittedly I haven't tested it carefully.

actapia, Aug 18 '22 02:08

Thanks, Andrew. That may explain why the problem came up recently and I couldn't reproduce it before upgrading my computer. Thanks for the code. Lars, I reproduced the problem on Ubuntu 22.04 on my laptop and on cs3 with the current master branch. Can you please first reproduce and fix it?

MarioStanke, Aug 18 '22 06:06

Thanks a lot, Andrew! You pointed me in the right direction. I was able to reproduce the error on our cluster, and indeed `std::ws` is the problem, as Andrew explained. Your solution fixes the incorrectly raised ProfileParseError, but it doesn't catch incorrectly formatted distance intervals. Removing `std::ws` from the original if clause seems to fix the problem, and the error is still handled as intended. I have created a pull request addressing the problem.
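For comparison, here is a minimal sketch of a version-independent check that also rejects malformed intervals. It is not the code from the pull request; `parseDistLine` is a hypothetical helper, and two `int`s again stand in for `DistanceType`. The idea is to call `std::ws` only while the stream is still good, then require that the whole line was consumed.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not Augustus code): parse one "<min> <max>" line.
// Returns true only if two integers were read and nothing but whitespace follows.
bool parseDistLine(const std::string& text, int& minDist, int& maxDist) {
    std::istringstream in(text);
    if (!(in >> minDist >> maxDist))
        return false;          // missing or non-numeric bounds
    // Only skip trailing whitespace if EOF has not been reached yet, so std::ws
    // never runs on a stream whose eofbit is set; this sidesteps the
    // libstdc++ behavior difference entirely.
    if (!in.eof())
        in >> std::ws;         // stops at the first non-whitespace char or at EOF
    return in.eof();           // anything left over means trailing garbage
}
```

By contrast, the condition `!(lstrm.eof() || lstrm >> ws)` in the patch above accepts a line like `0 57 x`, because `ws` succeeds without reaching the end; that matches the concern about malformed intervals.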

LarsGab, Aug 18 '22 10:08

Thanks for addressing this issue! Can you make a new conda build with this fix?

berkelem, Aug 22 '22 08:08