antlr4
antlr4 copied to clipboard
OSS-Fuzz findings for antlr4
Hello dear antlr4 maintainers,
there are two issues found by OSS-Fuzz which are varified as fixed
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=49339 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=50093
We did not see any commits related to these two issues. Can you please look at these findings and determine if they are valid?
Thank you!
Best regards, Henry
Please could you elaborate on the details and write them down here? I have persmisson denided
for the second issue and unclear error in the first one.
Hello KvanTTT,
thanks for your reply! If you want I can add your email into the project.yaml of antlr4 on OSS-Fuzz, https://github.com/google/oss-fuzz/blob/ed8ad012a2243a357f67d7f6953f2621a34fad3f/projects/antlr4-java/project.yaml
Then you will receive all the emails regarding OSS-Fuzz findings of antlr4.
If you don't want to receive that much email I can send you the stack trace and crashing input to your private e-mail, I am not sure if it's good to post the stacktrace and crashing input here.
Thanks!
I am not sure if it's good to post the stacktrace and crashing input here.
If it's not confidential, I don't see a reason why it should not be here.
Could you please share the grammar as well?
It's running the tool, trying to display an error, then crashes here in Antlr3. In Antlr4, it's this, not sure why it was changed. The doc on replaceAll() is here.
at java.base/java.util.regex.Pattern.error(Pattern.java:2028)
at java.base/java.util.regex.Pattern.<init>(Pattern.java:1432)
at java.base/java.util.regex.Pattern.compile(Pattern.java:1069)
at java.base/java.lang.String.replaceAll(String.java:2142)
at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:729)
at com.code_intelligence.jazzer.sanitizers.RegexInjection.hookInternal(RegexInjection.kt:139)
at com.code_intelligence.jazzer.sanitizers.RegexInjection.stringHook(RegexInjection.kt:108)
at org.antlr.runtime.BaseRecognizer.getTokenErrorDisplay(BaseRecognizer.java:336)
at org.antlr.v4.parse.ToolANTLRParser.getParserErrorMessage(ToolANTLRParser.java:44)
at org.antlr.v4.parse.ToolANTLRParser.displayRecognitionError(ToolANTLRParser.java:31)
at org.antlr.runtime.BaseRecognizer.reportError(BaseRecognizer.java:186)
at org.antlr.v4.parse.ANTLRParser.lexerElements(ANTLRParser.java:4140)
... 1012 more
By the way, this method should not be used since it's deprecated (ANTLR 4).
But, it looks like this crash is executing the Antlr tool which is based on Antlr3. So, even if the Antlr4 code is newer, the parser for the Antlr4 tool is Antlr3. We really should update to Antlr4. We already are halfway there: we have an Antlr4 tool that works (4.10.1).
We already are halfway there: we have an Antlr4 tool that works (4.10.1).
Do you mean removing ANTLR 3 dependency? I am supportive of the idea but not sure @parrt will accept such a big update. It's a lot of work (all .g
files should be rewritten: https://github.com/antlr/antlr4/tree/master/tool/src/org/antlr/v4/parse).
Yes, remove the Antlr3 dependency. Here we see an issue with the error reporting in the Antlr3 code. Isn't the main issue in keeping with A3 is because of the AST construction?
Not only AST construction. ANTLR 3 grammars are used for AST traveling. But in my opinion, most part of walkers can be simplified significantly. I've completed this for LeftRecursiveRuleWalker.g (unfortunately I can't find my branch for demonstrate now).
Hi all,
Could you please share the grammar as well?
Unfortunately, due to the nature of this fuzz test, the grammar is embedded in the crashing input. We first fuzz (randomly generate) the grammar for a given run, and then we fuzz the input to that grammar. While this certainly produces strange / unrealistic grammars, we thought it would be the best way to discover bugs in antlr4. At the moment I do not know a way to cleanly separate the input for the grammar vs that of the input to that grammar short of visual examination (the grammar is at the beginning).
By the way, this method should not be used since it's deprecated (ANTLR 4).
Which method is deprecated precisely? Here is the fuzz target we wrote for antlr4. If you have some suggestions on how to improve it, we would certainly welcome the feedback.
The grammar fuzzer calls the "interpreter" parser engine here. This use case applies to the Intellij Antlr plugin and the experimental Antlr Lab web site, but as far as I know, most people don't do this. Most would use the tool jar to generate java (or other target) source code, which would then be compiled and run through an Antlr4 driver program. Does the fuzzer test the "normal" use case?
Since we are looking for bugs / vulnerabilities when we fuzz, we don't necessarily try to emulate a "normal" use case, but the code path being tested should be viable (i.e. it should be possible for external input to reach the specified methods). Given that Antlr4 appears, at least to me, to be designed to be used primarily as a command line tool, it is a bit different than other software targets we have fuzzed.
I am not deeply familiar with Antlr4 and used the documentation to put together the initial fuzz target. The goal was to programmatically emulate the functionality listed here. A concrete example of how to do that would be very helpful.