antlr4 OSS-Fuzz findings for antlr4

Hello dear antlr4 maintainers,

there are two issues found by OSS-Fuzz which are varified as fixed

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=49339 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=50093

We did not see any commits related to these two issues. Can you please look at these findings and determine if they are valid?

Thank you!

Best regards, Henry

Sep 05 '22 12:09 henryrneh

Please could you elaborate on the details and write them down here? I have persmisson denided for the second issue and unclear error in the first one.

Sep 05 '22 13:09 KvanTTT

Hello KvanTTT,

thanks for your reply! If you want I can add your email into the project.yaml of antlr4 on OSS-Fuzz, https://github.com/google/oss-fuzz/blob/ed8ad012a2243a357f67d7f6953f2621a34fad3f/projects/antlr4-java/project.yaml

Then you will receive all the emails regarding OSS-Fuzz findings of antlr4.

If you don't want to receive that much email I can send you the stack trace and crashing input to your private e-mail, I am not sure if it's good to post the stacktrace and crashing input here.

Thanks!

Sep 05 '22 14:09 henryrneh

I am not sure if it's good to post the stacktrace and crashing input here.

If it's not confidential, I don't see a reason why it should not be here.

Sep 05 '22 14:09 KvanTTT

Here you go,

49339.zip 50093.zip

Please let me know if you have any questions!

Sep 05 '22 15:09 henryrneh

Could you please share the grammar as well?

Sep 05 '22 15:09 KvanTTT

It's running the tool, trying to display an error, then crashes here in Antlr3. In Antlr4, it's this, not sure why it was changed. The doc on replaceAll() is here.

 at java.base/java.util.regex.Pattern.error(Pattern.java:2028)
 at java.base/java.util.regex.Pattern.<init>(Pattern.java:1432)
 at java.base/java.util.regex.Pattern.compile(Pattern.java:1069)
 at java.base/java.lang.String.replaceAll(String.java:2142)
 at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:729)
 at com.code_intelligence.jazzer.sanitizers.RegexInjection.hookInternal(RegexInjection.kt:139)
 at com.code_intelligence.jazzer.sanitizers.RegexInjection.stringHook(RegexInjection.kt:108)
 at org.antlr.runtime.BaseRecognizer.getTokenErrorDisplay(BaseRecognizer.java:336)
 at org.antlr.v4.parse.ToolANTLRParser.getParserErrorMessage(ToolANTLRParser.java:44)
 at org.antlr.v4.parse.ToolANTLRParser.displayRecognitionError(ToolANTLRParser.java:31)
 at org.antlr.runtime.BaseRecognizer.reportError(BaseRecognizer.java:186)
 at org.antlr.v4.parse.ANTLRParser.lexerElements(ANTLRParser.java:4140)
 ... 1012 more

Sep 05 '22 20:09 kaby76

By the way, this method should not be used since it's deprecated (ANTLR 4).

Sep 05 '22 21:09 KvanTTT

But, it looks like this crash is executing the Antlr tool which is based on Antlr3. So, even if the Antlr4 code is newer, the parser for the Antlr4 tool is Antlr3. We really should update to Antlr4. We already are halfway there: we have an Antlr4 tool that works (4.10.1).

Sep 05 '22 21:09 kaby76

We already are halfway there: we have an Antlr4 tool that works (4.10.1).

Do you mean removing ANTLR 3 dependency? I am supportive of the idea but not sure @parrt will accept such a big update. It's a lot of work (all .g files should be rewritten: https://github.com/antlr/antlr4/tree/master/tool/src/org/antlr/v4/parse).

Sep 05 '22 21:09 KvanTTT

Yes, remove the Antlr3 dependency. Here we see an issue with the error reporting in the Antlr3 code. Isn't the main issue in keeping with A3 is because of the AST construction?

Sep 05 '22 21:09 kaby76

Not only AST construction. ANTLR 3 grammars are used for AST traveling. But in my opinion, most part of walkers can be simplified significantly. I've completed this for LeftRecursiveRuleWalker.g (unfortunately I can't find my branch for demonstrate now).

Sep 05 '22 21:09 KvanTTT

Hi all,

Could you please share the grammar as well?

Unfortunately, due to the nature of this fuzz test, the grammar is embedded in the crashing input. We first fuzz (randomly generate) the grammar for a given run, and then we fuzz the input to that grammar. While this certainly produces strange / unrealistic grammars, we thought it would be the best way to discover bugs in antlr4. At the moment I do not know a way to cleanly separate the input for the grammar vs that of the input to that grammar short of visual examination (the grammar is at the beginning).

By the way, this method should not be used since it's deprecated (ANTLR 4).

Which method is deprecated precisely? Here is the fuzz target we wrote for antlr4. If you have some suggestions on how to improve it, we would certainly welcome the feedback.

Sep 06 '22 07:09 vargen

The grammar fuzzer calls the "interpreter" parser engine here. This use case applies to the Intellij Antlr plugin and the experimental Antlr Lab web site, but as far as I know, most people don't do this. Most would use the tool jar to generate java (or other target) source code, which would then be compiled and run through an Antlr4 driver program. Does the fuzzer test the "normal" use case?

Sep 06 '22 11:09 kaby76

Since we are looking for bugs / vulnerabilities when we fuzz, we don't necessarily try to emulate a "normal" use case, but the code path being tested should be viable (i.e. it should be possible for external input to reach the specified methods). Given that Antlr4 appears, at least to me, to be designed to be used primarily as a command line tool, it is a bit different than other software targets we have fuzzed.

I am not deeply familiar with Antlr4 and used the documentation to put together the initial fuzz target. The goal was to programmatically emulate the functionality listed here. A concrete example of how to do that would be very helpful.

Sep 06 '22 12:09 vargen

antlr4 antlr4 copied to clipboard

OSS-Fuzz findings for antlr4

antlr4
antlr4 copied to clipboard