VBA grammar - line length

Open ckrueger1979 opened this issue 9 months ago • 7 comments

Hi,

I think this grammar https://github.com/antlr/grammars-v4/blob/master/vba/vba.g4 has a problem with long lines.

This obfuscator https://github.com/oriolOrnaque/VBAObfuscator/ creates lines that are too long.

The length limit of a line is 1023 characters: https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/line-too-long

I didn't find any reference to the line length in the grammar (see LINE_CONTINUATION and UNDERSCORE)

greetings Carsten

ckrueger1979 avatar Nov 20 '23 13:11 ckrueger1979

Could you clarify the following things:

  1. What do you mean by long lines? Are they lines in generated lexer/parser?
  2. How does obfuscator relate to ANTLR grammar and generated code?

KvanTTT avatar Nov 20 '23 18:11 KvanTTT

  1. Lines that are longer than 1023 characters.
  2. Take a long line of unobfuscated code: the obfuscator lengthens it beyond 1023 characters, which yields broken VBA.
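
For illustration, a minimal sketch of a standalone check for such lines, run on the obfuscator's output before it is handed to VBA. The function name `find_overlong_lines` is hypothetical, and the sketch assumes the limit applies to each physical source line:

```python
MAX_LINE_LENGTH = 1023  # limit cited above from Microsoft's "Line too long" help page


def find_overlong_lines(source: str, limit: int = MAX_LINE_LENGTH):
    """Return (line_number, length) for every physical line longer than `limit`.

    Assumption: the limit is counted per physical line, without the line terminator.
    """
    return [
        (line_no, len(line))
        for line_no, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]
```

An empty result means no line exceeds the limit; it says nothing about whether the code is otherwise valid VBA.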

I would expect that the ANTLR parser shouldn't output illegal VBA code

ckrueger1979 avatar Nov 20 '23 18:11 ckrueger1979

> I would expect that the ANTLR parser shouldn't output illegal VBA code

Please be precise. Antlr does not "output illegal VBA code." The job of Antlr is to parse input (valid or not), output error messages, and return a parse tree.

The place to add this check would be to override the Emit() method of the base class for the lexer. The method could check the start and stop indices of the token, call Lexer.Emit(), and report the error. We already do something like this in other grammars, e.g., lua. It's an easy fix. However, the change will mean the grammar must be split, and target-specific code added for each target.
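
A rough sketch of that idea follows. It does not use the Antlr runtime: `ToyLexer`, `Token`, and `LengthCheckingLexer` are hypothetical stand-ins for a generated lexer and its base class, and the real fix would have to be written once per target. It only shows the shape of the override: inspect where the token ends, record an error if it runs past column 1023, then defer to the base `emit`.

```python
import re
from dataclasses import dataclass

MAX_LINE_LENGTH = 1023  # limit cited in this issue


@dataclass
class Token:
    text: str
    line: int    # 1-based line number
    column: int  # 0-based column of the token's first character


class ToyLexer:
    """Hypothetical stand-in for a generated lexer: one token per run of non-whitespace."""

    def __init__(self, source: str):
        self.source = source
        self.tokens = []

    def emit(self, token: Token) -> Token:
        self.tokens.append(token)
        return token

    def tokenize(self):
        for line_no, line in enumerate(self.source.splitlines(), start=1):
            for match in re.finditer(r"\S+", line):
                self.emit(Token(match.group(), line_no, match.start()))
        return self.tokens


class LengthCheckingLexer(ToyLexer):
    """Overrides emit() to report lines whose tokens extend past the limit."""

    def __init__(self, source: str):
        super().__init__(source)
        self.errors = []
        self._flagged_lines = set()

    def emit(self, token: Token) -> Token:
        stop_column = token.column + len(token.text)  # exclusive end column
        if stop_column > MAX_LINE_LENGTH and token.line not in self._flagged_lines:
            # Report each offending line once, then still emit the token.
            self._flagged_lines.add(token.line)
            self.errors.append(
                f"line {token.line}: exceeds {MAX_LINE_LENGTH} characters"
            )
        return super().emit(token)
```

Because this toy lexer skips whitespace, a line that is too long only because of trailing blanks would go unnoticed; a real implementation would need to account for that.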

kaby76 avatar Nov 20 '23 23:11 kaby76

My compiler construction lecture was 20 years ago, sorry that I've mixed something up.

I thought the parser should be able to parse and emit only valid language and otherwise create an error.

PS: What do you mean by target-specific code? Specific to VBA?

ckrueger1979 avatar Nov 21 '23 07:11 ckrueger1979

> I thought the parser should be able to parse and emit only valid language and otherwise create an error.

Parsers do not emit code! A parser is a function with the signature boolean parse(string input)--it takes a string and outputs true if the string is valid in the language described by the grammar.

So,

parse("Public Sub Module()
    Dim sd As Boolean
End Sub")

returns true. It does not output VBA code.
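
To make the signature concrete, here is a toy recognizer in Python. It is only a sketch: it accepts a tiny, made-up subset (one parameterless Sub containing nothing but `Dim ... As ...` lines) and has nothing to do with the rules in vba.g4. The point is that the only result is a boolean.

```python
import re

# Hypothetical mini-language, not the real VBA grammar.
_IDENT = r"[A-Za-z][A-Za-z0-9_]*"
_HEADER = re.compile(rf"(Public|Private)\s+Sub\s+{_IDENT}\(\)")
_DIM = re.compile(rf"Dim\s+{_IDENT}\s+As\s+{_IDENT}")
_FOOTER = re.compile(r"End\s+Sub")


def parse(source: str) -> bool:
    """Return True if `source` is a sentence of the toy language, else False."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    if not _HEADER.fullmatch(lines[0]) or not _FOOTER.fullmatch(lines[-1]):
        return False
    # Every line between header and footer must be a Dim statement.
    return all(_DIM.fullmatch(ln) is not None for ln in lines[1:-1])
```

Note that nothing in this function looks at line length, so a 2000-character `Dim` line would be accepted just the same.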

> What do you mean by target-specific code? Specific to VBA?

Antlr generates a parser for the VBA grammar in a programming language that you compile and link into a program. The current targets are CSharp (C#), Cpp (C++), Dart2 (Dart), Go, Java, JavaScript, PHP, Python3, and TypeScript. If you don't tell the parser generator what target you want, it will output a parser in Java.

The generated parser code can reference other code that you write to support the parser. That support code has to be in the target programming language. If you generate the parser for C#, you have to write the support code in C#. This is important because grammars that require support code cannot be used in the Antlr IntelliJ extension or on lab.antlr.org.

kaby76 avatar Nov 21 '23 10:11 kaby76

Thanks for the detailed explanation!

The parser for VBA will accept code with lines that are too long, correct? It returns true even if a line is longer than 1023 chars?

ckrueger1979 avatar Nov 21 '23 10:11 ckrueger1979

> The parser for VBA will accept code with lines that are too long, correct? It returns true even if a line is longer than 1023 chars?

Yes, you are right. The parser for the VBA grammar accepts lines longer than 1023 characters. I'll write a fix today or tomorrow.

kaby76 avatar Nov 21 '23 11:11 kaby76