kaitai_struct_compiler icon indicating copy to clipboard operation
kaitai_struct_compiler copied to clipboard

Create new AST node Group in the expression language, that represents value in parenthesis

Open Mingun opened this issue 3 years ago • 6 comments

This change fixes https://github.com/kaitai-io/kaitai_struct_tests/blob/master/formats/expr_ops_parens.ksy test for Java and, I'm sure, for all other targets

Mingun avatar Nov 19 '20 16:11 Mingun

Unfortunately, I have to disagree with this one, please see https://github.com/kaitai-io/kaitai_struct/issues/69.

Adding "force parentheses" operators seems to be a misunderstanding how AST works. These parentheses add no information to the AST object, it's just a weird trick to suppress and work around the problem instead of solving it.

generalmimon avatar Nov 19 '20 23:11 generalmimon

I don't think so. First of all, group node can be used to attach positional information if that even will be implemented (and I think it should if we want to produce nice error messages). Positional information is different for

(1+2)

and

(      2+3     )

So you can't ignore that node.

Mingun avatar Nov 20 '20 04:11 Mingun

These parentheses add no information to the AST object, it's just a weird trick to suppress and work around the problem instead of solving it.

Just the opposite -- the group node accurately laided out to the expression language structure. The https://github.com/kaitai-io/kaitai_struct/issues/69 just highlight problem which is gracefully and simply solved by this node.

Adding "force parentheses"

On the contrary, without a group node, you are forced to insert brackets even where they are not needed, simply because you cannot distinguish between places where they are needed and those where they are not needed. Example:

// X * Y
BinOp(X, Mul, Y)

Should you wrap X and Y with parenthesis when generate text? This depends heavily on the X and Y expressions. If it something like IntNum, then parenthesis is not required, but if that is BinOp(..., Add, ...) they are definitely needed. So, BinOp should not wrap their children into parenthesis.

Then, maybe, need just to wrap BinOp itself? But then you'll get unnecessary parenthesis for

// ((a ? b) ? ...)
BinOp(BinOp(...), ...)

which you are trying to avoid. So, you can't add parenthesises neither inside BinOp, nor outside without doing some tricks for determine actual expression type of X and Y. You can't just delegate that work to the children, because they'll face with the same problem.

So you need to decide types right and here. In each place where you process such expression. Not scalable.

Group node is the baked result of such computations. You can infer it automatically from more human-like AST representation, like

// (1 + 2) * (3 + 4)
BinOp(
  BinOp(IntNum(1), Add, IntNum(2))
  Mul,
  BinOp(IntNum(3), Add, IntNum(4))
)

and then always work only with that normalized representation. Another question, is you will so often create AST by hands?... You should create it when you parse text. There is no reason to drop this node.

I'm also not great fan of ladders of parenthesises, so you can see that I try to remove pathological cases by merging Group nodes in things like

(((1) + (((2 - 3)) * 4)))

to something like

((1) + ((2 - 3) * 4))

Of course it is not ideal (but right now I'm even not see if tests are running, and from my local running I see 100 failed tests on slightly tuned master), but this can be improved in the future. Anyway, this is generated code which is hardly ever even read by humans (if they not looking for bugs)

Mingun avatar Nov 20 '20 16:11 Mingun

You are wrongly think that new node is the "parenthesis node". This is not true. It's a Group node which is represented as expression in parenthesis in many languages including Kaitai expression language. It is naturally present in the AST, and trying to pretend that it doesn't exist only complicates things. Each time as you produce parenthesis in the AbstractTranslator descendants you are using that node but in complicated and implicit form which is much harder to understand. I bet that this led to the half of errors in the tests

KS expression language has certain operator precedence order, and target languages have their own precedence order.

This is only representation. AST in both cases will be the same. Some nodes may be redundant, but this should be handled in the translator (if you should worry about it at all. If you want a beautiful readable source after generation, then it is easier to use the means of automatic formatting of the target language)

In theory, target language might not have any concept of parenthesis at all — e.g. assembler-style languages don't have them, instead spelling out individual operations one by one in correct order.

Again, this is specific representation that very weakly related to AST structure. Remember, AST is Abstract Syntax Tree. By default Group node renders as expression in parenthesizes but you are not forced to use that representation. Of course, I had no chance to check that generated sources in all languages is correct because I even couldn't see test output, but direction is right. Now I know how to see test output and can check this.

Another example is C: you would routinely will have expressions broken down into multiple lines, e.g. when dealing with string operations with strcat / strcpy.

This is orthogonal example: you should split out expressions into statements if in the target language some Kaitai expressions represented as statements. This is not related to the Group node

Mingun avatar Nov 22 '20 21:11 Mingun

Anyway, you feel free to close this PR is you think that Group node shouldn't exist

Mingun avatar Nov 23 '20 03:11 Mingun

Ping. If none of you have changed your mind, then it is better to close this PR.

Mingun avatar May 13 '21 19:05 Mingun