kaitai_struct_compiler
kaitai_struct_compiler copied to clipboard
Create new AST node Group in the expression language, that represents value in parenthesis
This change fixes https://github.com/kaitai-io/kaitai_struct_tests/blob/master/formats/expr_ops_parens.ksy test for Java and, I'm sure, for all other targets
Unfortunately, I have to disagree with this one, please see https://github.com/kaitai-io/kaitai_struct/issues/69.
Adding "force parentheses" operators seems to be a misunderstanding how AST works. These parentheses add no information to the AST object, it's just a weird trick to suppress and work around the problem instead of solving it.
I don't think so. First of all, group node can be used to attach positional information if that even will be implemented (and I think it should if we want to produce nice error messages). Positional information is different for
(1+2)
and
( 2+3 )
So you can't ignore that node.
These parentheses add no information to the AST object, it's just a weird trick to suppress and work around the problem instead of solving it.
Just the opposite -- the group node accurately laided out to the expression language structure. The https://github.com/kaitai-io/kaitai_struct/issues/69 just highlight problem which is gracefully and simply solved by this node.
Adding "force parentheses"
On the contrary, without a group node, you are forced to insert brackets even where they are not needed, simply because you cannot distinguish between places where they are needed and those where they are not needed. Example:
// X * Y
BinOp(X, Mul, Y)
Should you wrap X and Y with parenthesis when generate text? This depends heavily on the X
and Y
expressions. If it something like IntNum
, then parenthesis is not required, but if that is BinOp(..., Add, ...)
they are definitely needed. So, BinOp
should not wrap their children into parenthesis.
Then, maybe, need just to wrap BinOp
itself? But then you'll get unnecessary parenthesis for
// ((a ? b) ? ...)
BinOp(BinOp(...), ...)
which you are trying to avoid. So, you can't add parenthesises neither inside BinOp
, nor outside without doing some tricks for determine actual expression type of X and Y. You can't just delegate that work to the children, because they'll face with the same problem.
So you need to decide types right and here. In each place where you process such expression. Not scalable.
Group node is the baked result of such computations. You can infer it automatically from more human-like AST representation, like
// (1 + 2) * (3 + 4)
BinOp(
BinOp(IntNum(1), Add, IntNum(2))
Mul,
BinOp(IntNum(3), Add, IntNum(4))
)
and then always work only with that normalized representation. Another question, is you will so often create AST by hands?... You should create it when you parse text. There is no reason to drop this node.
I'm also not great fan of ladders of parenthesises, so you can see that I try to remove pathological cases by merging Group
nodes in things like
(((1) + (((2 - 3)) * 4)))
to something like
((1) + ((2 - 3) * 4))
Of course it is not ideal (but right now I'm even not see if tests are running, and from my local running I see 100 failed tests on slightly tuned master), but this can be improved in the future. Anyway, this is generated code which is hardly ever even read by humans (if they not looking for bugs)
You are wrongly think that new node is the "parenthesis node". This is not true. It's a Group
node which is represented as expression in parenthesis in many languages including Kaitai expression language. It is naturally present in the AST, and trying to pretend that it doesn't exist only complicates things. Each time as you produce parenthesis in the AbstractTranslator
descendants you are using that node but in complicated and implicit form which is much harder to understand. I bet that this led to the half of errors in the tests
KS expression language has certain operator precedence order, and target languages have their own precedence order.
This is only representation. AST in both cases will be the same. Some nodes may be redundant, but this should be handled in the translator (if you should worry about it at all. If you want a beautiful readable source after generation, then it is easier to use the means of automatic formatting of the target language)
In theory, target language might not have any concept of parenthesis at all — e.g. assembler-style languages don't have them, instead spelling out individual operations one by one in correct order.
Again, this is specific representation that very weakly related to AST structure. Remember, AST is Abstract Syntax Tree. By default Group
node renders as expression in parenthesizes but you are not forced to use that representation. Of course, I had no chance to check that generated sources in all languages is correct because I even couldn't see test output, but direction is right. Now I know how to see test output and can check this.
Another example is C: you would routinely will have expressions broken down into multiple lines, e.g. when dealing with string operations with
strcat
/strcpy
.
This is orthogonal example: you should split out expressions into statements if in the target language some Kaitai expressions represented as statements. This is not related to the Group
node
Anyway, you feel free to close this PR is you think that Group
node shouldn't exist
Ping. If none of you have changed your mind, then it is better to close this PR.