ohm icon indicating copy to clipboard operation
ohm copied to clipboard

`toAST` incorrectly makes some nodes intermediate nodes

Open rrthomas opened this issue 2 years ago • 2 comments

Using Ohm 17.1.0, and the following grammar:

Demo {
  Sequence = ListOf<Exp, ";">

  PrimaryExp = number

  UnaryExp
    = "~" UnaryExp    -- bitwise_not
    | PrimaryExp

  Exp = UnaryExp

  number
    = digit* "." digit+  -- fract
    | digit+             -- whole
}

I run the following code:

import {toAST} from 'ohm-js/extras'

// eslint-disable-next-line import/extensions
import grammar from './demo.ohm-bundle.js'

const matchResult = grammar.match('2')
console.dir(toAST(matchResult))

const matchResult2 = grammar.match('~2')
console.dir(toAST(matchResult2))

and the output is:

[ '2' ]
[ '2' ]

In other words, there's no difference between the ASTs for ~2 and for 2.

It looks to me as though Ohm is first discarding the node for the literal string "~", and then incorrectly deciding that the rule UnaryExp_bitwise_not has only one node, and hence considering it an intermediate node.

rrthomas avatar Nov 06 '23 21:11 rrthomas

I'm seeing a similar behaviour, where something like this (simplified):

ComparisonExp = 
  "a" "IN" aValues
 |"b" "IN" bValues

Gives you only the value of aValues or bValues when toAST is called, while I expect to see if the node is called on "a" or "b", and with which operator. Using mapping doesn't work at all, because ComparisonExp only maps to the 0-th place, if I try to access the 1st or 2nd I get an exception.

To be fair, why can't .toAST() generate a complete AST, without discarding literals? Apart from being buggy, it's really hard to read.

cristiano-belloni avatar Aug 07 '24 16:08 cristiano-belloni

(Sorry that this reply is coming so late — I'm digging out from an issue backlog.)

Hmmm, yeah this is definitely not what I would expect.

If you want an AST, I would generally recommend writing an operation and constructing it explicitly. I think the documentation on this isn't as clear as it could/should be, but toAST is really only intended as a quick and dirty way of getting a "best guess" at a useful AST. But that involves using a bunch of heuristics that don't make sense for every use case.

Internally, Ohm maintains a concrete syntax tree, which maintains all of the information. That can be accessed via semantic actions that are part of an operation. That is the primary way of handling semantics that I'd recommend to most Ohm users today.

pdubroy avatar Jul 29 '25 11:07 pdubroy