[Feature Request] Output rules used during parsing

Open d4l3k opened this issue 5 years ago • 6 comments

Sometimes with really large grammars (such as the wikitext/parsoid peg definitions) it's really hard to debug which rules are actually being used to output the final result. It'd be really nice if pigeon could return a tree of which rules are being used to generate the final result.

d4l3k avatar Jul 28 '18 03:07 d4l3k

I wrote some code to modify the grammar and wrap all "run" functions with context information. Might be able to adapt something like this into a more general form.

https://github.com/d4l3k/wikigopher/blob/master/wikitext/debug.go

d4l3k avatar Jul 28 '18 21:07 d4l3k

@d4l3k I do not yet understand what particular functionality you are looking for. How is the tree you are requesting different from (or related to) the output you get when the Debug option is set for the parser? Is your request only about the actions where Go code is executed? In your debug.go you only modify actionExpr, but you seem to ignore the other code expressions (like AndCodeExpr, NotCodeExpr, StateCodeExpr). Are you looking for a way to overwrite (or provide your own version of) the callFuncTemplates (see https://github.com/mna/pigeon/blob/master/builder/builder.go#L35)?

breml avatar Aug 06 '18 21:08 breml

@breml I don't want to be speaking for the OP, but the way I understand it, this is to get an output of just the matched rules (the final tree of PEG rules used to generate the results), whereas the Debug option is very verbose and prints every rule attempted and backtracked (IIRC).

This does sound interesting and useful to me, especially if we could print out the matched input along with the rules (start-end index, and maybe skipping some middle runes when too big). It would still be quite verbose for big grammars, but I can see how it could help with debugging a PEG that matches but doesn't give the expected result. I don't believe it could help with a PEG that doesn't match, since there are no matching rules to print out; the Debug option would be better in this case.
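A minimal sketch of the "skip some middle runes" idea, assuming a per-rule line of the form name, start-end index, matched input. The helpers `elide` and `describe` are hypothetical and not part of pigeon; the offsets are treated as byte offsets into the input.

```go
package main

import "fmt"

// elide replaces the middle of s with "..." so that the result has exactly
// limit runes. Strings of limit runes or fewer, and limits too small to hold
// the marker plus some context, are returned unchanged.
// Hypothetical helper, not part of pigeon.
func elide(s string, limit int) string {
	r := []rune(s)
	if len(r) <= limit || limit < 5 {
		return s
	}
	keep := limit - 3      // runes left once the "..." marker is accounted for
	head := (keep + 1) / 2 // the head gets the extra rune when keep is odd
	tail := keep - head
	return string(r[:head]) + "..." + string(r[len(r)-tail:])
}

// describe formats one matched rule with its byte offsets and elided input.
func describe(rule string, start, end int, input []byte) string {
	return fmt.Sprintf("%s [%d:%d] %q", rule, start, end, elide(string(input[start:end]), 20))
}

func main() {
	input := []byte("The quick brown fox jumps over the lazy dog")
	fmt.Println(describe("Sentence", 0, len(input), input))
	fmt.Println(describe("Word", 4, 9, input))
}
```

Working on runes rather than bytes keeps the elided text valid UTF-8 even when the cut falls inside a multi-byte character.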

Might be interesting to look at what other PEG generators do to assist in debugging though? This could be a "solved problem" more or less and I wouldn't know.

Martin

mna avatar Aug 06 '18 22:08 mna

Yep, I was referring to printing the matches. There's another Go PEG library which has a PrintSyntaxTree method that just prints each AST node's name and start/end.

https://github.com/pointlander/peg/blob/master/peg.peg.go#L292-L298

d4l3k avatar Aug 07 '18 17:08 d4l3k

OK, I understand.

What would be needed for this?

  • Introduce a new option for the parser to instruct it to output the matched rules / syntax tree, maybe allowing some configuration, e.g. what should be printed (rule name, start-end index, matched input / runes, name of the called Go function, etc.)
  • Is output of this to stdout enough, or do we need to provide other options, like accepting a Writer interface to let the user of the parser redirect the output to stdout/file/buffer/etc.?
  • Define an output format for this: should it be primarily human-readable, or are there other use cases where it would be interesting to process this output with tools (grep/awk), or even as JSON?
  • Should the output format be configurable (template?)?
  • Is it only about the action expressions of the matched rules, or is it also about the other expressions?
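To make the Writer and format questions concrete, here is a rough sketch of one possible shape, not a proposal for pigeon's actual API. The names `Match`, `Format`, `WriteMatches` and `render` are all made up for illustration. It writes to any `io.Writer` and offers a tab-separated line format (easy to grep/awk) next to JSON lines.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// Match is a hypothetical record for one matched rule.
type Match struct {
	Rule  string `json:"rule"`
	Start int    `json:"start"`
	End   int    `json:"end"`
	Text  string `json:"text"`
}

// Format selects how matches are written.
type Format int

const (
	Text Format = iota // one tab-separated line per match
	JSON               // one JSON object per line
)

// WriteMatches writes every match to w in the requested format.
func WriteMatches(w io.Writer, f Format, matches []Match) error {
	enc := json.NewEncoder(w)
	for _, m := range matches {
		var err error
		switch f {
		case JSON:
			err = enc.Encode(m) // Encode appends a newline after each object
		default:
			_, err = fmt.Fprintf(w, "%s\t%d\t%d\t%q\n", m.Rule, m.Start, m.End, m.Text)
		}
		if err != nil {
			return err
		}
	}
	return nil
}

// render collects the output in a string, e.g. for tests.
func render(f Format, matches []Match) string {
	var sb strings.Builder
	if err := WriteMatches(&sb, f, matches); err != nil {
		return ""
	}
	return sb.String()
}

func main() {
	matches := []Match{
		{Rule: "Heading", Start: 0, End: 8, Text: "== Hi =="},
		{Rule: "Text", Start: 3, End: 5, Text: "Hi"},
	}
	for _, f := range []Format{Text, JSON} {
		if err := WriteMatches(os.Stdout, f, matches); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```

Taking an `io.Writer` covers stdout, files and buffers with one parameter, so a separate "print to stdout" mode would not be needed.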

From the example @d4l3k provided I understand that the prettyprint function in the end prints a (colored) line per "node" (matched rule) in the grammar (see https://github.com/pointlander/peg/blob/master/peg.peg.go#L234).

In https://github.com/d4l3k/wikigopher/blob/master/wikitext/debug.go only the name of the called Go function is printed.

I feel like we first have to outline what exactly should be implemented. Then we can work towards a PR. @d4l3k would you be willing to work on such a PR?

breml avatar Aug 08 '18 20:08 breml

I think adding a similar AST struct output would make sense, since you can't print it as you visit anyway. This supports custom printing as well.

My implementation is far from ideal; it was just the easiest way to get something workable. I'd be interested in putting together a PR.

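// AST is one node in the tree of matched rules.
type AST struct {
  Rule       *rule  // grammar rule that matched
  Start, End int    // start and end index of the match in the input
  Match      []byte // matched input
  Children   []*AST // matches nested inside this one
}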

func ParseAST(...)

A separate ParseAST function could also be useful, since sometimes a user may only want the PEG AST instead of the parser output.
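As a sketch of the custom printing such a struct would allow: the version below mirrors the AST struct proposed above, but uses a plain rule name in place of pigeon's internal `*rule` type (an assumption made so the example compiles on its own). `Fprint` and `Sprint` are hypothetical helpers, and the tree is built by hand rather than by a real parser.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// AST mirrors the proposed struct, with a string rule name standing in for
// pigeon's internal *rule type (an assumption for this sketch).
type AST struct {
	Rule       string
	Start, End int
	Match      []byte
	Children   []*AST
}

// Fprint writes one indented line per node, visiting the tree depth first.
func Fprint(w io.Writer, n *AST, depth int) {
	if n == nil {
		return
	}
	indent := strings.Repeat("  ", depth)
	fmt.Fprintf(w, "%s%s [%d:%d] %q\n", indent, n.Rule, n.Start, n.End, n.Match)
	for _, c := range n.Children {
		Fprint(w, c, depth+1)
	}
}

// Sprint returns the same rendering as a string.
func Sprint(n *AST) string {
	var sb strings.Builder
	Fprint(&sb, n, 0)
	return sb.String()
}

func main() {
	src := []byte("1+2")
	// Hand-built tree for illustration; a real one would come from the parser.
	tree := &AST{Rule: "Expr", Start: 0, End: 3, Match: src, Children: []*AST{
		{Rule: "Number", Start: 0, End: 1, Match: src[0:1]},
		{Rule: "Number", Start: 2, End: 3, Match: src[2:3]},
	}}
	Fprint(os.Stdout, tree, 0)
}
```

Because the tree is only built from rules that ended up in the final match, printing after the parse avoids showing attempts that were later backtracked.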

d4l3k avatar Aug 08 '18 21:08 d4l3k