instaparse insta/re-parse

Hi Mark,

I needed this function for a project I'm working on, and I thought it was broadly useful enough to propose for the API. I suspect this function will get repeatedly rewritten if not provided.

Two use cases, and I'm using both already. The first is macro expansion: there might be macros in the macros, so we have to do insta/transform as many times as necessary until there aren't any macros left. Then, we need to completely flatten everything and reparse with the same rules.

The second is embedded code: Imagine a simple config language, where one option may contain some Scheme. It may not even be in there, and we might have a lot of config to do quickly, say on a boot. The config language is designed to be parsed with regular expressions, which is common. The parser will run faster if we just scoop the Scheme and parse it in a separate pass, with a different grammar.

An elegant extension of the grammar is possible with this function, namely progressive refinement. Currently if you provide more than one rule with the same name, Instaparse uses the last one it encounters. Actually you may want to retain this behavior, since it lets us hot-load a different version of a rule. The progressive refinement option is to parse the tree using the first set of definitions of all rules. Then, if there are remaining rules, it transforms the collected nodes against the next set of definitions, and so on if there are more than two. I think this doesn't require a separate pass, conceptually, but insta/re-parse makes it cheap to build this way. There are tradeoffs.

Aug 16 '13 15:08 mnemnion

As an aside, one of the grammars I provided for demo produces an enlive tree as ({...}). The other produces it as {...}. I couldn't figure out why this was; one is built from a string, the other from a file. The code reflects the need to sometimes strip the containing list. My suspicion is that providing the tree as a list, rather than a map, is correct Enlive, since a master node then has the same syntax as any other node.

Aug 16 '13 15:08 mnemnion

I believe {} to be the correct output for Enlive format. If you are getting a sequence, most likely you have hidden the name of the root tag, so there's no tag to give to the top-level node. Whether the grammar comes from a string or file shouldn't have any bearing on this behavior. If it is not the case that you have hidden the top-level tag and you're getting back a sequence, please send me an example, as I consider that to be a bug. Thanks!

I'll take a look at your pull request soon-ish.

Aug 16 '13 17:08 Engelberg

The top level was a hidden tag, so you nailed that. I don't know that it changes the code I wrote, which needs to strip the list around the val of a :content tag anyway.

cheers, -Sam.

On Fri, Aug 16, 2013 at 10:36 AM, Mark Engelberg [email protected]:

I believe {} to be the correct output for Enlive format. If you are getting a sequence, most likely you have hidden the name of the root tag, so there's no tag to give to the top-level node. Whether the grammar comes from a string or file shouldn't have any bearing on this behavior. If it is not the case that you have hidden the top-level tag and you're getting back a sequence, please send me an example, as I consider that to be a bug. Thanks!

I'll take a look at your pull request soon-ish.

— Reply to this email directly or view it on GitHubhttps://github.com/Engelberg/instaparse/pull/45#issuecomment-22781419 .

Aug 16 '13 19:08 mnemnion

Can you give me a concrete example of your first use case?

Aug 18 '13 06:08 Engelberg

Sure!

You've got some code under a node, and one of the tags is :macro. you use insta/transform to expand :macro, and now you have to re-parse the :macro node, because there might be a :macro in it. If so, you transform again, until there's no :macro left. Now, you have to take nodes that had macros in them and re-parse the whole thing, because you may well have a very different structure now.

Does that make sense?

On Sat, Aug 17, 2013 at 11:06 PM, Mark Engelberg [email protected]:

Can you give me a concrete example of your first use case?

— Reply to this email directly or view it on GitHubhttps://github.com/Engelberg/instaparse/pull/45#issuecomment-22825372 .

Aug 18 '13 19:08 mnemnion

Clojure macros essentially transform an abstract syntax tree into another abstract syntax tree. I see that macro transforms might need to be applied repeatedly until no macros remain, but I don't see why any re-parsing would be required. Is this some other kind of macro than Clojure's, like a text replacement macro?

Aug 18 '13 19:08 Engelberg

Definitely not a clojure-type macro. More like a preprocessor or Knuth style literate expansion macro.

On Sun, Aug 18, 2013 at 12:45 PM, Mark Engelberg [email protected]:

Clojure macros essentially transform an abstract syntax tree into another abstract syntax tree. I see that macro transforms might need to be applied repeatedly until no macros remain, but I don't see why any re-parsing would be required. Is this some other kind of macro than Clojure's, like a text replacement macro?

— Reply to this email directly or view it on GitHubhttps://github.com/Engelberg/instaparse/pull/45#issuecomment-22837623 .

Aug 18 '13 20:08 mnemnion

OK, I see now what you're driving at. There are some things I need to think about, for example, whether the error messages are sufficiently meaningful in a reparse. These are things that I'll also need to address when dealing with parsing pre-tokenized strings, which is another open feature request. So I'm going to leave this open for now.

Note to others who find this thread: I would welcome further comments along the lines of "Yes this sounds useful", or "I think the API for something like this should be more specific / more granular" or "I think the API should be more general and do ___."

Aug 20 '13 00:08 Engelberg