Is the project still active?

Open sleroy opened this issue 3 years ago • 20 comments

Hi, I don't know if the project is still active.

I am a user of Parboiled and I would like to lend a hand in maintaining or upgrading the framework.

Sylvain

sleroy avatar Feb 02 '21 13:02 sleroy

Hi Sylvain, the project is not really active anymore. A lot has happened since it was started 12 years ago.

What exactly would you need it for?

sirthias avatar Feb 02 '21 13:02 sirthias

I use it any time I need to write a small parser, and recently even a big one (ActionScript 3). I did not find anything close to Parboiled apart from the SonarQube/SonarSource parsing framework.

I am quite fluent in parsing and I like the test-friendly approach of your framework, which allows an incremental writing process, in contrast to ANTLR, JavaCC, etc.

sleroy avatar Feb 02 '21 13:02 sleroy

fyi: after having used (and learned from the excellent) parboiled library up till about 4 years ago, i switched to the (also) excellent (scala) fastparse. https://www.lihaoyi.com/fastparse/

nmcb avatar Feb 02 '21 14:02 nmcb

Yes, fastparse is great and can be a nice solution if you are writing parsers in Scala.

@sleroy Are you using parboiled's Java or Scala side?

sirthias avatar Feb 02 '21 14:02 sirthias

I am using the Java side. Most projects I am dealing with are written in Java or compatible Java.

sleroy avatar Feb 02 '21 14:02 sleroy

One of the modifications I would like to make is either upgrading ASM or embedding it to avoid conflicts with Spring, Hibernate, etc.

sleroy avatar Feb 02 '21 14:02 sleroy

Unfortunately I haven't been writing Java for many years now and, as such, am totally out of touch with the latest developments with regard to the language and the library eco-system. But I'd be more than happy to support a fork and further development on your side, similarly to what I've already done with pegdown.

So, if you'd like to take over: Just fork, hack and cut a new release and I'll happily put in a pointer and promotion, if you'd like.

sirthias avatar Feb 02 '21 14:02 sirthias

Thank you, that's really nice of you. By the way, it's sad that the blog articles have been deleted or removed from the blog (in the Wiki).

sleroy avatar Feb 02 '21 14:02 sleroy

Hi, Can you recommend any other Java project that is at least a bit similar to parboiled? Bonus if there is a not very complicated migration path from parboiled rules (which I have used extensively to parse somewhat structured outputs of various commands and configuration dumps). I am having trouble googling anything even remotely similar... Regards, Garagoth.

Garagoth avatar Mar 13 '21 18:03 Garagoth

@Garagoth I am still using Parboiled for this purpose. Scala alternatives have been advised by @sirthias. You may write your parsers using ANTLR, JavaCC, or the SonarSource parsing framework if you have some time to invest.

sleroy avatar Mar 13 '21 20:03 sleroy

> Can you recommend any other Java project that is at least a bit similar to parboiled?

I have developed a packrat PEG parser (called the pika parser) that works bottom-up, and has some interesting properties (see the linked paper for details):

https://github.com/lukehutch/pikaparser

I am currently working on a new packrat PEG parser (the squirrel parser) that works top-down. However, I haven't written documentation yet for it (I'm writing the paper for this parsing algorithm now):

https://github.com/lukehutch/squirrelparser

Both of the above parsers fully support both direct and indirect left recursion. The benefit of the pika parser is that it supports optimal error recovery, because it works bottom-up, so it can find all grammatically-correct structure fragments, no matter what sort of syntax error is present. The benefit of the squirrel parser is that, at least according to my benchmarks so far, it is the fastest PEG parser for the JVM ecosystem.

lukehutch avatar Jun 27 '21 00:06 lukehutch

(shameless plug) I have written a PEG parser generator for Java 17 that derives grammar rules from datatypes of parse trees. See Rekex.

The basic idea is that alternation/concatenation grammar rules correspond to sum/product datatypes, or, sealed/record types in Java; therefore it is possible to have datatypes of the parse tree to reflect the grammar precisely; it is not necessary to define the grammar rules as a separate step. And constructors of datatypes are used to construct tree nodes; with record type it is very succinct.

Note that in parboiled (and others like jparsec), rules are Java objects, which raises a question: how does a rule reference itself recursively? Some magic is needed to build a cyclic object graph.
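
To illustrate the cyclic-object-graph point, here is a minimal sketch of one generic way to let a rule object refer to itself: defer the lookup behind a `Supplier`. This is not parboiled's or jparsec's actual mechanism; the names `LazyRuleDemo`, `Rule`, `lazy` and `buildParens` are hypothetical and exist only for this example.

```java
import java.util.function.Supplier;

public class LazyRuleDemo {
    /** A rule returns the index just past a successful match, or -1 on failure. */
    public interface Rule {
        int match(String in, int pos);
    }

    /** Matches exactly one given character. */
    public static Rule ch(char c) {
        return (in, pos) -> pos < in.length() && in.charAt(pos) == c ? pos + 1 : -1;
    }

    /** Matches all sub-rules one after another. */
    public static Rule seq(Rule... rules) {
        return (in, pos) -> {
            int p = pos;
            for (Rule r : rules) {
                p = r.match(in, p);
                if (p < 0) return -1;
            }
            return p;
        };
    }

    /** Ordered choice: the first sub-rule that matches wins. */
    public static Rule firstOf(Rule... rules) {
        return (in, pos) -> {
            for (Rule r : rules) {
                int end = r.match(in, pos);
                if (end >= 0) return end;
            }
            return -1;
        };
    }

    /** Defers resolving the referenced rule until match time. */
    public static Rule lazy(Supplier<Rule> ref) {
        return (in, pos) -> ref.get().match(in, pos);
    }

    /** Parens <- '(' Parens ')' / 'x' */
    public static Rule buildParens() {
        Rule[] self = new Rule[1]; // holder, so the lambda can see the finished rule
        self[0] = firstOf(seq(ch('('), lazy(() -> self[0]), ch(')')), ch('x'));
        return self[0];
    }

    public static void main(String[] args) {
        System.out.println(buildParens().match("((x))", 0));
    }
}
```

The holder array is filled in only after the rule object has been constructed, so the cycle is closed at match time rather than at construction time.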

In Rekex, rules are Java types, and types can reference themselves recursively. It is as natural to define a recursive type as to define a recursive grammar. The two share the same model with which we conceptualize a context-free language.
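
The correspondence between grammar rules and datatypes can be sketched in plain Java 17 as follows. This does not use Rekex's API: Rekex derives the parser from the types, whereas the recursive descent function below is written by hand purely for illustration. The grammar and the names `TypedGrammarDemo`, `Value`, `Neg`, `Num` and `Result` are assumptions made up for this example.

```java
public class TypedGrammarDemo {
    // Value <- Neg / Num   (alternation: a sealed interface, i.e. a sum type)
    public sealed interface Value permits Neg, Num {}

    // Neg <- '-' Value     (concatenation: a record, i.e. a product type that refers back to Value)
    public record Neg(Value operand) implements Value {}

    // Num <- [0-9]+
    public record Num(int value) implements Value {}

    /** A parsed node plus the input position just past it. */
    public record Result(Value node, int end) {}

    /** Hand-written recursive descent over the grammar above; returns null on failure. */
    public static Result parseValue(String in, int pos) {
        if (pos < in.length() && in.charAt(pos) == '-') {
            Result inner = parseValue(in, pos + 1); // consumes '-' before recursing
            return inner == null ? null : new Result(new Neg(inner.node()), inner.end());
        }
        int end = pos;
        while (end < in.length() && Character.isDigit(in.charAt(end))) end++;
        if (end == pos) return null;
        return new Result(new Num(Integer.parseInt(in.substring(pos, end))), end);
    }

    /** Evaluates the tree by dispatching on the concrete node type. */
    public static int eval(Value v) {
        if (v instanceof Neg neg) return -eval(neg.operand());
        if (v instanceof Num num) return num.value();
        throw new IllegalStateException("unknown node: " + v);
    }

    public static void main(String[] args) {
        Result r = parseValue("--42", 0);
        System.out.println(r.node() + " = " + eval(r.node()));
    }
}
```

The recursive type `Neg(Value operand)` mirrors the recursive rule directly, and the record constructors double as the tree-building actions.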

Status of the project - I'm pretty sure it's production ready; I've done a lot of testing myself. But the main concern right now is whether this new approach to parsing is adequate for real-world applications, and whether it is acceptable to the public. I would really appreciate any feedback, thanks!

zhong-j-yu avatar Aug 19 '21 21:08 zhong-j-yu

> (shameless plug) I have written a PEG parser generator for Java 17 that derives grammar rules from datatypes of parse trees. See Rekex.

Wow, this is very cool, and I am designing a programming language right now that will do exactly this -- all algebraic data types will be able to be serialized and deserialized according to a bijective mapping between their field values and some syntax.

I like how you used parameter annotations on record parameters to achieve this with Java! That's a genius idea.

> Note that in parboiled (and others like jparsec), rules are Java objects, which raises a question: how does a rule reference itself recursively? Some magic is needed to build a cyclic object graph.

This is also known as the left recursion problem. Specifically, a top-down parsing function cannot recurse directly or indirectly into itself, if the parser does not make forward progress by consuming at least one character between nested recursive calls to the same function.
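
A minimal sketch of the problem, assuming a toy subtraction grammar: the left-recursive rule `Expr <- Expr '-' Num / Num` is transcribed naively in `naiveExpr`, which re-enters itself at the same position without consuming input. The second function uses the classic rewrite `Expr <- Num ('-' Num)*` and folds to the left. This rewrite is the textbook workaround, not the approach taken by the pika or squirrel parsers, and all names here are hypothetical.

```java
public class LeftRecursionDemo {
    // Expr <- Expr '-' Num / Num, transcribed literally.
    // The first thing the function does is call itself at the same position,
    // so calling it recurses until a StackOverflowError is thrown.
    public static int naiveExpr(String in, int pos) {
        int left = naiveExpr(in, pos); // no input consumed before recursing
        return left;
    }

    // Expr <- Num ('-' Num)*, the left recursion replaced by a loop.
    // Folding inside the loop keeps subtraction left-associative.
    public static int expr(String in) {
        int[] pos = {0};
        int value = num(in, pos);
        while (pos[0] < in.length() && in.charAt(pos[0]) == '-') {
            pos[0]++;               // consume '-'
            value -= num(in, pos);  // fold into the running result
        }
        if (pos[0] != in.length()) {
            throw new IllegalArgumentException("unexpected input at " + pos[0]);
        }
        return value;
    }

    // Num <- [0-9]+
    private static int num(String in, int[] pos) {
        int start = pos[0];
        while (pos[0] < in.length() && Character.isDigit(in.charAt(pos[0]))) pos[0]++;
        if (start == pos[0]) throw new IllegalArgumentException("digit expected at " + start);
        return Integer.parseInt(in.substring(start, pos[0]));
    }

    public static void main(String[] args) {
        System.out.println(expr("10-3-2")); // (10 - 3) - 2
    }
}
```

The loop makes forward progress on every iteration because each pass consumes a `-` and at least one digit, which is exactly the condition the naive version violates.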

> Status of the project - I'm pretty sure it's production ready; I've done a lot of testing myself. But the main concern right now is whether this new approach to parsing is adequate for real-world applications, and whether it is acceptable to the public. I would really appreciate any feedback, thanks!

Structurally it's a brilliant way to define a grammar, in my opinion, because it uses existing language features. But how do you solve left recursion in Rekex?

There are several complex workarounds for the left recursion problem. Both the parsers I link above have (different) very clean solutions to this problem. We can discuss more over there if you are interested: https://github.com/lukehutch/squirrelparser/discussions

lukehutch avatar Aug 19 '21 23:08 lukehutch

Thank you, Luke. Serializing an AST to text is an interesting topic, I'll keep up with your project.

Left recursion would cause my parser to overflow the stack at runtime, as you would have expected :) Note that the official PEG paper frowns upon left recursion: "A well-formed grammar is a grammar that contains no directly or mutually left-recursive rules." Apparently there are some techniques that can automatically deal with left-recursive PEGs, but I have not looked into them closely. In any case, the grammar itself, derived from datatype definitions, can allow left recursion; it is up to the parser implementation how to handle it, and I may provide other parser implementations in the future besides the current recursive descent implementation.

What's more important is the realization that there is a relationship between grammar rules and datatypes, an idea that can be applied to other types of grammars. Rekex picks PEG as a practical matter.

zhong-j-yu avatar Aug 20 '21 01:08 zhong-j-yu

I have an OSS project relying on Parboiled. It is not there yet, but it is approaching unmaintainable with current Java versions.

  1. Will this project get updated?
  2. Should I migrate to another project?

I am rather comfortable with the API I get from Parboiled, and would be disappointed to lose that.

binkley avatar Aug 29 '21 03:08 binkley

I'm using it too, trying to replace a regular expression parser written ages ago for tap4j (used in a Jenkins plugin too). It would be great if pull requests and fixes could be applied, @sirthias. Maybe offer co-maintainership to some frequent collaborators? Thanks for parboiled anyway!

kinow avatar Nov 18 '21 05:11 kinow

@kinow Thank you! See my comment above: https://github.com/sirthias/parboiled/issues/147#issuecomment-771668972

sirthias avatar Nov 18 '21 08:11 sirthias

> @kinow Thank you! See my comment above: #147 (comment)

Hi @sirthias

I saw that comment about a fork :slightly_smiling_face:

What I was trying to suggest was to keep using the repository and maven groupId/artifactId, if possible. I've seen other projects being forked successfully, but also a fair share that had a few forks active but that stalled or didn't form enough community. And it would be a shame if the same happened to parboiled. I'm truly enjoying using its API, the documentation is really great, it appears to have a good user base and community, and the code appears to be good too from the little I could see.

Thanks! Bruno

kinow avatar Nov 18 '21 09:11 kinow

@kinow

I made a fork that I am actually using in some projects: https://github.com/byoskill/parboiled. However, I am not publishing on the parboiled repository.

Since I am more a Java user than a Scala user, my focus is on the Java side.

I am basically keeping it working on the latest versions of Java and adding some small features (@FunctionalInterface) to ease writing parsers in Java.

sleroy avatar Nov 18 '21 15:11 sleroy

Thanks for sharing parboiled - the Java implementation is brilliant and I would not want to see it go! Unfortunately, as others noted before, its artifacts cannot be loaded as a dependency in a modular (Maven) project, because of overlapping packages. I therefore also made a fork, plus some minimal refactoring to make it modular, mainly for my own purposes, but anyone interested can find it at https://github.com/imagingbook/parboiled-modular.

One additional minor modification is the use of binary search for the contains() method of character sets (Characters). Seems natural, not sure it makes much difference in practice.
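
A minimal sketch of the binary-search idea, assuming the set's characters are kept in a sorted `char[]`. This is not the fork's actual code and makes no claim about the internals of parboiled's `Characters` class; `SortedCharSet` is a hypothetical stand-in.

```java
import java.util.Arrays;

public class SortedCharSet {
    private final char[] sorted;

    /** Copies and sorts the characters once, so lookups can use binary search. */
    public SortedCharSet(String chars) {
        sorted = chars.toCharArray();
        Arrays.sort(sorted);
    }

    /** O(log n) membership test; Arrays.binarySearch returns a negative value on a miss. */
    public boolean contains(char c) {
        return Arrays.binarySearch(sorted, c) >= 0;
    }

    /** O(n) linear scan, shown only for comparison. */
    public boolean containsLinear(char c) {
        for (char s : sorted) {
            if (s == c) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SortedCharSet vowels = new SortedCharSet("uoiea");
        System.out.println(vowels.contains('i') + " " + vowels.contains('x'));
    }
}
```

For the small sets typical of grammar character classes the two lookups are likely to perform similarly, which matches the caveat above about the practical difference.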

imagingbook avatar Feb 07 '23 15:02 imagingbook