java-probabilistic-earley-parser icon indicating copy to clipboard operation
java-probabilistic-earley-parser copied to clipboard

Writing cf-gammars without probabilities

Open roddar92 opened this issue 6 years ago • 2 comments

Dear colleagues,

I'm exploring a good Earley parsers for writing cf-grammars, and this one seems to be friendly for me. Could you tell please, does this parser allow to write cf-grammars without probabilities setting?

P.S. I need Java parser like Lark (Python) for directly rule writing.

Thanks, Daria

roddar92 avatar Oct 24 '19 10:10 roddar92

Dashka,

It is possible if you set all rule probabilities equal to each other, or just ignore the computed probabilities. The parse trees are the same.

If you just need to parse a little and performance is not critical, this library should work just fine. However, this software does not implement all optimizations that are possible for non-probabilistic CFGs, because the probabilities pose some limitations on that. So I cannot make certain efficiency guarantees that a project like Marpa does. As far as I've seen, Marpa is the best non-probabilistic CFG parser out there in terms of efficiency but not possible to use with Java (sadly).

Maarten

digitalheir avatar Oct 24 '19 11:10 digitalheir

Just a heads up in case you run into a strange error about a singular matrix.

You (maybe) need to make sure that the sum of probabilities for any given goal does not exceed 1.0, or you might end up with an error. Strictly speaking, this is an error on the side of the grammar authour, but my software could be more friendly about it. I intended the software to spot this error and normalize the probabilities so they add up to 1, but I forgot if I ever released it.

This is the issue that described the situation: https://github.com/digitalheir/java-probabilistic-earley-parser/issues/12

digitalheir avatar Oct 24 '19 11:10 digitalheir