prettier-java icon indicating copy to clipboard operation
prettier-java copied to clipboard

Upgrade to chevrotain 7

Open clementdessoude opened this issue 3 years ago • 22 comments

Why ?

We should try to have latest versions of our libs to include easily their improvements (bugs, enhancements, etc.)

What ?

I tried to upgrade, but ran into an error:


jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:184
            throw e;
            ^

TypeError: Cannot read property 'location' of undefined
    at JavaParser.TreeBuilder.cstPostNonTerminal (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/tree_builder.js:160:73)
    at JavaParser.cstPostNonTerminal (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/packages/java-parser/src/parser.js:81:11)
    at JavaParser.RecognizerEngine.subruleInternal (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:423:18)
    at JavaParser.RecognizerApi.SUBRULE (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_api.js:72:21)
    at JavaParser.<anonymous> (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/packages/java-parser/src/productions/interfaces.js:241:7)
    at JavaParser.invokeRuleWithTry (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:117:33)
    at JavaParser.wrappedGrammarRule [as annotation] (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:131:38)
    at JavaParser.RecognizerEngine.doSingleRepetition (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:385:16)
    at JavaParser.RecognizerEngine.manyInternalLogic (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:318:29)
    at JavaParser.RecognizerEngine.manyInternal (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:295:21)

clementdessoude avatar Aug 30 '20 17:08 clementdessoude

@bd82 do you think you could have a look ? I have to admit that I am a bit lost on the origin of the issue.

Can we add a bounty on this @pascalgrimaud ?

clementdessoude avatar Jun 18 '21 20:06 clementdessoude

Starting with 200, which can be increased if it's a lot of work

pascalgrimaud avatar Jun 19 '21 06:06 pascalgrimaud

Chevrotain version 9.x is out

bd82 avatar Jun 19 '21 09:06 bd82

There was a conditional inside the cstPostNonTerminal method. It seems to have been removed in newer versions

  • https://github.com/Chevrotain/chevrotain/blob/v6.5.0/packages/chevrotain/src/parse/parser/traits/tree_builder.ts#L268-L272

Try putting the super call inside the conditional check in the override of the JavaParser

  • https://github.com/jhipster/prettier-java/blob/main/packages/java-parser/src/parser.js#L80-L81

bd82 avatar Jun 19 '21 09:06 bd82

Not sure a bounty is needed for this...

bd82 avatar Jun 19 '21 10:06 bd82

Thanks a lot @bd82, it looks indeed that your suggestion works !

Even if it was so much work, you deserve the bounty for all the great work you did on this lib, so if you want to take it, it's yours !!

clementdessoude avatar Jun 19 '21 11:06 clementdessoude

I don't have a paypal account :)

bd82 avatar Jun 19 '21 14:06 bd82

It seems that upgrading to chevrotain >=7 increase parser execution by 3 (for a file over 6000 lines) :/

image

clementdessoude avatar Jun 19 '21 15:06 clementdessoude

If it reduces the execution time, it is a good news, isn't it? Don't understand your smiley @clementdessoude

pascalgrimaud avatar Jun 19 '21 15:06 pascalgrimaud

If it reduces the execution time, it is a good news, isn't it? Don't understand your smiley @clementdessoude

Increase, sorry

clementdessoude avatar Jun 19 '21 15:06 clementdessoude

Oh in this case, it is not a good news indeed

pascalgrimaud avatar Jun 19 '21 15:06 pascalgrimaud

Is that for most test files or just for that single one?

bd82 avatar Jun 19 '21 15:06 bd82

Comparing the logs between these two pipeline, it seems a common behaviour:

For instance ConfigFileApplicationListener was parsed in 56ms in the first pipeline, and 140ms in the second one

It is approximately the same on my computer

clementdessoude avatar Jun 19 '21 15:06 clementdessoude

There are not supposed to be any performance regressions, (definitively none so large). The main performance hog with Java Parser is the backtracking done to keep the grammar strongly aligned with the specifications. Perhaps somehow the way that "smart" backtracking has been implemented its not as efficient as with newer versions of Chevrotain?

bd82 avatar Jun 19 '21 16:06 bd82

  • https://github.com/jhipster/prettier-java/blob/main/packages/java-parser/src/parser.js#L92-L112

Possible things that can go wrong:

  • Maybe this.outputCst = false; does not disable as many things as it did in 6.x (what caused the execption) so more work is being done while backtracking?
  • Maybe isRecognitionException() behaves differently and the block does not return?

I'd think the first one is more likely

bd82 avatar Jun 19 '21 16:06 bd82

Is the performance drop happening with the update to 7.x or to 9.x?

bd82 avatar Jun 19 '21 16:06 bd82

Is the performance drop happening with the update to 7.x or to 9.x?

It's in the update to 7.x.

clementdessoude avatar Jun 19 '21 17:06 clementdessoude

I will try to investigate it, but not sure when.

bd82 avatar Jun 20 '21 19:06 bd82

I've finally have some time to start looking into it.

I've had trouble with the local dev env because of windows + node-gyp + python.

What I'm doing now is:

  • running the benchmark on the folder below: image

  • But without the latest released version, only against the master version. (because I could not get niv to work).

    • note this it sometimes seems to me that benchmarking two versions of the similar library will give more consistent results if each library is run in a different process,.
  • and on this branch: https://github.com/clementdessoude/prettier-java/tree/chore/upgrade-chevrotain

  • and changing the chevrotain version in the package.json and executing yarn between benchmarks.

I can detect a performance regression switching from 6.5 to 7.0, but not x3 slower, rather 25-30% reduction. I am not sure why is that as with a quick look I can't spot any commit hat could have cause it between 6.5 and 7.0

  • https://github.com/chevrotain/chevrotain/compare/v6.5.0...v7.0.0

Perhaps it is some shenanigans inside of V8 engine's optimization...

Perhaps I can try npm linking chevrotain and doing a "binary" search to detect the "offensive" commit.

bd82 avatar Nov 04 '21 22:11 bd82

I think I was mistaken, the regression I noticed was likely between 6.5.0 and 7.12 (latest 7.x) As I can not reproduce a performance regression between 6.5.0 -> 7.0.0

But I can now repeatedly reproduce a performance regression between 7.1.0 -> 7.1.1

  • https://github.com/Chevrotain/chevrotain/compare/v7.1.0...v7.1.1

bd82 avatar Nov 04 '21 23:11 bd82

Okay this is the culprit: it was a refactor which made the exceptions use ES6 syntax and inheritance.

  • https://github.com/Chevrotain/chevrotain/commit/9045686f17ae2238fc709496224f6e52b41e158d

What is happening is that due to the "naive" backtracking nature of the parser (no memorization)

  • Backtracking because we wanted to stick as close to the spec as possible (at cost of performance).

A great many exceptions are thrown, when the parser tries and retries backtracking again and again. This seems to be quite extreme.

For samples under: `/packages/java-parser/samples/spring-framework/spring-expression/src/main/java/org/springframework/expression/spel/

Which are not that much code:

github.com/AlDanial/cloc v 1.90  T=0.20 s (579.8 files/s, 91927.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Java                           118           2363           5101          11246
-------------------------------------------------------------------------------
SUM:                           118           2363           5101          11246
-------------------------------------------------------------------------------

The RecognitionException constructor gets called exactly: 263251 times. Which is approx 23.4 per "real" line of code.

So when the exception classes became more complex:

  1. class inheritance
  2. additional logic in the constructor

It negatively impacted the performance.

  • Note that I only saw 30-40% performance r, regressions, but it could depend on the specific set of files and the node engine used.

bd82 avatar Nov 05 '21 00:11 bd82

Possible Solutions

In Chevrotain revert the exceptions implementation to the old style

Problems:

  • Won't be available quickly, as Chevrotain's master branch currently contains breaking changes as work on a new major release is being done.
  • While I would like to simplify the exceptions (no real need for class inheritance), the scenario of throwing 24 exceptions per line is not exactly "common" so how much should it be optimized for on the library level?
  • Some of the logic added to the Exception constructor may have actually been needed to solve some issues with stack traces.

In java-parser implement a simple hack

Basically use something like https://github.com/ds300/patch-package to apply a patch on the exceptions_public.js file in chevrotain. I am not sure if the java-parser needs any information of the state of the exceptions other than the .message property, so a very simple alternative implementation could be provided.

  • Maybe even just replace with the same file.
  • alternatively if you can prove you only need the message and no other state, perhaps an optimized version could be "injected" which would provide a speed boost.

This should be easy to perform and not too fragile as the exceptions file does not often change.

In java-parser attempt to implement memoization on the backtracking logic

This is basically trying to improve the parsing backtracking algorithm. ~~It is quite likely the parser could be several times or even an order of magnitude faster if it was smarter on how it backtracks.~~

  • Edit: the above statement may be too "strong", lets instead say I suspect there is substantial room for improvements.

But this will require more research to prove and time investment to implement.

I will try to find time to open a separate issue with more details on this tomorrow

bd82 avatar Nov 05 '21 00:11 bd82

I suspect that using the LL(*) plugin for chevrotain could get rid of much of the backtracking used in the Java-Parser.

@msujew WDYT?

Example of backtracking production:

  • https://github.com/jhipster/prettier-java/blob/2f7efa41db2423a200236027afb9f0fe8c198b02/packages/java-parser/src/productions/blocks-and-statements.js#L568-L587

Usage of the backtracking production:

  • https://github.com/jhipster/prettier-java/blob/2f7efa41db2423a200236027afb9f0fe8c198b02/packages/java-parser/src/productions/blocks-and-statements.js#L27-L51

Backtracking wrapper logic

  • https://github.com/jhipster/prettier-java/blob/2f7efa41db2423a200236027afb9f0fe8c198b02/packages/java-parser/src/parser.js#L92-L112

bd82 avatar Jul 13 '23 20:07 bd82

@bd82 I've used a modified version of the parser used in this project for my thesis. I could probably contribute a PR which takes care of this.

Edit: Tried it, but just removing all backtracking isn't possible. The grammar is ambiguous in a lot of places and a lot of tests are failing on me.

msujew avatar Jul 13 '23 20:07 msujew

Thanks @msujew

Do you have any estimate or guess regarding what percentage of these lookaheads can be solved with LL* and what percentage cannot?

I am trying to understand if it may still be worth using the LL(*) plugin in this project.

bd82 avatar Jul 14 '23 16:07 bd82