Parsing large files is too slow

Open drieks opened this issue 5 years ago • 7 comments

These files are currently not included in SelfTest.kt because the processing does not finish within a reasonable time:

  • KotlinLexer.kt
  • KotlinParser.kt
  • UnicodeClasses.kt

drieks avatar Oct 21 '19 21:10 drieks

Hi @martinflorek,

please try version fd6123da02. Can you tell me the required parsing time of the old and the new version? Thank you very much!

drieks avatar Oct 27 '19 22:10 drieks

I am not able to properly measure the parsing time only, because I process several source code repositories at once and I am looking for specific files only before parsing them.

But the new version runs a bit faster. All my processing went from ~33 seconds to ~32 seconds. Version with Kastree runs in 1.7 seconds.
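
One way to separate the parsing time from the surrounding repository scan is to time only the parse call itself. The following is a minimal sketch, not the kotlinx.ast or Kastree API: `averageParseMillis` is a hypothetical helper, and the `parse` lambda is a placeholder for whichever parser entry point is being measured.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical helper: times only the parse call, excluding file discovery and I/O.
// `parse` is a stand-in for the real parser entry point (an assumption, not a library API).
fun <T> averageParseMillis(
    source: String,
    warmups: Int = 3,
    runs: Int = 10,
    parse: (String) -> T
): Double {
    require(runs > 0) { "runs must be positive" }
    // Untimed warm-up calls so that JIT compilation does not dominate the result.
    repeat(warmups) { parse(source) }
    var totalNanos = 0L
    repeat(runs) {
        totalNanos += measureNanoTime { parse(source) }
    }
    // Average over the timed runs, converted from nanoseconds to milliseconds.
    return totalNanos.toDouble() / runs / 1_000_000.0
}

fun main() {
    val source = "fun main() { println(\"hello\") }"
    // Placeholder "parser" that only splits on whitespace; replace with the real parse call.
    val millis = averageParseMillis(source) { it.split(Regex("\\s+")) }
    println("average parse time: $millis ms")
}
```

Reading the file contents into a string before calling the helper keeps disk access out of the measurement, so the old and the new version can be compared on the same input.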

martinflorek avatar Oct 28 '19 07:10 martinflorek

I refactored kotlinx.ast so that it is now possible to use both antlr-kotlin and antlr-java to parse kotlin sources. Example: https://github.com/kotlinx/ast/blob/master/grammar-kotlin-parser-antlr-java/src/test/kotlin/kotlinx/ast/example/ExampleMain.kt

But sadly, it seems that antlr-java is not much faster than antlr-kotlin. I will try to figure out how to speed up parsing.

drieks avatar Oct 30 '19 22:10 drieks

@ShikaSD pointed me to antlr-optimized, so I implemented support for this antlr fork in kotlinx.ast. But sadly, it is not as fast as hoped. I will try to implement a lexer and parser using antlr4 grammar files, only supporting the features that are required to parse kotlin files. I already added support to parse antlr4 grammar files for this use case in kotlinx.ast:grammar-antlr4-parser-antlr-java.

drieks avatar Nov 06 '19 23:11 drieks

The time for ./gradlew clean check was reduced from 3min 30s in commit c7dd6bbd5419789a7feba0d68cf6f1f326197103 to 2min 30s in commit f088b3cf8de0817e9f235c0b53e0923127956b22.

Because of this, all Kotlin files will now be scanned in the self test.

It still needs to be sped up; I think we need some patch to the Kotlin parser/lexer for this.

drieks avatar Dec 07 '20 22:12 drieks

Build time for commit 95db180495bb46afe42767191996e3cd49cd96cf is 44s, so, compared with the 2min 30s of the full check, we can assume that testing the previously excluded files takes around 1 minute 45s.

  • KotlinLexer.kt
  • KotlinParser.kt
  • UnicodeClasses.kt

drieks avatar Dec 07 '20 22:12 drieks

Can you have a look at my comment in #50? Why is a large garbage string faster to parse than a large string containing JSON?

fab1an avatar Dec 15 '21 07:12 fab1an