Performance of eager computations in Project initialization

Open johannesduesing opened this issue 5 months ago • 3 comments

Problem Statement

As discussed in our recent OPAL meeting, we want to understand what operations are performed (eagerly) when initializing a Project instance, and their respective impact on the overall performance. I had a first look and identified the following relevant operations:

  • O1 Building the class hierarchy: This is done in a separate future using the Scala global execution context.
  • O2 Process project class files: Processes every project class file, adding it to the relevant data structures and updating values such as code sizes and count variables. This includes virtual class files supplied by the caller. It also processes module and nesting information and prints warnings about project inconsistencies.
  • O3 Process library class files: Same as O2, but for the library class files.
  • O4 Compute instance methods: This is done in a separate future as well.
  • O5 Compute overriding methods: This is done in a separate future as well.
  • O6 Validate the project instance: Checks for some fundamental issues with project consistency.
  • O7 Compute classes-per-package map: This is a val definition, so it is evaluated on Project instantiation.
  • O8 Computing functional interfaces: This is a lazy val, so not really relevant in this context. However, it already features the following annotation:
    // TODO Consider extracting to a ProjectInformationKey
    final lazy val functionalInterfaces: UIDSet[ObjectType] = [..]
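
To illustrate why O7 counts towards initialization time while O8 does not, here is a minimal sketch of Scala's val versus lazy val evaluation. The names `EagerVsLazy`, `Demo`, `eagerMap` and `lazySet` are hypothetical and not taken from OPAL; only the evaluation order is the point.

```scala
object EagerVsLazy {
  // Records the order in which members are evaluated.
  val log = scala.collection.mutable.ArrayBuffer.empty[String]

  class Demo {
    // Plain val: evaluated in the constructor, comparable to the classes-per-package map (O7).
    val eagerMap: Map[String, Int] = { log += "eager"; Map("pkg" -> 1) }

    // lazy val: evaluated on first access only, comparable to functionalInterfaces (O8).
    lazy val lazySet: Set[String] = { log += "lazy"; Set("I") }
  }
}
```

Creating a `Demo` only logs "eager"; "lazy" is logged once `lazySet` is read for the first time.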

O1 runs concurrently with O2 and O3 and is awaited after O3 completes. O4 and O5 run concurrently while the main thread performs some array manipulations; both are awaited when the actual project instance is created, which is also when O7 is triggered. O6 runs after the instantiation has completed, and then the Project instance is returned.
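
The ordering described above can be sketched roughly as follows. This is not OPAL's actual code: `InitOrderSketch`, `op` and `initialize` are hypothetical stand-ins, and each operation simply returns its label so that the join order is visible.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object InitOrderSketch {
  // Hypothetical stand-in for the real operations; returns its own label.
  private def op(name: String): String = name

  def initialize(): Seq[String] = {
    val o1 = Future(op("O1"))                        // class hierarchy, started first
    val o2 = op("O2")                                // project class files, main thread
    val o3 = op("O3")                                // library class files, main thread
    val hierarchy = Await.result(o1, Duration.Inf)   // O1 is awaited after O3

    val o4 = Future(op("O4"))                        // instance methods
    val o5 = Future(op("O5"))                        // overriding methods
    // ... the main thread performs its array manipulations here ...
    val methods = Await.result(o4, Duration.Inf)     // both futures are awaited
    val overriding = Await.result(o5, Duration.Inf)  // before the instance is created

    val o7 = op("O7")                                // val evaluated during instantiation
    val o6 = op("O6")                                // validation after instantiation
    Seq(o2, o3, hierarchy, methods, overriding, o7, o6)
  }
}
```

The returned sequence lists the operations in the order in which the main thread obtains their results, not the order in which they were started.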

Empirical Evaluation

I implemented a small patch to OPAL that extracts the runtime of the operations mentioned above. Based on that, I wrote an analysis that iterates over Maven Central and does the following:

  1. Locate project JAR based on GAV and open a stream for download
  2. Download project JAR and parse it to OPAL ClassFile representation
  3. Download all transitive dependency JARs and parse them to OPAL ClassFile representation (interfaces only)
  4. Initialize a Project instance based on those project- and library class files
  5. Extract performance values for the operations mentioned above
  6. Write the following values into a CSV file: GAV, #ProjectClasses, #Libraries, #LibraryClasses, StreamTime, LoadAndParseProjectCFsTime, LoadAndParseLibraryCFsTime, TotalProjectInitTime, O4Time, O1Time, O5Time, O7Time, O2Time, O3Time, O6Time
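
As a rough illustration of steps 5 and 6, the following sketch shows one way to measure a block in milliseconds and to emit a CSV line with the columns listed above. It is not the actual patch or analysis; `TimingSketch`, `timed` and `csvLine` are hypothetical helpers, and the CSV writer assumes that values contain no commas or quotes.

```scala
object TimingSketch {
  // Runs `body` and returns its result together with the elapsed wall-clock time in ms.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Column order as listed in step 6.
  val header: Seq[String] = Seq(
    "GAV", "#ProjectClasses", "#Libraries", "#LibraryClasses", "StreamTime",
    "LoadAndParseProjectCFsTime", "LoadAndParseLibraryCFsTime", "TotalProjectInitTime",
    "O4Time", "O1Time", "O5Time", "O7Time", "O2Time", "O3Time", "O6Time"
  )

  // Joins one record into a CSV line; assumes values need no escaping.
  def csvLine(values: Seq[Any]): String = {
    require(values.length == header.length, "one value per column expected")
    values.mkString(",")
  }
}
```

A call such as `TimingSketch.timed { /* initialize the project */ }` would yield the result plus the value for a column like TotalProjectInitTime.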

A first very basic run on ~1000 GAVs produced the following results: stats.csv. Note that all times are in milliseconds and the LoadAndParse[Project|Library]CFsTime depends on my local internet connection at home.

Let me know if you have any ideas or additional input for me. I'll then run the analysis on our servers and post the evaluation results under this issue.

johannesduesing · Sep 17 '24 12:09