contribution icon indicating copy to clipboard operation
contribution copied to clipboard

patch-diff-report-tool: add multi-thread support to CheckstyleReportsParser

Open rnveach opened this issue 6 years ago • 5 comments

Execution:

java -jar /home/ricky/opensource/contribution/patch-diff-report-tool/target/patch-diff-report-tool-0.1-SNAPSHOT-jar-with-dependencies.jar --baseReport /home/ricky/regression_reports/savemaster/openjdk10/results.xml --patchReport /home/ricky/regression_reports/savepull/openjdk10/results.xml --output /home/ricky/regression_reports/diff/openjdk10 --baseConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --patchConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --refFiles /home/ricky/regression_repositories/openjdk10
patch-diff-report-tool execution started.
XML parsing is started.

Config:

<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
          "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
          "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">

<module name="Checker">
    <property name="charset" value="UTF-8"/>
    <property name="severity" value="warning"/>
    <property name="haltOnException" value="false"/>

    <module name="TreeWalker">

<module name="HiddenField">
  <property name="id" value="HiddenField1" />
</module>
<module name="HiddenField">
  <property name="id" value="HiddenField2" />
  <property name="ignoreConstructorParameter" value="true"/>
</module>
<module name="HiddenField">
  <property name="id" value="HiddenField3" />
  <property name="ignoreSetter" value="true"/>
</module>
<module name="HiddenField">
  <property name="id" value="HiddenField4" />
  <property name="setterCanReturnItsClass" value="true"/>
</module>

<module name="MagicNumber">
  <property name="id" value="MagicNumber1" />
</module>
<module name="MagicNumber">
  <property name="id" value="MagicNumber2" />
  <property name="ignoreHashCodeMethod" value="true" />
</module>
<module name="MagicNumber">
  <property name="id" value="MagicNumber3" />
  <property name="ignoreAnnotation" value="true" />
</module>
<module name="MagicNumber">
  <property name="id" value="MagicNumber4" />
  <property name="ignoreFieldDeclaration" value="true" />
</module>

<module name="ModifiedControlVariable">
  <property name="id" value="ModifiedControlVariable1" />
</module>
<module name="ModifiedControlVariable">
  <property name="id" value="ModifiedControlVariable2" />
  <property name="skipEnhancedForLoopVariable" value="true" />
</module>

<module name="MultipleVariableDeclarations"/>

    </module>
</module>

Both report XML files are 169 and 171 megs.

So far I have been waiting 45 minutes for the utility to finish. Based on the displays it has only been working on CheckstyleReportsParser.parse this entire time. I have 4 cores (8 threads) which are sitting idle while I am waiting for this 1 thread to finish basically all the work.

I believe we could multi-thread the following section: https://github.com/checkstyle/contribution/blob/f4249ac0af8e19a875f450dc7a8fffc7df5405a3/patch-diff-report-tool/src/main/java/com/github/checkstyle/parser/CheckstyleReportsParser.java#L131-L134

Maybe process each file separately with its own thread and memory and then combine the 2 after it is all loaded into memory.

Edit: It finally finished after a bit more than 1 hour.

rnveach avatar Jan 13 '19 15:01 rnveach

We could split these into 2 threads, but this at best this will only cut the amount of time needed by half. At best is half because we are doubling the number of threads, but we still need to merge the 2 back into 1 so we probably never reach perfection of half the time. To split into more threads would require splitting up the XML into halves/quarters/etc. but I'm not sure how we can do that quickly without damaging the XML structure unless we force guarantees on how the XML file is structured.

rnveach avatar Jan 13 '19 16:01 rnveach

Starting to test out new change. Will update post as going on. See https://github.com/rnveach/contribution/commit/10fc72a60ac0eb8279667fb21deaa9561d85cf33

Old run:

www-data@ubuntu17:/tmp/test1$ time java -jar /home/ricky/opensource/contribution/patch-diff-report-tool/target/patch-diff-report-tool-0.1-SNAPSHOT-jar-with-dependencies.jar --baseReport /home/ricky/regression_reports/savemaster/openjdk10/results.xml --patchReport /home/ricky/regression_reports/savepull/openjdk10/results.xml --output /tmp/test1 --baseConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --patchConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --refFiles /home/ricky/regression_repositories/openjdk10
patch-diff-report-tool execution started.
XML parsing is started.
Creation of configuration report is started.
Creation of diff html site is started.
Creation of the result site succeed.
patch-diff-report-tool execution finished.

real	60m40.815s
user	60m53.476s
sys	0m2.844s

New run with XML reading multi-threaded:

www-data@ubuntu17:/tmp/test1$ time java -jar /home/ricky/opensource/contribution/patch-diff-report-tool/target/patch-diff-report-tool-0.1-SNAPSHOT-jar-with-dependencies.jar --baseReport /home/ricky/regression_reports/savemaster/openjdk10/results.xml --patchReport /home/ricky/regression_reports/savepull/openjdk10/results.xml --output /tmp/test1 --baseConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --patchConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --refFiles /home/ricky/regression_repositories/openjdk10
patch-diff-report-tool execution started.
XML parsing is started.
Start Parse. 2019.01.13 14.32.19.786
Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 14.32.19.791
Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 14.32.19.791
Done Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 14.32.29.465
Done Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 14.32.29.477
Merging XML results. 2019.01.13 14.32.29.477
Generate Diff Statistics.2019.01.13 15.33.43.012
End Parse. 2019.01.13 15.33.43.020
Creation of configuration report is started.
Creation of diff html site is started.
Creation of the result site succeed.
patch-diff-report-tool execution finished.

real	61m34.888s
user	61m59.280s
sys	0m2.752s

rnveach avatar Jan 13 '19 19:01 rnveach

I can already tell the change won't have much impact. Parsing the XML files finishes pretty fast (~10 seconds):

Start Parse. 2019.01.13 14.32.19.786
Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 14.32.19.791
Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 14.32.19.791
Done Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 14.32.29.465
Done Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 14.32.29.477

It is the merging that it is now stuck on. I will have to look at the method more closely, but the next obvious area of slowdown would be DiffReport.addRecords when it has to produce the differences.

Edit: Yes it is the diff that is the slowdown. I can see it stalling on single file diffs like /home/ricky/regression_repositories/openjdk10/test/java/lang/instrument/BigClass.java.

Edit 2: Latest change is definately seeing a performance increase. See https://github.com/rnveach/contribution/commit/51636a12a0c8b02b03ca378fded837e8983f4cbf

www-data@ubuntu17:/tmp/test1$ time java -jar /home/ricky/opensource/contribution/patch-diff-report-tool/target/patch-diff-report-tool-0.1-SNAPSHOT-jar-with-dependencies.jar --baseReport /home/ricky/regression_reports/savemaster/openjdk10/results.xml --patchReport /home/ricky/regression_reports/savepull/openjdk10/results.xml --output /tmp/test1 --baseConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --patchConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --refFiles /home/ricky/regression_repositories/openjdk10
patch-diff-report-tool execution started.
XML parsing is started.
Start Parse. 2019.01.13 16.12.46.652
Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 16.12.46.657
Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 16.12.46.657
Done Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 16.12.56.367
Done Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 16.12.56.432
Merging XML results. 2019.01.13 16.12.56.433
Generate Diff Statistics.2019.01.13 16.51.02.379
End Parse. 2019.01.13 16.51.02.397
Creation of configuration report is started.
Creation of diff html site is started.
Creation of the result site succeed.
patch-diff-report-tool execution finished.

real	39m15.898s
user	76m37.736s
sys	0m2.956s

rnveach avatar Jan 13 '19 19:01 rnveach

I got it down to 20 minutes with the latest changes (originally 1 hour) using all 4 threads. See https://github.com/rnveach/contribution/commit/a5455f2887b95523e6aa3f885a3529e97dae3cf8

www-data@ubuntu17:/tmp/test1$ time java -jar /home/ricky/opensource/contribution/patch-diff-report-tool/target/patch-diff-report-tool-0.1-SNAPSHOT-jar-with-dependencies.jar --baseReport /home/ricky/regression_reports/savemaster/openjdk10/results.xml --patchReport /home/ricky/regression_reports/savepull/openjdk10/results.xml --output /tmp/test1 --baseConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --patchConfig /var/www/html/reports/32107f44546e135a34db874c19e7cfe4 --refFiles /home/ricky/regression_repositories/openjdk10
patch-diff-report-tool execution started.
XML parsing is started.
Start Parse. 2019.01.13 20.36.02.781
Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 20.36.02.786
Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 20.36.02.786
Done Parsing /home/ricky/regression_reports/savemaster/openjdk10/results.xml. 2019.01.13 20.36.12.516
Done Parsing /home/ricky/regression_reports/savepull/openjdk10/results.xml. 2019.01.13 20.36.12.603
Merging XML results. 2019.01.13 20.36.12.603
Generate Diff Statistics.2019.01.13 20.56.36.905
End Parse. 2019.01.13 20.56.36.913
Creation of configuration report is started.
Creation of diff html site is started.
Creation of the result site succeed.
patch-diff-report-tool execution finished.
^C
real	22m11.856s
user	81m50.440s
sys	0m9.072s

For whatever reason time doesn't end when the utility ends so I had to cancel it with Ctrl + C.

diff utility shows the output is the same between the 2 runs.

rnveach avatar Jan 14 '19 02:01 rnveach

For whatever reason time doesn't end when the utility ends so I had to cancel it with Ctrl + C.

The ExecutorService has to be shutdown manually otherwise the threads continue to live in the background and hold up the process from exiting. A regression test run wouldn't finish without doing this.

@romani I will continue testing with regression to make sure I don't see anything else.

rnveach avatar Jan 14 '19 14:01 rnveach