Performance regression with graal-based image
**Describe the bug**
When using the graal Docker image introduced in https://github.com/erdos/stencil/pull/53, .docx templates are rendered slower.
**To Reproduce**

1. Download & extract a .docx template file into a `/tmp/templates` directory: lipsum_sample.docx.zip
2. Run two containers side by side (in separate terminal windows):

```
docker run --rm --name stencil-0.3.10 --publish 5000:8080 --volume /tmp/templates:/templates ghcr.io/erdos/stencil:0.3.10
docker run --rm --name stencil-0.3.10-graal --publish 5001:8080 --volume /tmp/templates:/templates ghcr.io/erdos/stencil:0.3.10-graal
```
3. Run the tests against the regular container:

```
$ bash -c "for test in 1 2 3 4 5; do time curl -s localhost:5000/lipsum_sample.docx --header 'content-type: application/json' --data '{}' > /tmp/lipsum_sample_regular.docx; done"

real    0m10.385s
user    0m0.006s
sys     0m0.010s

real    0m4.191s
user    0m0.005s
sys     0m0.008s

real    0m4.245s
user    0m0.005s
sys     0m0.008s

real    0m4.239s
user    0m0.005s
sys     0m0.008s

real    0m4.263s
user    0m0.004s
sys     0m0.008s
```

   and against the graal container:

```
$ bash -c "for test in 1 2 3 4 5; do time curl -s localhost:5001/lipsum_sample.docx --header 'content-type: application/json' --data '{}' > /tmp/lipsum_sample_graal.docx; done"

real    0m19.201s
user    0m0.005s
sys     0m0.008s

real    0m12.841s
user    0m0.004s
sys     0m0.010s

real    0m13.111s
user    0m0.004s
sys     0m0.007s

real    0m12.936s
user    0m0.004s
sys     0m0.008s

real    0m13.117s
user    0m0.004s
sys     0m0.007s
```
4. Notice how the graal-based image is consistently slower.
5. Stop the containers:

```
docker kill stencil-0.3.10
docker kill stencil-0.3.10-graal
```
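To compare the two containers beyond eyeballing, the `real` figures from `time` can be averaged. A small POSIX shell sketch (the sample values are copied from the regular container's steady-state runs above; in practice you would feed it your own captured output):

```shell
#!/bin/sh
# Average the `real` figures printed by `time`. The samples below mirror
# the regular container's warm runs reported in this issue.
samples='real 0m4.191s
real 0m4.245s
real 0m4.239s
real 0m4.263s'

avg=$(printf '%s\n' "$samples" \
  | sed -nE 's/^real[[:space:]]+([0-9]+)m([0-9.]+)s$/\1 \2/p' \
  | awk '{ total += $1 * 60 + $2; n++ } END { if (n) printf "%.3f", total / n }')
echo "average real: ${avg}s"
```

Running the same pipeline over both containers' output makes the slowdown factor explicit instead of relying on a visual scan of five timings each.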
**Expected behavior**
Performance of the application in a graal-based image is on par with that of the regular one, or faster.
However, I know very little about GraalVM, or about the goals behind building a graalvm-based image. Perhaps performance was not among the goals for this image.
**Screenshots**
No screenshots.
**Documents**
I've attached a link to the document above. It is an 800+ page .docx file with no template placeholders in it.
**Office versions:** Not applicable as far as I can see.

**Environment where template is rendered:**
- Stencil version: 0.3.10
- Java version: I used regular and graal stencil images.
- Operating System version: macOS 10.14.6 (18G103)
- Docker version: 19.03.13, build 4484c46d9d
**Additional context**
No additional context.
Hello @gmile, thank you for opening the PR 4 years ago :-)
The GraalVM image was never really production-ready; it was rather a field for experimentation and a test case to make sure the library can be used in Graal services. Therefore, performance was not really optimized there. Also, I just realized that the graal container broke in #140 two years ago, and I only patched it in the last commit :-(
However, I have now had the chance to pick this issue up and spend some time fixing the service. Measured with your approach, the current numbers for the 0.6.1-SNAPSHOT version are as follows.
With the JVM service:

```
real    0m1.276s
user    0m0.004s
sys     0m0.007s

real    0m0.008s
user    0m0.002s
sys     0m0.003s

real    0m0.007s
user    0m0.002s
sys     0m0.003s

real    0m0.007s
user    0m0.003s
sys     0m0.001s

real    0m0.006s
user    0m0.001s
sys     0m0.002s
```
With the GraalVM service:

```
real    0m3.163s
user    0m0.004s
sys     0m0.008s

real    0m0.169s
user    0m0.002s
sys     0m0.005s

real    0m0.150s
user    0m0.003s
sys     0m0.001s

real    0m0.169s
user    0m0.002s
sys     0m0.003s

real    0m0.152s
user    0m0.002s
sys     0m0.004s
```
So it seems the issue still persists (0.007s vs 0.150s), even though the performance is much better today than it was 4 years ago. In the coming days, I will check some options:
- With more recent GraalVM versions
- With GraalVM Enterprise Edition instead of using the Community Edition
- With Profile-Guided Optimizations
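For reference, the PGO workflow on GraalVM `native-image` is a three-step build-run-rebuild cycle. A dry-run sketch follows: the `--pgo-instrument` and `--pgo` flags are the actual `native-image` options (PGO requires the Enterprise/Oracle GraalVM distribution), while the jar name, binary names, and workload are assumptions for illustration:

```shell
#!/bin/sh
# Sketch of a native-image PGO pipeline. DRY_RUN=1 only echoes the
# commands so the script is safe to run anywhere; set it to 0 in a
# real GraalVM EE build environment.
DRY_RUN=1
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# 1. Build an instrumented binary that records execution profiles.
run native-image --pgo-instrument -jar stencil-service.jar stencil-instrumented

# 2. Exercise it with a representative workload (e.g. the curl loop
#    from the report above); the profile is written to default.iprof
#    when the process exits.
run ./stencil-instrumented

# 3. Rebuild, feeding the collected profile back into the compiler.
run native-image --pgo=default.iprof -jar stencil-service.jar stencil-optimized
```

The key practical point is step 2: the profile only helps if the instrumented workload resembles production traffic, so rendering the same large .docx template used in the benchmark is a reasonable choice here.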
With the latest GraalVM EE:

```
real    0m0.159s
user    0m0.005s
sys     0m0.006s

real    0m0.161s
user    0m0.003s
sys     0m0.005s

real    0m0.139s
user    0m0.004s
sys     0m0.002s

real    0m0.134s
user    0m0.002s
sys     0m0.005s

real    0m0.132s
user    0m0.002s
sys     0m0.003s
```
No significant improvement so far.
I tested PGO with some profiling today and managed to get the runtime down to roughly 0.090s in the native containers:
```
real    0m0.092s
user    0m0.001s
sys     0m0.003s

real    0m0.084s
user    0m0.001s
sys     0m0.003s

real    0m0.080s
user    0m0.001s
sys     0m0.004s

real    0m0.084s
user    0m0.003s
sys     0m0.003s
```