X-Road
As a Developer I want to analyse the Security Server proxy performance to find bottlenecks in the current code
We should investigate the current Security Server proxy implementation to see if there are any bottlenecks in the messaging that could be improved.
A suitable setup for the investigation is a client/test runner (e.g., the ApacheBench tool, ab), two Security Servers, and a mock service:
Client/test runner → SS1 (ClientProxy) → SS2 (ServerProxy) → Mock service
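For reference, the mock service at the end of that chain does not need to be anything more than a fixed-payload HTTP endpoint. A minimal sketch using the JDK's built-in HttpServer (the port, path, and payload size here are illustrative; the actual tests below serve 1k.json and 10M.json documents):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.Arrays;

public class MockService {
    public static void main(String[] args) throws Exception {
        byte[] body = new byte[1024];            // fixed-size payload, 1 KB here
        Arrays.fill(body, (byte) 'a');
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        server.createContext("/perftest", exchange -> {
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);                 // SS2 (ServerProxy) calls this endpoint
            }
        });
        server.start();
    }
}
```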
The JIRA issue this was created from can be found here: https://nordic-institute.atlassian.net/browse/XRDDEV-1568
Acceptance criteria:
- [ ] Performance testing is done on Java 11
- [ ] The proxy messaging paths (client and server proxies) are analysed and potential bottlenecks documented
- [ ] The results are visualized, for example using flame graphs
- [ ] Tools and configurations used for testing are documented
- [ ] Suggestions to improve proxy performance are documented
Tested using niis/xroad-security-server-standalone:bionic-7.0.2.
The main performance bottleneck lies in https://github.com/nordic-institute/X-Road/blob/develop/src/proxy/src/main/java/ee/ria/xroad/proxy/protocol/ProxyMessage.java
There's an unused constant that was only used during the REST PoC implementation:

```java
public static final int REST_BODY_LIMIT = 8192; //store up to limit bytes into memory
```
Currently, every REST message body is dumped to disk in a non-optimal way:
https://github.com/nordic-institute/X-Road/blob/develop/src/common/common-util/src/main/java/ee/ria/xroad/common/util/CachingStream.java#L59C39
java.nio uses an 8 KiB buffer size by default.
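To put that default into perspective: pushing a 10 MB body through an 8 KiB copy buffer takes roughly 10 * 1024 * 1024 / 8192 = 1280 write calls per temporary file. A minimal, self-contained sketch of such a copy loop (an illustration of the effect, not the actual CachingStream code; the tmpattach prefix mirrors the temp files seen below):

```java
import java.io.*;

public class CopyBufferDemo {
    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("tmpattach", ".tmp");
        byte[] buffer = new byte[8192];           // 8 KiB default-sized copy buffer
        long writes = 0;
        try (InputStream in = new ByteArrayInputStream(new byte[10 * 1024 * 1024]);
             OutputStream out = new FileOutputStream(tmp)) {
            int n;
            while ((n = in.read(buffer)) != -1) { // one write() call per 8 KiB chunk
                out.write(buffer, 0, n);
                writes++;
            }
        } finally {
            tmp.delete();
        }
        System.out.println("write calls: " + writes); // ~1280 for a 10 MB body
    }
}
```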
On the Security Server, this results in an "impressive" number of I/O operations for a single request with a 10 MB message body:
```
root@ss:/# inotifywait -m -r /var/tmp/xroad/ > inotify.out
Setting up watches. Beware: since -r was given, this may take a while!
Watches established.
^C
root@ss:/# cat inotify.out | sort | uniq -c | sort -rn
   3012 /var/tmp/xroad/ MODIFY tmpattach18211735386472262976.tmp
   1348 /var/tmp/xroad/ MODIFY tmpattach5517139367076729444.tmp
    861 /var/tmp/xroad/ ACCESS tmpattach18211735386472262976.tmp
      2 /var/tmp/xroad/ OPEN tmpattach5517139367076729444.tmp
      2 /var/tmp/xroad/ OPEN tmpattach18211735386472262976.tmp
      2 /var/tmp/xroad/ CLOSE_WRITE,CLOSE tmpattach5517139367076729444.tmp
      2 /var/tmp/xroad/ CLOSE_WRITE,CLOSE tmpattach18211735386472262976.tmp
      1 /var/tmp/xroad/ DELETE tmpattach5517139367076729444.tmp
      1 /var/tmp/xroad/ DELETE tmpattach18211735386472262976.tmp
      1 /var/tmp/xroad/ CREATE tmpattach5517139367076729444.tmp
      1 /var/tmp/xroad/ CREATE tmpattach18211735386472262976.tmp
```
1 KB message body
```
ab -c 10 -t 60 -H 'X-Road-Client: CS/ORG/1111/TestClient' \
  http://host.docker.internal:8080/r1/CS/ORG/1111/TestService/perftest/1k.json
```
```
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking host.docker.internal (be patient)
Finished 3284 requests


Server Software:
Server Hostname:        host.docker.internal
Server Port:            8080

Document Path:          /r1/CS/ORG/1111/TestService/perftest/1k.json
Document Length:        1024 bytes

Concurrency Level:      10
Time taken for tests:   60.026 seconds
Complete requests:      3284
Failed requests:        0
Total transferred:      4889876 bytes
HTML transferred:       3362816 bytes
Requests per second:    54.71 [#/sec] (mean)
Time per request:       182.782 [ms] (mean)
Time per request:       18.278 [ms] (mean, across all concurrent requests)
Transfer rate:          79.55 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    4   2.7      3      21
Processing:    31   81  32.9     90     235
Waiting:       30   80  32.6     90     234
Total:         32   85  34.6     95     239

Percentage of the requests served within a certain time (ms)
  50%     95
  66%    104
  75%    109
  80%    112
  90%    121
  95%    132
  98%    143
  99%    154
 100%    239 (longest request)
```
```
iostat -d -k 1 60 sdc | awk 'BEGIN {count = 0; r_sum = 0; w_sum = 0} /sdc/ {count++; r_sum += $3; w_sum += $4} END {printf "Average Read IOPS: %.2f\nAverage Write IOPS: %.2f\n", r_sum/count, w_sum/count}'
Average Read IOPS: 2.20
Average Write IOPS: 539.06
```
10 MB message body
Two concurrent connections were the maximum the testing machine could handle without timeouts (AMD Ryzen 7 PRO 4750U, a cheap M.2 SSD, Docker Desktop on Windows).
```
ab -s 60 -c 2 -t 60 -H 'X-Road-Client: CS/ORG/1111/TestClient' \
  http://host.docker.internal:8080/r1/CS/ORG/1111/TestService/perftest/10M.json
```
```
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking host.docker.internal (be patient)
Finished 42 requests


Server Software:
Server Hostname:        host.docker.internal
Server Port:            8080

Document Path:          /r1/CS/ORG/1111/TestService/perftest/10M.json
Document Length:        10485760 bytes

Concurrency Level:      2
Time taken for tests:   83.779 seconds
Complete requests:      42
Failed requests:        1
   (Connect: 0, Receive: 0, Length: 1, Exceptions: 0)
Total transferred:      429934323 bytes
HTML transferred:       429916160 bytes
Requests per second:    0.50 [#/sec] (mean)
Time per request:       3989.489 [ms] (mean)
Time per request:       1994.744 [ms] (mean, across all concurrent requests)
Transfer rate:          5011.48 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    2   0.6      2       3
Processing:   701 2601 7179.6   1618   47884
Waiting:        0  812 468.1    844    3345
Total:        702 2602 7179.6   1620   47885

Percentage of the requests served within a certain time (ms)
  50%   1620
  66%   1650
  75%   1678
  80%   1693
  90%   1763
  95%   1826
  98%  47885
  99%  47885
 100%  47885 (longest request)
```
```
root@ss:/# iostat -d -k 1 60 sdc | awk 'BEGIN {count = 0; r_sum = 0; w_sum = 0} /sdc/ {count++; r_sum += $3; w_sum += $4} END {printf "Average Read IOPS: %.2f\nAverage Write IOPS: %.2f\n", r_sum/count, w_sum/count}'
Average Read IOPS: 2.01
Average Write IOPS: 14064.13
```
I would propose implementing a configurable buffer size for the attachment storage process. This would allow tuning performance for different use cases depending on average message sizes. Also, for small messages it would be nice to have an option to store attachments in memory instead of on disk.
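As a rough sketch of what that could look like (illustrative only; the class below is not existing X-Road code, though the in-memory threshold echoes the intent of the unused REST_BODY_LIMIT constant):

```java
import java.io.*;
import java.nio.file.*;

// Illustrative only: buffers a message body in memory up to memoryLimit bytes
// and spills to a temp file through a configurably sized buffer once it grows larger.
public class SpillingOutputStream extends OutputStream {
    private final int memoryLimit;    // small bodies stay in memory below this size
    private final int diskBufferSize; // tunable per deployment, replacing the fixed 8 KiB
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream disk;
    private Path tempFile;

    public SpillingOutputStream(int memoryLimit, int diskBufferSize) {
        this.memoryLimit = memoryLimit;
        this.diskBufferSize = diskBufferSize;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (disk == null && memory.size() + len > memoryLimit) {
            spillToDisk();
        }
        (disk != null ? disk : memory).write(b, off, len);
    }

    // A larger disk buffer means fewer write syscalls per attachment.
    private void spillToDisk() throws IOException {
        tempFile = Files.createTempFile("tmpattach", ".tmp");
        disk = new BufferedOutputStream(Files.newOutputStream(tempFile), diskBufferSize);
        memory.writeTo(disk);                     // flush what was buffered so far
    }

    @Override
    public void close() throws IOException {
        if (disk != null) {
            disk.close();
        }
        // Reading the data back and temp-file cleanup are omitted for brevity.
    }
}
```

With the memory limit set above the typical message size, small bodies would never touch /var/tmp/xroad at all, while larger ones would hit the disk in configurably sized chunks instead of 8 KiB ones.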
Hello @zpotoloom!
Thank you for looking into this issue and proposing a solution. Your suggestion to make the buffer size configurable so that users can tune it to their needs makes sense, as does bypassing the disk altogether for smaller messages.
Unfortunately, we will not be able to introduce this change for version 7.4.0 yet, but we will look into implementing the suggestion for version 7.5.0.