keycloak-benchmark

compare A/A performance results with A/P tests

Open kami619 opened this issue 1 year ago • 9 comments

kami619 avatar Jul 24 '24 14:07 kami619

https://github.com/keycloak/keycloak-benchmark/compare/main...kami619:keycloak-benchmark:is-901-analyze-perf-results

I am adding a new Python script, perfInsights.py, to analyze the data across deployment types. It can be run locally, or we can hook it into an action that runs after the scheduled run to print the data into a summary JSON.
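A minimal sketch of how such a script could flatten one result document into the dotted CSV columns shown in the sample output. This is not the actual perfInsights.py. The JSON layout of `sample` and the helper names `flatten` and `to_csv` are assumptions for illustration, and the real files in the result_data branch may be structured differently.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict keyed by dotted paths."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def to_csv(results, columns):
    """Render the selected dotted columns of each result as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for result in results:
        writer.writerow(flatten(result))
    return out.getvalue()


# Hypothetical result document (assumed layout, values taken from the thread).
sample = json.loads("""
{
  "start": "2024-07-15T06:08:32+00:00",
  "context": {
    "externalInfinispanFeatureEnabled": false,
    "persistentSessionsEnabled": false
  },
  "cpuUsageForLoginsTest": {"userLoginsPerSecPer1vCpuPerPod": 46},
  "memoryUsageTest": {"activeSessionsPer500MbPerPod": 36567}
}
""")

COLUMNS = [
    "start",
    "context.externalInfinispanFeatureEnabled",
    "context.persistentSessionsEnabled",
    "cpuUsageForLoginsTest.userLoginsPerSecPer1vCpuPerPod",
    "memoryUsageTest.activeSessionsPer500MbPerPod",
]

print(to_csv([sample], COLUMNS), end="")
```

Keeping the column list explicit makes it easy to add further dotted paths later without touching the flattening logic.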

Sample output when querying recent results in the result_data branch:

rosa_scalability/2024/07/15

> python3 perfInsights.py 
start,context.externalInfinispanFeatureEnabled,context.persistentSessionsEnabled,cpuUsageForLoginsTest.userLoginsPerSecPer1vCpuPerPod,memoryUsageTest.activeSessionsPer500MbPerPod
2024-07-15T06:08:32+00:00,False,False,46,36567
2024-07-15T07:05:40+00:00,False,True,52,37247
2024-07-15T08:17:35+00:00,True,False,54,45528
2024-07-15T09:20:37+00:00,False,True,2248,250059
2024-07-15T10:27:36+00:00,True,False,54,48689

rosa_scalability/2024/07/23

start,context.externalInfinispanFeatureEnabled,context.persistentSessionsEnabled,cpuUsageForLoginsTest.userLoginsPerSecPer1vCpuPerPod,memoryUsageTest.activeSessionsPer500MbPerPod
2024-07-23T11:49:29+00:00,False,False,48,39400
2024-07-23T13:28:35+00:00,False,True,50,34719
2024-07-23T14:35:36+00:00,True,False,58,34601
2024-07-23T15:39:33+00:00,False,True,49,30649
2024-07-23T17:57:29+00:00,True,False,13112,1004912

rosa_scalability/2024/07/24

start,context.externalInfinispanFeatureEnabled,context.persistentSessionsEnabled,cpuUsageForLoginsTest.userLoginsPerSecPer1vCpuPerPod,memoryUsageTest.activeSessionsPer500MbPerPod
2024-07-24T06:14:27+00:00,False,False,44,29559
2024-07-24T07:14:27+00:00,False,True,48,40980
2024-07-24T08:22:34+00:00,True,False,51,39983
2024-07-24T09:28:43+00:00,False,True,46,29033
2024-07-24T10:40:40+00:00,True,False,108,59044

Currently, the CPU and memory performance looks comparable in A/P and A/A tests. I shall continue the investigation across other endpoints and look to extend the perfInsights.py script.

Note: A few runs in the reported data contain outlier metrics. We can repeat some of these runs in the next few days and see if a pattern emerges.
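One way to spot such outliers automatically is to compare each value with the median of the series. The sketch below is only an illustration: the helper name `flag_outliers` and the threshold of five times the median are arbitrary assumptions, not something the script currently does. The input values are the ones reported above for the three days.

```python
from statistics import median


def flag_outliers(values, factor=5.0):
    """Return the values that exceed `factor` times the median of the series."""
    mid = median(values)
    return [v for v in values if v > factor * mid]


# userLoginsPerSecPer1vCpuPerPod for 2024/07/15, 2024/07/23 and 2024/07/24
logins = [46, 52, 54, 2248, 54, 48, 50, 58, 49, 13112, 44, 48, 51, 46, 108]

# activeSessionsPer500MbPerPod for the same runs
sessions = [
    36567, 37247, 45528, 250059, 48689,
    39400, 34719, 34601, 30649, 1004912,
    29559, 40980, 39983, 29033, 59044,
]

print("login outliers:", flag_outliers(logins))
print("session outliers:", flag_outliers(sessions))
```

The median is used instead of the mean because a single extreme run would otherwise pull the reference value towards itself.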

kami619 avatar Jul 24 '24 14:07 kami619

@kami619 - can you please have a look at the response times as well?

ahus1 avatar Jul 25 '24 08:07 ahus1

@ahus1 Yup, I will look at endpoint response times and the throughput across the tests for different deployment types. I will share the results in this thread.

kami619 avatar Jul 25 '24 12:07 kami619

DISCLAIMER: In the nightly runs for A/A with persistent sessions, the embedded caches were still in use. The results below might therefore be invalid to a certain extent, but they should still indicate the performance trend. I will update the results once I have the fix in the deployment in place.

memoryUsageTest

  • activeSessionsPer500MbPerPod: this metric has its lowest value in an A/A setup with 'externalInfinispanFeatureEnabled': False, 'persistentSessionsEnabled': True.
  • Dataset from one of the good runs.
externalInfinispanFeatureEnabled,persistentSessionsEnabled,activeSessionsPer500MbPerPod
false,false,29559
false,true,40980
true,false,39983
false,true,29033
true,false,59044
  • Browser posts correct credentials: 50th-percentile response times increase by 53% when persistentSessionsEnabled is set to true within the A/A deployment type.

Raw Data:

externalInfinispanFeatureEnabled,persistentSessionsEnabled,memoryUsageTest.percentiles1.total
false,false,108
false,true,113
true,false,88
false,true,123
true,false,80

cpuUsageForLoginsTest

  • Response times have a wider swing between deployment types in this test.
  • Browser logout: 50th-percentile response times increase by 230% when persistentSessionsEnabled is set to true within the A/A deployment type.
  • Browser posts correct credentials: 50th-percentile response times increase by 64% when persistentSessionsEnabled is set to true within the A/A deployment type.
  • userLoginsPerSecPer1vCpuPerPod remains the same, with a variance of 9% between the deployments. Its best value is observed when externalInfinispanFeatureEnabled is set to true.
  • Endpoint throughput remains the same across the deployment types.

Raw Endpoint data: Browser logout

externalInfinispanFeatureEnabled,persistentSessionsEnabled,cpuUsageForLoginsTest.percentiles1.total
false,false,26
false,true,42
true,false,12
false,true,58
true,false,16

Browser posts correct credentials

externalInfinispanFeatureEnabled,persistentSessionsEnabled,cpuUsageForLoginsTest.percentiles1.total
false,false,100
false,true,110
true,false,81
false,true,120
true,false,73

cpuUsageForCredentialGrantsTest

  • Irrespective of the HA type (A/A or A/P), when externalInfinispanFeatureEnabled is set to true, we saw a 100% increase in the 50th-percentile response time of the Client credentials grant endpoint compared with deployments where externalInfinispanFeatureEnabled is set to false.

Raw Data:

externalInfinispanFeatureEnabled,persistentSessionsEnabled,cpuUsageForCredentialGrantsTest.percentiles1.total
false,false,9
false,true,9
true,false,20
false,true,12
true,false,22
  • Endpoint throughput remains the same across the deployment types.
  • credentialGrantsPerSecPer1vCpu remains the same across the deployment types, with a 5% variance between the deployments.

kami619 avatar Jul 26 '24 10:07 kami619

@kami619 - those percentages look scary. Still, if they occur on a low initial response time, they might be considered OK. Can you please update the previous comment to include the absolute time in milliseconds to capture this perspective? Thanks!
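To illustrate the point, here is a small sketch that reports the absolute change in milliseconds next to the relative change. The helper name `describe_change` is made up for this example. The inputs are taken from the raw data above: Browser posts correct credentials in cpuUsageForLoginsTest (73 ms vs. 120 ms) and the Client credentials grant (9 ms vs. 20 ms).

```python
def describe_change(before_ms, after_ms):
    """Return (absolute change in ms, relative change in percent)."""
    absolute = after_ms - before_ms
    relative = round(100.0 * absolute / before_ms, 1)
    return absolute, relative


samples = {
    "Browser posts correct credentials": (73, 120),
    "Client credentials grant": (9, 20),
}

for endpoint, (before, after) in samples.items():
    absolute, relative = describe_change(before, after)
    print(f"{endpoint}: {before} ms -> {after} ms "
          f"({absolute:+d} ms, {relative:+.1f}%)")
```

A large percentage on a small starting value can correspond to only a few milliseconds, so both numbers are worth showing side by side.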

ahus1 avatar Jul 26 '24 11:07 ahus1

Great insights from those performance tests! I will look into the increased response times of the client credential grant later today. Those times shouldn't increase, as this grant shouldn't use persistent user sessions by default. There seems to be potential for an optimization.

ahus1 avatar Jul 26 '24 11:07 ahus1

rosa_scalability_runs_comparision.zip Here is the data dump which I used for the analysis.

@ahus1 I have added some snapshots of the absolute values in ms for the response times of the specific endpoints.

kami619 avatar Jul 26 '24 11:07 kami619

@ahus1 I made a mistake in the cpuUsageForCredentialGrantsTest analysis when I said the endpoint response times spiked when persistentSessionsEnabled is set to true. That was not a correct observation. I have updated the statement to reflect the actual data.

kami619 avatar Jul 26 '24 15:07 kami619

PRs to optimize performance:

  • https://github.com/keycloak/keycloak/pull/31700
  • https://github.com/keycloak/keycloak/pull/31702

ahus1 avatar Jul 29 '24 08:07 ahus1

@ahus1 Here are the re-run results comparing A/A and A/P deployments of Keycloak in Multi-site.

User Logins Per Second Per 1vCPU Per Pod

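| Scenario | External Infinispan | Persistent Sessions | Logins per Sec | Delta (%) |
| --- | --- | --- | --- | --- |
| A/A | True | False | 53 | Baseline |
| A/A | True | True | 49 | -7.55% |
| A/P | False | False | 46 | -13.21% |
| A/P | False | True | 49 | -7.55% |
| A/P | True | False | 52 | -1.89% |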

Observations:

  • A/A Scenario: Enabling persistent sessions leads to a 7.55% decrease in login rate.
  • A/P Scenario:
    • Without external Infinispan, persistent sessions increase the login rate by 6.52%.
    • Enabling external Infinispan results in a 13.04% increase compared to the A/P baseline without persistent sessions.
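The deltas above can be recomputed with a few lines of Python. This is just a sketch for checking the arithmetic. The tuple layout of `rows` and the helper name `pct_delta` are assumptions for illustration, while the login rates come from the table.

```python
def pct_delta(value, baseline):
    """Percentage change of `value` relative to `baseline`, rounded to 2 places."""
    return round(100.0 * (value - baseline) / baseline, 2)


# (scenario, external Infinispan, persistent sessions, logins per sec)
rows = [
    ("A/A", True, False, 53),
    ("A/A", True, True, 49),
    ("A/P", False, False, 46),
    ("A/P", False, True, 49),
    ("A/P", True, False, 52),
]

aa_baseline = rows[0][3]  # A/A, external Infinispan, no persistent sessions
ap_baseline = rows[2][3]  # A/P, neither feature enabled

for scenario, ext_ispn, persistent, logins in rows:
    print(scenario, ext_ispn, persistent, logins,
          f"{pct_delta(logins, aa_baseline):+.2f}% vs A/A baseline")

# Re-based on the A/P run without either feature, as in the observations.
for scenario, ext_ispn, persistent, logins in rows[3:]:
    print(scenario, ext_ispn, persistent, logins,
          f"{pct_delta(logins, ap_baseline):+.2f}% vs A/P baseline")
```

Note that the table uses the A/A run as its baseline, while the A/P observations are relative to the A/P run with both features disabled, so the percentages differ.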

Response Time Percentiles (50th Percentile)

1. Browser Posts Correct Credentials

| Scenario | External Infinispan | Persistent Sessions | 50th Percentile Response Time (ms) | Delta (%) |
| --- | --- | --- | --- | --- |
| A/A | True | False | 74 | Baseline |
| A/A | True | True | 109 | +47.30% |
| A/P | False | False | 89 | +20.27% |
| A/P | False | True | 114 | +54.05% |
| A/P | True | False | 74 | 0.00% |

Observations:

  • A/A Scenario: Enabling persistent sessions results in a 47.30% increase in response time.
  • A/P Scenario:
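    • With both external Infinispan and persistent sessions disabled, the response time is 20.27% higher than the A/A baseline.
    • With persistent sessions enabled and external Infinispan disabled, the response time is 54.05% higher than the A/A baseline.
    • Enabling external Infinispan without persistent sessions results in the same response time as the A/A baseline.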

2. Browser Logout

| Scenario | External Infinispan | Persistent Sessions | 50th Percentile Response Time (ms) | Delta (%) |
| --- | --- | --- | --- | --- |
| A/A | True | False | 14 | Baseline |
| A/A | True | True | 22 | +57.14% |
| A/P | False | False | 27 | +92.86% |
| A/P | False | True | 32 | +128.57% |
| A/P | True | False | 13 | -7.14% |

Observations:

  • A/A Scenario: Enabling persistent sessions increases response time by 57.14%.
  • A/P Scenario:
    • With both features disabled, the response time is 92.86% higher than the A/A baseline. Enabling persistent sessions on top of that raises it to 128.57% above the baseline.
    • Disabling persistent sessions and enabling external Infinispan leads to a 7.14% decrease in response time.

Key Takeaways

  1. User Logins Per Second:

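    • Persistent Sessions: Reduces the login rate by 7.55% in the A/A scenario. In the A/P scenario without external Infinispan, the login rate was 6.52% higher with persistent sessions enabled.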
    • External Infinispan: Enhances login rate, especially noticeable in the A/P scenario with a 13.04% improvement.
  2. Response Time Percentiles:

    • Persistent Sessions: Increases response times significantly, particularly for browser post credentials and logout operations.
    • External Infinispan: Improves performance, showing zero to negative deltas when enabled without persistent sessions.
  3. Performance Trade-offs:

    • While persistent sessions offer state management benefits, they come with performance overhead.
    • External Infinispan provides substantial performance improvements, especially when persistent sessions are not enabled.

kami619 avatar Aug 06 '24 18:08 kami619

Results for the Client Credentials grant scenario

Credential Grants Per Second Per 1vCPU

| Scenario | External Infinispan | Persistent Sessions | Credential Grants per Sec | Delta (%) |
| --- | --- | --- | --- | --- |
| A/A | True | False | 613 | Baseline |
| A/A | True | True | 609 | -0.65% |
| A/P | False | False | 533 | -13.05% |
| A/P | False | True | 625 | +1.96% |
| A/P | True | False | 621 | +1.31% |

Observations:

  • A/A Scenario: Enabling persistent sessions results in a slight 0.65% decrease in credential grants.
  • A/P Scenario:
    • Without external Infinispan, persistent sessions increase grants by 17.26% compared to the A/P baseline without persistent sessions.
    • Enabling external Infinispan results in a 16.51% increase (621 vs. 533) compared to the A/P baseline without external Infinispan.

Response Time Percentiles (50th Percentile)

| Scenario | External Infinispan | Persistent Sessions | 50th Percentile Response Time (ms) | Delta (%) |
| --- | --- | --- | --- | --- |
| A/A | True | False | 12 | Baseline |
| A/A | True | True | 12 | 0.00% |
| A/P | False | False | 9 | -25.00% |
| A/P | False | True | 9 | -25.00% |
| A/P | True | False | 9 | -25.00% |

Observations:

  • A/A Scenario: Response times remain unchanged with persistent sessions enabled.
  • A/P Scenario:
    • All A/P configurations result in a 25.00% decrease in response time compared to the A/A baseline.

Key Takeaways

  1. Credential Grants Per Second:

    • A/A Scenario: Minimal impact from persistent sessions.
    • A/P Scenario: Both persistent sessions and external Infinispan increase credential grants, with external Infinispan providing a 16.51% improvement compared to A/P without it.
  2. Response Time Percentiles:

    • A/P configurations consistently achieve better response times, with a 25.00% improvement compared to the A/A baseline.

kami619 avatar Aug 06 '24 18:08 kami619

Follow-up issue created: #926

cc: @kami619, @pruivo

ahus1 avatar Aug 12 '24 17:08 ahus1