
High CPU usage when authentication fails (returns 204)

Open · ScharfViktor opened this issue 2 months ago · 1 comment

Describe the bug

Running the k6 load tests against an OpenCloud instance that has no test users causes every authentication attempt to fail. Instead of rejecting these cheaply, the IDP returns HTTP 204 and the proxy and backend processes consume over 500% CPU.

Steps to reproduce

  1. Start opencloud
  2. Run the k6 tests without creating the test users first (i.e. do not run src-seeds-up-k6.js)

Test script (save and run as ./script-name.sh):

#!/usr/bin/env bash
export SEED_USERS_TOTAL=30
export ADMIN_LOGIN=admin
export ADMIN_PASSWORD=admin
export PLATFORM_BASE_URL=https://localhost:9200
export AUTH_N_PROVIDER_KOPANO_BASE_URL=https://localhost:9200
export AUTH_N_PROVIDER_KOPANO_REDIRECT_URL=https://localhost:9200/oidc-callback.html
export PLATFORM_TYPE=openCloud

export TEST_KOKO_PLATFORM_020_RAMPING_SLEEP_AFTER_ITERATION=10s
export TEST_KOKO_PLATFORM_020_RAMPING_STAGES_VUS=4
export TEST_KOKO_PLATFORM_020_RAMPING_STAGES_UP_DURATION=20s
export TEST_KOKO_PLATFORM_020_RAMPING_STAGES_PEAK_DURATION=1m
export TEST_KOKO_PLATFORM_020_RAMPING_STAGES_DOWN_DURATION=20s

k6 run packages/k6-tests/artifacts/tests-koko-platform-020-navigate-file-tree-ramping-k6.js 

Expected behavior

No CPU leak: failed authentication attempts should be rejected without sustained high CPU usage.

Actual behavior

When the test runs with invalid credentials (no test users created), the IDP returns HTTP 204 No Content instead of an error. As a result, the proxy and backend processes consume over 500% CPU.

[screenshot: CPU usage]

opencloud log:

cpu-leak.log

k6-test log:

ERRO[0021] Uncaught (in promise) Error: logonResponse.status is 204, expected 200
        at getContinueURI (file:///Users/v.scharf/Work/opencloud/cdperf/packages/k6-tdk/src/auth/kopano.ts:95:12(75))
        at credential (file:///Users/v.scharf/Work/opencloud/cdperf/packages/k6-tdk/src/auth/kopano.ts:48:46(15))
        at headers (file:///Users/v.scharf/Work/opencloud/cdperf/packages/k6-tdk/src/auth/kopano.ts:36:36(3))
        at file:///Users/v.scharf/Work/opencloud/cdperf/packages/k6-tdk/src/utils/http.ts:26:49(24)
        at Oe (file:///Users/v.scharf/Work/opencloud/cdperf/packages/k6-tdk/src/endpoints/graph-v1-me.ts:7:20(20))
        at getMyDrives (file:///Users/v.scharf/Work/opencloud/cdperf/packages/k6-tdk/src/client/me.ts:38:70(40))
        at getTestRoot (file:///Users/v.scharf/Work/opencloud/cdperf/packages/k6-tests/src/test/root.ts:48:62(76))
        at call (native)
        at file:///Users/v.scharf/Work/opencloud/cdperf/packages/k6-tests/tests/koko/platform/020-navigate-file-tree/simple.k6.ts:38:39(27)
        at set (file:///Users/v.scharf/Work/opencloud/cdperf/packages/k6-tdk/src/utils/store.ts:51:30(16))
        at setOrGet (file:///Users/v.scharf/Work/opencloud/cdperf/packages/k6-tdk/src/utils/store.ts:58:23(15))
        at navigate_file_tree_020 (file:///Users/v.scharf/Work/opencloud/cdperf/packages/k6-tests/tests/koko/platform/020-navigate-file-tree/simple.k6.ts:37:40(42))  executor=ramping-vus scenario=navigate_file_tree_020


  █ TOTAL RESULTS 

    checks_total.......: 920     44.498963/s
    checks_succeeded...: 0.00%   0 out of 920
    checks_failed......: 100.00% 920 out of 920

    ✗ authn -> logonResponse - status
      ↳  0% — ✓ 0 / ✗ 920

    HTTP
    http_req_duration..............: avg=34.02ms min=12.47ms med=30.21ms max=248.94ms p(90)=52.16ms p(95)=61.7ms 
      { expected_response:true }...: avg=34.02ms min=12.47ms med=30.21ms max=248.94ms p(90)=52.16ms p(95)=61.7ms 
    http_req_failed................: 0.00%  0 out of 920
    http_reqs......................: 920    44.498963/s

    EXECUTION
    iteration_duration.............: avg=35.24ms min=13.98ms med=31.28ms max=249.83ms p(90)=53.4ms  p(95)=63.96ms
    iterations.....................: 920    44.498963/s
    vus............................: 3      min=0        max=3
    vus_max........................: 4      min=4        max=4

    NETWORK
    data_received..................: 1.0 MB 49 kB/s
    data_sent......................: 412 kB 20 kB/s


ScharfViktor avatar Oct 16 '25 08:10 ScharfViktor

[screenshot: CPU profile]

This is a 5-second CPU profile recorded while the test was running (and opencloud was consuming all of the available CPU). As can be seen, 70% of the CPU time is spent computing argon2 hashes. Another 19% is spent in the garbage collector (unfortunately cut off from this screenshot, so you have to trust me).

The argon2 hash computation happens when the IDM service needs to validate passwords (https://github.com/opencloud-eu/opencloud/blob/main/vendor/github.com/libregraph/idm/pkg/ldappassword/ldappassword.go#L44). Argon2 is by design CPU- and memory-intensive.
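To get a rough feel for why a memory-hard KDF dominates the profile, here is a small stdlib-only sketch. Python has no built-in argon2, so this uses hashlib.scrypt as a stand-in (also a deliberately CPU- and memory-hard function); the cost parameters are illustrative, not OpenCloud's:

```python
import hashlib
import os
import time

def hash_password(password: bytes, salt: bytes) -> bytes:
    # scrypt stands in for argon2 here: both are deliberately expensive
    # key-derivation functions, so every verification burns real CPU time
    # and memory (here roughly 128 * n * r = 16 MiB per call).
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1,
                          maxmem=64 * 1024 * 1024, dklen=32)

salt = os.urandom(16)
start = time.perf_counter()
digest = hash_password(b"admin", salt)
elapsed = time.perf_counter() - start
print(f"one password verification took {elapsed * 1000:.1f} ms")
```

At ~45 failed logins per second (the rate the k6 summary above shows), paying this cost on every attempt adds up quickly.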

You might wonder why we are checking passwords so often when there is not even a user in the system. This is mainly caused by the idp (lico) service not reusing its connection to the idm (LDAP) service. For every login attempt it opens a new connection and authenticates as the internal idp service user (this alone already triggers the hash computation). That connection is then used to perform an LDAP lookup of the username entered into the login form. In this particular case the lookup does not succeed, an error is returned and the login fails.

So basically the system is busy re-authenticating the idp service user over and over again.
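The difference between the two connection strategies can be sketched as follows. All names here are hypothetical stand-ins, not lico's actual API; the point is only how the number of expensive service-user binds scales with login attempts:

```python
class LdapConnection:
    """Toy LDAP connection that counts expensive service-user binds."""
    bind_count = 0  # class-wide counter across all connections

    def __init__(self):
        self.bound = False

    def bind(self, user: str, password: str) -> None:
        # Each bind triggers an argon2 hash computation server-side.
        LdapConnection.bind_count += 1
        self.bound = True

    def search(self, username: str):
        assert self.bound
        return None  # user not found in this scenario

def lookup_per_request(username: str):
    # Current behaviour: a fresh connection and bind per login attempt.
    conn = LdapConnection()
    conn.bind("svc-idp", "secret")
    return conn.search(username)

class PooledLookup:
    # Proposed behaviour: one persistent, pre-bound connection.
    def __init__(self):
        self.conn = LdapConnection()
        self.conn.bind("svc-idp", "secret")

    def lookup(self, username: str):
        return self.conn.search(username)

for name in ["alice", "bob", "carol"]:
    lookup_per_request(name)
print("per-request binds:", LdapConnection.bind_count)  # 3

LdapConnection.bind_count = 0
pool = PooledLookup()
for name in ["alice", "bob", "carol"]:
    pool.lookup(name)
print("pooled binds:", LdapConnection.bind_count)  # 1
```

With a persistent connection the hash is paid once at startup (plus on reconnect) instead of once per login attempt.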

A possible solution would be to make lico use a persistent LDAP connection for the user lookup, as we already do in all the other services. We actually submitted a PR for that quite a while ago: https://github.com/libregraph/lico/pull/77 — it is basically "just" missing some reconnect logic.

I am not sure this really deserves Prio 2, as it only happens with the lico / libregraph-idm setup. I guess that is for somebody else to decide. It does, however, highlight that production setups need some kind of rate limiting, otherwise it is pretty easy to burn CPU cycles and degrade service quality just by attempting to log in over and over again.
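A minimal sketch of the kind of rate limiting meant here, assuming a simple per-client token bucket in front of the login endpoint (illustrative only, not OpenCloud code):

```python
import time

class TokenBucket:
    """Allows short bursts up to `capacity`, then throttles login
    attempts to `refill_per_sec` sustained requests per second."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(capacity=5, refill_per_sec=1.0)
results = [bucket.allow() for _ in range(10)]
print(results.count(True), "allowed,", results.count(False), "rejected")
```

Rejected attempts never reach the IDP, so the expensive bind-and-hash path is only exercised at a bounded rate per client.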

rhafer avatar Oct 22 '25 14:10 rhafer