
Memory usage keeps rising if opal-client fails to fetch policy data

WellyHong opened this issue 1 year ago • 6 comments

Describe the bug If opal-client fails to fetch policy data, memory usage keeps increasing until the data is fetched successfully.

To Reproduce

  1. set a wrong entries.url in OPAL_DATA_CONFIG_SOURCES
  2. opal-client fails to fetch policy data and retries constantly
  3. inspect the opal-client container's memory usage; it gets higher and higher (a polling sketch follows this list)
  4. memory won't go down even after data is eventually fetched successfully; it stays at the elevated level from then on
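
To watch the growth over time, one can poll the container's memory stats. A minimal sketch, assuming the Docker SDK for Python (`pip install docker`) and a container named `opal-client` (the name is an assumption; adjust to your setup):

```python
# Polls Docker memory stats for the opal-client container every 10 seconds.
import time

import docker  # Docker SDK for Python


client = docker.from_env()
container = client.containers.get("opal-client")  # assumed container name

while True:
    stats = container.stats(stream=False)  # one-shot stats snapshot
    usage_mib = stats["memory_stats"]["usage"] / (1024 * 1024)
    print(f"{time.strftime('%H:%M:%S')}  memory: {usage_mib:.1f} MiB")
    time.sleep(10)
```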

Expected behavior Memory usage should remain stable even while data fetches fail.

Screenshots (attached to the original issue): opal-client-fetch-fail, memory_init, memory_incress

OPAL version

  • opal-client-standalone:latest

WellyHong avatar Jul 15 '22 05:07 WellyHong

Hey @WellyHong thanks for reporting this issue!! :)

@roekatz will investigate this and we will come back to you with an answer.

asafc avatar Jul 17 '22 15:07 asafc

@WellyHong I wasn't able to reproduce this locally by configuring a wrong entries.url:

  1. The opal-client shuts down instead of retrying, so memory never accumulates.
  2. I get a TimeoutError() rather than the exception in the screenshot you shared.

What version of OPAL are you using? What did you use for the "wrong" url?

roekatz avatar Jul 27 '22 11:07 roekatz

Hi @roekatz ,

image: permitio/opal-client-standalone tag: latest

The "wrong" url means opal-client cannot fetch the data source correctly: the server cannot find the corresponding data, so it returns an HTTP 400 status with an OPAL_POLICY_TARGET_NOT_EXISTS message.

After glancing at the code, it seems the on_connect callback in /opal-client/opal_client/data/updater.py fires and calls get_base_policy_data repeatedly.
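
For illustration, a rough sketch of that loop (hypothetical names; not the actual OPAL source):

```python
# Sketch of an on_connect callback that re-triggers a failing base-data fetch.
import asyncio


async def get_base_policy_data():
    # stands in for the real fetch; here it always fails the way the
    # report describes (HTTP 400 / OPAL_POLICY_TARGET_NOT_EXISTS)
    raise RuntimeError("400: OPAL_POLICY_TARGET_NOT_EXISTS")


async def on_connect():
    await get_base_policy_data()


async def reconnect_loop():
    # each reconnect fires on_connect again, so the failing fetch is
    # retried forever; any state allocated per attempt accumulates
    while True:
        try:
            await on_connect()
            return
        except Exception as exc:
            print(f"fetch failed ({exc}), reconnecting...")
            await asyncio.sleep(1)


asyncio.run(reconnect_loop())
```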

WellyHong avatar Aug 01 '22 07:08 WellyHong

Hi @WellyHong,

There might be a memory issue with the fetcher retry mechanism. The first step for us is to reproduce the issue on our end :)

There is no place in our codebase that returns OPAL_POLICY_TARGET_NOT_EXISTS. I suspect you are using an external_url in your OPAL_DATA_CONFIG_SOURCES and a config server to return dynamic data sources.

Can you please send a redacted version of your opal config (both server and client) - the most interesting part is the value of OPAL_DATA_CONFIG_SOURCES but it's probably better to send the entire config just in case.

asafc avatar Aug 02 '22 08:08 asafc

> Hi @WellyHong,
>
> There is no place in our codebase that returns OPAL_POLICY_TARGET_NOT_EXISTS. I suspect you are using an external_url in your OPAL_DATA_CONFIG_SOURCES and a config server to return dynamic data sources.

Indeed, I followed "configure external data sources" and deployed a config server that serves a different OPAL_DATA_CONFIG_SOURCES to each individual opal-client.

> Can you please send a redacted version of your opal config (both server and client) - the most interesting part is the value of OPAL_DATA_CONFIG_SOURCES but it's probably better to send the entire config just in case.

Deployed with a kubernetes Deployment:

  • opal-client

```yaml
image: permitio/opal-client
command: ["/bin/sh"]
args: ["-c", "uvicorn opal_client.main:app --reload --port=7000"]
env:
  - name: OPAL_SERVER_URL
    value: https://opal-server.host
  - name: OPAL_POLICY_STORE_URL
    value: http://localhost:8181
  - name: OPAL_DATA_TOPICS
    value: client-topic # individual client topic
```

  • opal-server

```yaml
image: permitio/opal-server
env:
  - name: OPAL_DATA_CONFIG_SOURCES
    value: '{"external_source_url": "https://config-server.host/api/v1/opal/source"}'
  - name: UVICORN_NUM_WORKERS
    value: "1"
  - name: OPAL_BROADCAST_URI
    value: memory://
  - name: OPAL_POLICY_REPO_URL
    value: [email protected]
  - name: OPAL_POLICY_REPO_MAIN_BRANCH
    value: master
```
  • below is the OPAL_DATA_CONFIG_SOURCES payload served by the config server:

```json
{
    "config": {
        "entries": [
            {
                "url": "https://config-server.host/api/v1/opal/policyData?individual-client-topic",
                "topics": [
                    "individual/client/topic"
                ],
                "config": {
                    "headers": {
                        "Authorization": "Bearer jwt"
                    }
                }
            }
        ]
    }
}
```
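
For anyone trying to reproduce this setup, here is a minimal sketch of such a config server, assuming FastAPI. The endpoint path and the OPAL_POLICY_TARGET_NOT_EXISTS message come from this thread; the lookup key and registry are hypothetical:

```python
# Minimal config server that returns per-client OPAL data-source configs,
# and an HTTP 400 with OPAL_POLICY_TARGET_NOT_EXISTS for unknown clients.
from fastapi import FastAPI, HTTPException

app = FastAPI()

# hypothetical registry keyed by client topic
DATA_SOURCES = {
    "individual/client/topic": {
        "config": {
            "entries": [
                {
                    "url": "https://config-server.host/api/v1/opal/policyData?individual-client-topic",
                    "topics": ["individual/client/topic"],
                    "config": {"headers": {"Authorization": "Bearer jwt"}},
                }
            ]
        }
    }
}


@app.get("/api/v1/opal/source")
async def get_data_sources(topic: str = "individual/client/topic"):
    # keying the lookup off a query parameter is an assumption of this sketch
    if topic not in DATA_SOURCES:
        raise HTTPException(status_code=400, detail="OPAL_POLICY_TARGET_NOT_EXISTS")
    return DATA_SOURCES[topic]
```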

WellyHong avatar Aug 03 '22 01:08 WellyHong

@WellyHong Great! that helped me reproduce the issue.

The source of the memory leak is in our FastAPI Websocket RPC library. The client's connect method is retried indefinitely without closing all relevant resources. I've created an issue there - https://github.com/permitio/fastapi_websocket_rpc/issues/13
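
For illustration, a distilled sketch of that pattern and the obvious fix, using aiohttp as a stand-in (not the actual fastapi_websocket_rpc code):

```python
# Retry loops that allocate a session per attempt: the leaky variant never
# releases the failed session's resources, the fixed one closes it first.
import asyncio

import aiohttp


async def connect_leaky(url: str):
    while True:
        session = aiohttp.ClientSession()       # allocated on every attempt
        try:
            ws = await session.ws_connect(url)  # raises while the URL is bad
            return session, ws
        except aiohttp.ClientError:
            # BUG: the session is never closed, so its buffers and
            # connections accumulate across retries
            await asyncio.sleep(1)


async def connect_fixed(url: str):
    while True:
        session = aiohttp.ClientSession()
        try:
            ws = await session.ws_connect(url)
            return session, ws
        except aiohttp.ClientError:
            await session.close()               # release resources before retrying
            await asyncio.sleep(1)
```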

And I'm working to fix it soon. @WellyHong - Thanks for the report!

roekatz avatar Aug 03 '22 19:08 roekatz