Memory usage keeps rising when opal-client fails to fetch policy data
Describe the bug: If opal-client fails to fetch policy data, memory usage keeps increasing until the data is eventually fetched successfully.
To Reproduce
- configure a wrong entries.url in OPAL_DATA_CONFIG_SOURCES
- opal-client fails to fetch policy data and retries constantly
- inspect the opal-client container's memory usage; it keeps climbing
- memory does not go down even after a later successful fetch; it stays at the level it has reached
Expected behavior: memory usage should stay stable (or be released) even when fetching policy data fails.
Screenshots: [exception screenshot attached in the original issue]
OPAL version
- opal-client-standalone:latest
Hey @WellyHong thanks for reporting this issue!! :)
@roekatz will investigate this and we will come back to you with an answer.
@WellyHong I wasn't able to reproduce this locally by configuring a wrong entries.url:
- The opal-client shuts down instead of retrying, so memory never accumulates.
- I get a TimeoutError() rather than the exception in the screenshot you've shared.
What version of OPAL are you using? What did you use for the "wrong" url?
Hi @roekatz ,
image: permitio/opal-client-standalone
tag: latest
The "wrong" url means opal-client cannot fetch the data source correctly: the server cannot find the corresponding data, so it returns an HTTP 400 status with an OPAL_POLICY_TARGET_NOT_EXISTS message.
After glancing at the code, it seems the on_connect callback in /opal-client/opal_client/data/updater.py is invoked and get_base_policy_data is called repeatedly.
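For illustration only, here is a simplified sketch (not OPAL's actual updater code) of the kind of fetch-and-retry loop that can accumulate memory when the failure path never releases what each attempt allocates; the function name, URL, and retry interval are placeholders.

import asyncio
import aiohttp

# Illustrative only -- a simplified stand-in for a "fetch policy data with retry"
# loop, NOT opal_client/data/updater.py. URL and retry interval are placeholders.
async def fetch_policy_data_forever(url: str, retry_interval: float = 5.0) -> dict:
    while True:
        session = aiohttp.ClientSession()   # fresh resources allocated on every attempt
        try:
            async with session.get(url) as resp:
                resp.raise_for_status()     # an HTTP 400 answer raises here
                data = await resp.json()
            await session.close()
            return data
        except aiohttp.ClientError:
            # Leaky pattern: the session created above is never closed on failure,
            # so every retry keeps its connector, buffers, and pending tasks alive.
            await asyncio.sleep(retry_interval)
        # Correct pattern: open the session with `async with` (or close it in a
        # `finally:` block) so failed attempts release what they allocated.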
Hi @WellyHong,
There might be a memory issue with the fetcher retry mechanism. The first step for us to investigate needs to be reproducing the issue on our end :)
There is no place in our codebase that returns OPAL_POLICY_TARGET_NOT_EXISTS. I suspect you are using an external_url in your OPAL_DATA_CONFIG_SOURCES and a config server to return dynamic data sources.
Can you please send a redacted version of your OPAL config (both server and client)? The most interesting part is the value of OPAL_DATA_CONFIG_SOURCES, but it's probably better to send the entire config just in case.
Indeed, I followed the "configure external data sources" guide and deployed a config server which serves a different OPAL_DATA_CONFIG_SOURCES to each individual opal-client.
Deployed with a Kubernetes Deployment:
- opal-client
  image: permitio/opal-client
  command: ["/bin/sh"]
  args: ["-c", "uvicorn opal_client.main:app --reload --port=7000"]
  env:
    - name: OPAL_SERVER_URL
      value: https://opal-server.host
    - name: OPAL_POLICY_STORE_URL
      value: http://localhost:8181
    - name: OPAL_DATA_TOPICS
      value: client-topic  # individual client topic
- opal-server
  image: permitio/opal-server
  env:
    - name: OPAL_DATA_CONFIG_SOURCES
      value: '{"external_source_url": "https://config-server.host/api/v1/opal/source"}'
    - name: UVICORN_NUM_WORKERS
      value: "1"
    - name: OPAL_BROADCAST_URI
      value: memory://
    - name: OPAL_POLICY_REPO_URL
      value: [email protected]
    - name: OPAL_POLICY_REPO_MAIN_BRANCH
      value: master
Below is the OPAL_DATA_CONFIG_SOURCES returned by the config server:
{
"config": {
"entries": [
{
"url": "https://config-server.host/api/v1/opal/policyData?individual-client-topic",
"topics": [
"individual/client/topic"
],
"config": {
"headers": {
"Authorization": "Bearer jwt"
}
}
}
]
}
}
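For completeness, here is a minimal sketch of what such a config server endpoint could look like, assuming FastAPI, a topic query parameter, and a hard-coded lookup table; the route, the lookup, and the 400/OPAL_POLICY_TARGET_NOT_EXISTS response are placeholders mirroring the behavior described above, not actual OPAL or Permit.io code.

from fastapi import FastAPI, HTTPException

app = FastAPI()

# Hypothetical per-client mapping: topic -> data source entries.
# In a real deployment this would come from a database or service registry.
CLIENT_SOURCES = {
    "individual/client/topic": [
        {
            "url": "https://config-server.host/api/v1/opal/policyData?individual-client-topic",
            "topics": ["individual/client/topic"],
            "config": {"headers": {"Authorization": "Bearer jwt"}},
        }
    ],
}

@app.get("/api/v1/opal/source")
async def opal_data_sources(topic: str):
    entries = CLIENT_SOURCES.get(topic)
    if entries is None:
        # Mirrors the behavior described above: unknown targets answer HTTP 400.
        raise HTTPException(status_code=400, detail="OPAL_POLICY_TARGET_NOT_EXISTS")
    return {"config": {"entries": entries}}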
@WellyHong Great! That helped me reproduce the issue.
The source of the memory leak is in our FastAPI Websocket RPC library. The client's connect method is retried indefinitely without closing all relevant resources.
I've created an issue there - https://github.com/permitio/fastapi_websocket_rpc/issues/13
And I'm working to fix it soon. @WellyHong - Thanks for the report!
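For readers hitting the same symptom, here is a hedged sketch of the general fix pattern (not the actual fastapi_websocket_rpc patch): every failed connection attempt should tear down whatever it allocated before the next retry.

import asyncio
import aiohttp

# Illustrative pattern only -- not the fastapi_websocket_rpc implementation.
# The key point: a failed connection attempt must release what it allocated
# before the next retry, otherwise memory grows with every attempt.
async def connect_with_retry(uri: str, retry_interval: float = 3.0):
    while True:
        session = aiohttp.ClientSession()
        try:
            ws = await session.ws_connect(uri)
            return session, ws          # on success the caller owns (and later closes) both
        except (aiohttp.ClientError, asyncio.TimeoutError, OSError):
            # Close the session created for this failed attempt; skipping this
            # is exactly the kind of leak described in the linked issue.
            await session.close()
            await asyncio.sleep(retry_interval)

Closing the session in the failure path is the essential part; the issue linked above tracks the corresponding fix in the library itself.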