
OPAL Client memory consumption increases over time

Open kreyyser opened this issue 9 months ago • 5 comments

Describe the bug We have an OPAL client sidecar that fetches policies every 30 seconds, plus one data source that returns roughly 20 MB of JSON and is fetched once an hour. When the pod starts, everything looks fine, but over time (possibly with each data fetch) OPAL consumes more and more memory. OPAL starts at 200-300 MB and grows from there. When the pod is not queried at all, it sits at around 300-350 MB. When we start querying OPA, usage rises to around 420 MB as OPA (presumably) caches data; OPA itself then stays stable over time, but overall pod memory consumption continues to grow. For pods with two data sources (a 20 MB and an 8 MB JSON) the OPAL client pod consumes up to 1.5 GB of RAM under a 2-4 RPS load, which seems very strange. And again, OPA memory consumption remains stable. OPA metrics are exported to Prometheus, and even a check of the /proc folder shows that it is OPAL that consumes most of the RAM.
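For reference, the per-process /proc check mentioned above can be done with a short script like the following. This is only a sketch; run it via `kubectl exec` into the sidecar container (pod and container names are placeholders for your deployment):

```shell
#!/bin/sh
# Print the resident set size (VmRSS, in kB) recorded in a
# /proc/<pid>/status file; used below to compare opal vs opa.
rss_kb() {
  awk '/^VmRSS:/ {print $2}' "$1"
}

# Inside the pod, list every process with its RSS so the growing one
# can be identified, e.g.:
#   kubectl exec opal-client-pod -c opal-client -- sh /tmp/rss.sh
for p in /proc/[0-9]*; do
  [ -r "$p/status" ] || continue
  printf '%s\t%s kB\n' "$(cat "$p/comm" 2>/dev/null)" "$(rss_kb "$p/status")"
done
```

Comparing two snapshots of this output a few hours apart shows which process the growth belongs to.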

The screenshots are taken from different pods, but they all show the same pattern.

[screenshot] [screenshot]

To Reproduce Run an OPAL client subscribed to a data source and to policy fetching. The data source should return around 20 MB of data, together with a fairly small policy. The OPAL server data source is configured to refetch the whole data bundle every hour, and policies are fetched every 30 seconds. Leave the pod running for about 24 hours and observe the increased memory usage. The logs are completely empty and do not show anything interesting.

opal_logs.log
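To capture the growth curve over those 24 hours, memory can be sampled periodically. A minimal sketch, assuming metrics-server is installed; `opal-client-pod` and the `default` namespace are placeholder names:

```shell
#!/bin/sh
# Extract the memory column from a `kubectl top pod --no-headers` line
# (columns are: NAME CPU(cores) MEMORY(bytes)).
mem_col() {
  awk '{print $3}'
}

# Append a timestamped sample every 5 minutes; plotting the resulting
# CSV shows whether pod memory climbs steadily or in steps.
sample_loop() {
  while true; do
    mem=$(kubectl top pod opal-client-pod -n default --no-headers | mem_col)
    echo "$(date -u +%FT%TZ),$mem" >> opal_mem.csv
    sleep 300
  done
}
```

Step-shaped growth aligned with the hourly refetch would point at the data-update path rather than query load.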

OPAL Client configuration

opal-client:
    Ports:          7000/TCP, 8777/TCP
    Host Ports:     0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 24 Feb 2025 10:58:17 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1500m
      memory:  3Gi
    Requests:
      cpu:     1500m
      memory:  1Gi
    Liveness:  exec [/bin/bash -c response=$(curl -X GET http://localhost:7000/healthcheck | jq -r '.status'); if [ "$response" != "ok" ]; then
  echo "Liveness probe. Ping Opal Client failed 📛";
  exit 1;
else
  echo "Liveness probe. Ping Opal Client success 🚀";
  exit 0;
fi
] delay=0s timeout=10s period=30s #success=1 #failure=5
    Readiness:  exec [/bin/bash -c response=$(curl -X GET http://localhost:7000/healthcheck | jq -r '.status'); if [ "$response" != "ok" ]; then
  echo "Readiness probe. Ping Opal Client failed 📛";
  exit 1;
else
  echo "Readiness probe. Ping Opal Client success 🚀";
  exit 0;
fi
] delay=5s timeout=10s period=15s #success=1 #failure=5
    Environment:
      GOMEMLIMIT:                                1073741824 (requests.memory)
      GOMAXPROCS:                                2 (requests.cpu)
      OPAL_POLICY_STORE_POLICY_PATHS_TO_IGNORE:  backups/**
      OPAL_ENABLE_OPENTELEMETRY_TRACING:         true
      OPAL_ENABLE_OPENTELEMETRY_METRICS:         true
      OPAL_OPENTELEMETRY_OTLP_ENDPOINT:          jaeger-collector.jaeger.svc.cluster.local:4317
      OPAL_LOG_FORMAT:                           {time:YYYY-MM-DD HH:mm:ss.SSS}|<level>{level:^6} | {process} | {name: <40} | {message}</level>
                                                 {exception}
      OPAL_SERVER_URL:                           http://opal-server.namespace.svc.cluster.local:7002
      OPAL_DATA_UPDATER_ENABLED:                 True
      OPAL_LOG_FORMAT_INCLUDE_PID:               true
      OPAL_LOG_LEVEL:                            INFO
      OPAL_INLINE_OPA_LOG_FORMAT:                full
      OPAL_STATISTICS_ENABLED:                   true
      OPAL_INLINE_OPA_CONFIG:                    {
                                                   "addr": ":8777",
                                                   "authentication": "off",
                                                   "authorization": "off",
                                                   "log_level": "info",
                                                   "files": [],
                                                   "config_file": "/opa-config/config.yaml"
                                                 }
      OPAL_INLINE_CEDAR_CONFIG:                  {
                                                   "addr": ":8777",
                                                   "authentication": "off",
                                                   "authentication_token": "None",
                                                   "files": []
                                                 }
      OPAL_LOG_COLORIZE:                         false
      OPAL_SHOULD_REPORT_ON_DATA_UPDATES:        False
      OPAL_OPA_HEALTH_CHECK_POLICY_ENABLED:      True
      OPAL_OFFLINE_MODE_ENABLED:                 True
      OPAL_POLICY_STORE_URL:                     http://localhost:8777
      OPAL_POLICY_UPDATER_ENABLED:               True
      OPAL_POLICY_SUBSCRIPTION_DIRS:             tenants/common-libs/policies:tenants/domain/policies
      OPAL_OPAL_CLIENT_STAT_ID:                  3718139b-c462-4b0d-bec7-02f615143a27
      OPAL_LOG_TRACEBACK:                        FALSE
      OPAL_CLIENT_TOKEN:                         <client_token>
      OPAL_DEFAULT_UPDATE_CALLBACK_CONFIG:       {
                                                   "method": "post",
                                                   "headers": {
                                                     "Authorization": "Bearer <token>",
                                                     "content-type": "application/json"
                                                   },
                                                   "process_data": "false"
                                                 }
      OPAL_DATA_TOPICS:                          ad-datasource
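As an aside on the probe commands in the config above: they pipe curl straight into jq, so a connection failure and a missing `.status` field look the same. A slightly more defensive variant is sketched below; this is an assumption about how the probe could be written, not the deployed config (timings are copied from the pod description):

```yaml
# Sketch of a hardened liveness probe: -s silences curl's progress
# output, -f makes curl exit non-zero on HTTP errors, and the jq
# fallback (// empty) guards against a missing .status field.
livenessProbe:
  exec:
    command:
      - /bin/bash
      - -c
      - |
        response=$(curl -sf http://localhost:7000/healthcheck | jq -r '.status // empty')
        if [ "$response" != "ok" ]; then
          echo "Liveness probe. Ping Opal Client failed"
          exit 1
        fi
        echo "Liveness probe. Ping Opal Client success"
  timeoutSeconds: 10
  periodSeconds: 30
  failureThreshold: 5
```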

Expected behavior Memory consumption should be stable.

OPAL version OPAL Server - 0.7.16, OPAL Client - 0.8.0

kreyyser, Feb 24 '25 12:02

Thanks for sharing this. We will create a ticket and send it to prioritization. If you have more data, like suspected functions, or even have a lead and want to open a PR and contribute to the OSS project, that is always appreciated.

obsd, Feb 24 '25 13:02

Ticket id 11896

obsd, Feb 24 '25 13:02

Update.

We were checking pod consumption during the day and, more recently, over the night. As you can see, consumption slowly grows.

[screenshot]

The pod was running for 22 hours without restarts. [screenshot]

OPA consumption remains a flat line. [screenshot]

Attaching the available logs:

opal_logs_over_night.log

kreyyser, Feb 25 '25 07:02

If we spin up the same container without any data sources subscribed, memory consumption does not increase. The deviations are mostly due to OPA memory usage.

[screenshot] [screenshot]

And again, there is no traffic on either pod.

kreyyser, Feb 26 '25 08:02

@obsd the only thing we can think of that could cause this problem is the fetching/updating or backup mechanism.

Is there any progress on this? It is fairly urgent for us, as we already see this issue in our production environment.

kreyyser, Feb 26 '25 08:02

Hi, any update on this one?

psardana, Aug 15 '25 12:08