
pgadmin is OOMKilled by kubernetes when querying big-ish tables (6.5 million rows)

StefanLobbenmeierObjego opened this issue 11 months ago


Describe the bug

In the past we could query this table without pgadmin using so much memory, so it is unclear to me whether pgadmin has a limit on how many rows can be fetched at a time and the table simply grew too large, or whether this is a regression. I tried versions 8.13 and 8.14, though, and I remember this working in the past, so I am leaning towards the table having become too big.

We tried increasing the pgadmin memory limit in Kubernetes from 500Mi to 4096Mi and it was still OOMKilled, so this points towards some limit within pgadmin itself.

To Reproduce

I cannot share the table or its data, but I will try to add some SQL that creates a table of a similar size that can be used instead.

  1. Have a big table with (in our case) 6515458 rows
  2. Run SELECT * FROM data.big_table
  3. The HTTP response is a 503 and the Kubernetes pod is OOMKilled

Expected behavior

  • pgadmin is aware of its memory limit and avoids exceeding it, so it does not get killed
  • pgadmin does not need this much memory in the first place
  • ideally: pgadmin transforms queries into paginated queries automatically, similarly to how DataGrip by JetBrains does (see the sketch after this list)
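
A rough sketch of what such automatic pagination could look like at the SQL level (purely illustrative; it assumes an indexed id column on data.big_table from this report, and is not something pgadmin currently does):

-- fetch one bounded page at a time instead of the whole table
SELECT *
FROM data.big_table
WHERE id > 0          -- replace 0 with the id of the last row of the previous page
ORDER BY id
LIMIT 1000;

Only one page of rows is held in memory at a time, so the client's memory usage stays independent of the table size.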


Desktop (please complete the following information):

  • OS: Kubernetes server; Kernel Version: 5.15.0-1075-azure; OS Image: Ubuntu 22.04.5 LTS; Operating System: linux; Architecture: amd64; Container Runtime Version: containerd://1.7.23-1; Kubelet Version: v1.29.11; Kube-Proxy Version: v1.29.11

  • Version: 8.14

  • Mode: server

  • Browser (if running in server mode): chrome

  • Package type: Container dpage/pgadmin4:8.14


StefanLobbenmeierObjego avatar Jan 29 '25 11:01 StefanLobbenmeierObjego

I managed to reproduce it like this with 1 million rows and a 1 Gi memory limit:

CREATE TEMP TABLE t AS SELECT generate_series(1, 3e7) x;
SELECT * FROM t;


kubernetes describe pod:

kubectl describe pod -n tools pgadmin-oauth-7755bbbc84-jgpgm
Name:             pgadmin-oauth-7755bbbc84-jgpgm
Namespace:        tools
Priority:         0
Service Account:  default
Node:             aks-intmain-28895530-vmss000004/10.60.2.100
Start Time:       Wed, 29 Jan 2025 12:29:45 +0100
Labels:           app=pgadmin-oauth
                  pod-template-hash=7755bbbc84
Annotations:      kubectl.kubernetes.io/restartedAt: 2023-12-12T10:25:52+01:00
Status:           Running
IP:               10.60.2.119
IPs:
  IP:           10.60.2.119
Controlled By:  ReplicaSet/pgadmin-oauth-7755bbbc84
Init Containers:
  init-pgadmin:
    Container ID:  containerd://eec7371868a84d8ece9728116c38d9aabd69a3f0509e831d9206892b70b420aa
    Image:         dpage/pgadmin4:8.13
    Image ID:      docker.io/dpage/pgadmin4@sha256:561c1f8f99f2fe24bc63814db9e30292cf4360331de9182da21e3554ce61bdaa
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/chown
      -R
      5050:5050
      /var/lib/pgadmin
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 29 Jan 2025 12:30:23 +0100
      Finished:     Wed, 29 Jan 2025 12:30:23 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/pgadmin from pgadmin-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xlhdf (ro)
Containers:
  pgadmin:
    Container ID:   containerd://1bbb51d2b841d55c5c9d8e2f2bb13be5bf0d8d4a23f74215ae5266d2785804e3
    Image:          dpage/pgadmin4:8.14
    Image ID:       docker.io/dpage/pgadmin4@sha256:8a68677a97b8c8d1427dc915672a26d2c4a04376916a68256f53d669d6171be7
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 29 Jan 2025 13:01:36 +0100
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Wed, 29 Jan 2025 12:32:09 +0100
      Finished:     Wed, 29 Jan 2025 13:01:35 +0100
    Ready:          True
    Restart Count:  2
    Limits:
      memory:  1Gi
    Requests:
      cpu:      75m
      memory:   1Gi
    Liveness:   http-get http://:80/misc/ping delay=0s timeout=1s period=60s #success=1 #failure=3
    Readiness:  http-get http://:80/misc/ping delay=0s timeout=1s period=60s #success=1 #failure=3
    Startup:    http-get http://:80/misc/ping delay=0s timeout=1s period=1s #success=1 #failure=60
    Environment:
      PGADMIN_DEFAULT_EMAIL:                      censored
      PGADMIN_DEFAULT_PASSWORD:                   censored
      PGADMIN_PORT:                               80
      PGADMIN_CONFIG_ENHANCED_COOKIE_PROTECTION:  False
      PGADMIN_CONFIG_AUTHENTICATION_SOURCES:      ['oauth2', 'internal']
      PGADMIN_CONFIG_SECURITY_POST_LOGIN_VIEW:    '/pgadmin/browser'
      GUNICORN_LIMIT_REQUEST_LINE:                0
      PGADMIN_CONFIG_OAUTH2_CONFIG:               <set to the key 'oauth-config' in secret 'pgadmin-oauth-secret'>  Optional: false
      PGADMIN_CONFIG_OAUTH2_AUTO_CREATE_USER:     True
    Mounts:
      /var/lib/pgadmin from pgadmin-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xlhdf (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  pgadmin-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pgadmin-oauth
    ReadOnly:   false
  pgadmin-oauth-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pgadmin-oauth-secret
    Optional:    false
  kube-api-access-xlhdf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Normal   Scheduled               32m                 default-scheduler        Successfully assigned tools/pgadmin-oauth-7755bbbc84-jgpgm to aks-intmain-28895530-vmss000004
  Warning  FailedAttachVolume      32m                 attachdetach-controller  Multi-Attach error for volume "pvc-5bd28f47-46e7-41af-b0ee-43fae6f7fc4c" Volume is already exclusively attached to one node and can't be attached to another
  Normal   SuccessfulAttachVolume  31m                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-5bd28f47-46e7-41af-b0ee-43fae6f7fc4c"
  Normal   Pulling                 31m                 kubelet                  Pulling image "dpage/pgadmin4:8.13"
  Normal   Pulled                  31m                 kubelet                  Successfully pulled image "dpage/pgadmin4:8.13" in 12.082s (12.082s including waiting)
  Normal   Created                 31m                 kubelet                  Created container init-pgadmin
  Normal   Started                 31m                 kubelet                  Started container init-pgadmin
  Warning  Unhealthy               31m (x7 over 31m)   kubelet                  Startup probe failed: Get "http://10.60.2.119:80/misc/ping": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Pulled                  15s (x3 over 31m)   kubelet                  Container image "dpage/pgadmin4:8.14" already present on machine
  Normal   Created                 15s (x3 over 31m)   kubelet                  Created container pgadmin
  Normal   Started                 15s (x3 over 31m)   kubelet                  Started container pgadmin
  Warning  Unhealthy               12s (x12 over 31m)  kubelet                  Startup probe failed: Get "http://10.60.2.119:80/misc/ping": dial tcp 10.60.2.119:80: connect: connection refused

The weird thing is also that it is killed at around 700Mi of usage.

I am not 100% sure how this works exactly; possibly pgadmin requests more than 300Mi at once, that request is denied, and it is met with an OOMKill instead?

StefanLobbenmeierObjego avatar Jan 29 '25 12:01 StefanLobbenmeierObjego

Related to #5797

khushboovashi avatar Jan 31 '25 11:01 khushboovashi

Hi @StefanLobbenmeierObjego,

We have fixed this issue on the client side and also introduced a server-side cursor. To enable the server-side cursor, please refer to https://www.pgadmin.org/docs/pgadmin4/development/query_tool.html#server-side-cursor.
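
For reference, at the plain SQL level a server-side cursor behaves roughly like the following (a minimal sketch against the t table from the reproduction above, not pgAdmin's actual implementation):

BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM t;
FETCH 1000 FROM big_cur;  -- repeat until fewer than 1000 rows are returned
CLOSE big_cur;
COMMIT;

Rows are streamed from the server in batches, so the client never has to hold the full result set in memory.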

Can you please test the snapshot (docker pull dpage/pgadmin4:snapshot) to verify the fix? Ref: https://hub.docker.com/r/dpage/pgadmin4/tags

khushboovashi avatar Jul 16 '25 07:07 khushboovashi

I can confirm the server-side cursor prevents the restart. Ideally there would also be an option to force the cursor when pgadmin sees that a result set is big, or to force it for everyone to prevent the restarts, but I guess it works to just manually enable it in all the accounts that use pgadmin.

StefanLobbenmeierObjego avatar Jul 16 '25 11:07 StefanLobbenmeierObjego

> I can confirm the server-side cursor prevents the restart. Ideally there would also be an option to force the cursor when pgadmin sees that a result set is big, or to force it for everyone to prevent the restarts, but I guess it works to just manually enable it in all the accounts that use pgadmin.

Without executing the query, pgAdmin cannot predict the size of a large result set, and we don’t want to run an additional query just to determine the count.

khushboovashi avatar Jul 16 '25 11:07 khushboovashi

I see - I think IntelliJ puts a default limit of 501 rows on any query and shows the first 500; if it gets all 501 rows back, it knows there is more and paginates. Not sure if this is something pgadmin would like to copy / make a setting.
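
Roughly, that trick boils down to something like this (just a sketch, using data.big_table from the report and the 500-row page size mentioned above):

SELECT * FROM data.big_table LIMIT 501;
-- if exactly 501 rows come back, show only the first 500 and switch to
-- paginated fetching, since at least one more page exists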

but having the cursor option is already a huge win, thank you so much!

StefanLobbenmeierObjego avatar Jul 23 '25 11:07 StefanLobbenmeierObjego

> I see - I think IntelliJ puts a default limit of 501 rows on any query and shows the first 500; if it gets all 501 rows back, it knows there is more and paginates. Not sure if this is something pgadmin would like to copy / make a setting.
>
> but having the cursor option is already a huge win, thank you so much!

Do you think we should close this ticket?

khushboovashi avatar Jul 23 '25 12:07 khushboovashi

Yeah, we can close it. I was not sure whether you wanted to close it with the release, but I guess the release is soon anyway.

StefanLobbenmeierObjego avatar Jul 23 '25 18:07 StefanLobbenmeierObjego

Implemented the server-side cursor as part of #5797.

khushboovashi avatar Jul 24 '25 05:07 khushboovashi