Memory leak
Version Information
Server Version: v2.22.1
Environment
On-premises
What is the current behaviour?
We've been seeing memory leaks on v2.19.x and higher.
Upgrading to the latest version did not solve the problem; v2.8.4 works correctly.
What is the expected behaviour?
To work without memory leaks.
How to reproduce the issue?
- Upgrade to v2.19.x or higher.
Screenshots or Screencast

OOM:
Please provide any traces or logs that could help here.
Any possible solutions/workarounds you're aware of?
Keywords
memory leak
Hi @sergeimonakhov,
What kind of workloads are you running in Hasura in the timeframe provided in your screenshot?
Hi @tirumaraiselvan. Mutations, queries, and subscriptions to PostgreSQL are the only workloads we run in the background.
@tirumaraiselvan The only thing we added compared to the previous version is remote schema permissions. I have just noticed that Hasura stores them all as one huge string in the database, and the Hasura UI freezes more and more with each new permission rule added. Given that architecture, there could well be related memory issues on the backend.
Something is happening for sure. This is a Hasura instance deployed at Railway.app
After restarting the memory usage dropped under 300 MB.
@NikPaushkin Hi, thanks for the additional info. Can you confirm that without remote schema permissions you do not see any constant increase in memory (on v2.22)? The UI issue might be unrelated and solely a console issue.
@tirumaraiselvan No, I can't confirm that now. We are on 2.24.1 without those remote schema permission changes, and it's still leaking about 500 MB every day.
Hi, we have the same issue. It's affecting multiple clusters where we have Hasura deployed, and it's been happening since the 2.2x.x versions. Our latest memory leak test was with a new cluster on v2.24.1: the cluster was created from scratch (using Terraform scripts), there was no traffic whatsoever, and it leaks memory the whole time. We have a 512 MB limit on the pod and it restarts when the limit is reached, as you can see from the attached behaviour. We don't have any remote schemas.
We're on 2.25.0 and are also seeing a memory leak, followed by OOMs causing restarts. We have some remote schemas.
@cheets @tjenkinson Hey folks, just to confirm once again: did you have the exact same metadata in previous versions, without the memory growth you see in newer versions?
Hey @tirumaraiselvan, the metadata may have changed slightly. We were on 2.16.0 before and that appeared to have the same issue.
We are actively developing our API, so some changes to the metadata occur every week. I don't recall anything major though. We have some actions and a couple of event triggers, but these have been in the metadata for a long time.
What is weird is that we see this behavior on our K3s on-premises clusters. However, we are also running the exact same Hasura in Azure AKS and we haven't observed this memory leak there.
@cheets We are unable to reproduce this on an idle schema like the one you mentioned here: https://github.com/hasura/graphql-engine/issues/9592#issuecomment-1551596064
You are saying you can't reproduce this on AKS. Would you be willing to test different versions of Hasura on K3s to see if this might be a K3s issue (it seems to be reported on newer versions, so trying something like 2.11 might be a good start)? Also, could you share your scripts so we can reproduce this on our end?
@NikPaushkin @tjenkinson Do you see some kind of leak with no traffic as well?
Our instances are always getting some traffic, so I'm not sure about that, sorry.
I've managed to reliably trigger the leak by repeatedly reloading all remote schemas from the console. Every time I do it, memory goes up slightly.
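For anyone trying to reproduce this outside the console, a minimal sketch that drives the same reload in a loop through the metadata API might look like the following. It assumes the standard /v1/metadata endpoint, an admin secret, and the reload_metadata call with the reload_remote_schemas flag (which appears to mirror the console's reload button); the URL and secret are placeholders, so adjust them for your deployment.

```python
# Sketch: repeatedly reload all remote schemas to watch for memory growth.
# Assumes the /v1/metadata API and the reload_metadata call with
# reload_remote_schemas; HASURA_URL and ADMIN_SECRET are placeholders.
import time
import requests

HASURA_URL = "http://localhost:8080"   # placeholder
ADMIN_SECRET = "my-admin-secret"       # placeholder

payload = {
    "type": "reload_metadata",
    "args": {"reload_remote_schemas": True},
}

for i in range(1000):
    resp = requests.post(
        f"{HASURA_URL}/v1/metadata",
        json=payload,
        headers={"x-hasura-admin-secret": ADMIN_SECRET},
    )
    resp.raise_for_status()
    print(f"reload {i}: {resp.json()}")
    time.sleep(1)  # pace the reloads roughly like clicking the console button
```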
@tjenkinson Do you also have remote schema permissions?
@tirumaraiselvan this is after logging in with the admin secret
@tjenkinson Is it possible for you to send us your metadata? You can email me at [email protected]
Hey @tirumaraiselvan, unfortunately we are not able to do that due to the nature of what it contains. I noticed we also have some webhooks set up on event triggers. Not sure if that could also be a factor 🤷
@tjenkinson Could you send us a smaller version with any sensitive info redacted or removed (it need not even work)? This is just to short-circuit the many metadata-related questions we go through when trying to reproduce/triage such issues.
Just FYI, we are not able to reproduce this by constantly reloading remote schemas. That's why I wanted to know if you have Remote Schema permissions configured in the metadata as well. Is that the case?
We have several instances of Hasura, we don't use remote schemas anywhere, and we observe memory leaks as well, even on an instance that is used minimally, once or twice a day. We use event triggers on almost every one of them, but on the instance mentioned above they are called only about twice a day and memory leaks still occur. The oldest version we observe this on is 2.20.1.
@dostalradim Are you able to share your metadata with us? You can email it to me at [email protected]
(feel free to remove any sensitive info)
I sent you the metadata of an application that is very rarely visited; its memory usage graph looks like this. The Hasura version is 2.20.1.
Thank you for investigating.
@dostalradim This is very helpful, thank you. Could you also tell us what kind of activity you have on this deployment? Is it queries, mutations, subscriptions, or metadata-related operations?
We use the app very little and only Monday to Friday, from 07:00 to 15:30. In the time window of the graph I sent we certainly did not do anything with the metadata; at most we ran a few queries, mutations, and triggers. No one uses it at night or outside these hours, but the memory is still growing. The only thing that talks to Hasura constantly is the health check probe on the /healthz endpoint.
FWIW, we also have k8s probes configured against the /healthz endpoint. We are using the CE Docker image. Configuration:
```yaml
livenessProbe:
  httpGet:
    path: '/healthz'
    port: http
  initialDelaySeconds: 30
  timeoutSeconds: 3
  periodSeconds: 60
  successThreshold: 1
  failureThreshold: 5
readinessProbe:
  httpGet:
    path: '/healthz'
    port: http
  initialDelaySeconds: 30
  timeoutSeconds: 3
  periodSeconds: 30
  successThreshold: 1
  failureThreshold: 5
```
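To check whether probe traffic alone is enough to trigger the growth, a minimal sketch that simulates the liveness probe above by polling /healthz on the same schedule can be pointed at a test instance. The URL is a placeholder, and the interval mirrors the periodSeconds: 60 setting from the config.

```python
# Sketch: simulate the Kubernetes liveness probe by polling /healthz,
# so memory can be watched with no other traffic. URL is a placeholder.
import time
import requests

HEALTHZ_URL = "http://localhost:8080/healthz"  # placeholder

while True:
    try:
        resp = requests.get(HEALTHZ_URL, timeout=3)
        print(time.strftime("%H:%M:%S"), resp.status_code)
    except requests.RequestException as exc:
        print(time.strftime("%H:%M:%S"), "probe failed:", exc)
    time.sleep(60)  # matches periodSeconds: 60 in the liveness probe above
```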
Three days ago I deployed Hasura against an empty Postgres, with no application and no access from outside, and the used memory is still increasing. The only thing hitting Hasura is the liveness probe; could that be the problem? I hope this helps. The Hasura version is 2.23.0.
@dostalradim Thank you...this really helps. We are working on this.
And the last graph: empty database, empty Hasura, no probe. Used memory is still increasing.
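For anyone trying to capture a comparable memory graph on a plain Docker deployment, a rough sampling sketch is below. It assumes the graphql-engine container runs under Docker and that the container name is a placeholder; it simply records the reported memory usage at a fixed interval.

```python
# Sketch: sample container memory usage periodically to reproduce the
# graphs above. Assumes a plain Docker deployment; the container name
# is a placeholder. Output is CSV lines: timestamp,mem_usage.
import subprocess
import time

CONTAINER = "hasura"  # placeholder container name

while True:
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemUsage}}", CONTAINER],
        capture_output=True,
        text=True,
        check=True,
    )
    print(f"{time.strftime('%Y-%m-%d %H:%M:%S')},{out.stdout.strip()}")
    time.sleep(300)  # one sample every 5 minutes
```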
I have the same issue on v2.17.1.
I've now downgraded to v2.15.2.