OOM exception is not handled well by the HTTP server
To reproduce
HTTP connection times out when the server experiences an OOM here:
0000-00-00T00:00:00.000000Z C Unhandled exception in worker minhttp_0
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=7330513230, RSS_MEM_LIMIT=7330513230, size=64, memoryTag=33]
    at io.questdb.cairo.CairoException.instance(CairoException.java:375)
    at io.questdb.cairo.CairoException.nonCritical(CairoException.java:133)
    at io.questdb.std.Unsafe.checkAllocLimit(Unsafe.java:331)
    at io.questdb.std.Unsafe.malloc(Unsafe.java:248)
    at io.questdb.cutlass.http.HttpHeaderParser$BoundaryAugmenter.<init>(HttpHeaderParser.java:979)
    at io.questdb.cutlass.http.HttpHeaderParser.<init>(HttpHeaderParser.java:58)
    at io.questdb.cutlass.http.DefaultHttpHeaderParserFactory.newParser(DefaultHttpHeaderParserFactory.java:35)
    at io.questdb.cutlass.http.HttpConnectionContext.<init>(HttpConnectionContext.java:148)
    at io.questdb.cutlass.http.HttpServer$HttpContextFactory.lambda$new$0(HttpServer.java:371)
    at io.questdb.std.WeakMutableObjectPool.newInstance(WeakMutableObjectPool.java:69)
    at io.questdb.std.WeakMutableObjectPool.newInstance(WeakMutableObjectPool.java:31)
    at io.questdb.std.WeakObjectPoolBase.fill(WeakObjectPoolBase.java:92)
    at io.questdb.std.WeakMutableObjectPool.<init>(WeakMutableObjectPool.java:38)
    at io.questdb.network.IOContextFactoryImpl.lambda$new$0(IOContextFactoryImpl.java:43)
    at io.questdb.std.ThreadLocal.get(ThreadLocal.java:46)
    at io.questdb.network.IOContextFactoryImpl.setup(IOContextFactoryImpl.java:86)
    at io.questdb.network.AbstractIODispatcher.setup(AbstractIODispatcher.java:243)
    at io.questdb.mp.Worker.run(Worker.java:136)
2025-09-15T13:11:47.112274Z I server-main QuestDB is shutting down...
QuestDB version:
latest
OS, in case of Docker specify Docker and the Host OS:
Linux (arm)
File System, in case of Docker specify Host File System:
ext4
Full Name:
Vlad Ilyushechenko
Affiliation:
QuestDB
Have you followed Linux, macOS kernel configuration steps to increase Maximum open files and Maximum virtual memory areas limit?
- [x] Yes, I have
Additional context
No response
Can I pick it up?
@mghildiy go ahead. what's your idea of a sensible behaviour?
Memory usage can be tracked (MemoryMXBean). And if a limit is crossed, the application responds to other concurrent requests appropriately, maybe saving their progress. At the highest level of the application, we can catch OutOfMemoryError, do cleanup, and exit.
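A minimal sketch of what that MemoryMXBean tracking could look like (illustrative code, not QuestDB internals). Note that this bean only reports JVM heap and non-heap usage, not the native allocations in the trace above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Simple heap-pressure check via MemoryMXBean (JVM heap only).
public final class HeapPressureCheck {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    // Returns true if used heap exceeds the given fraction of max heap.
    public static boolean isAboveThreshold(double ratio) {
        MemoryUsage heap = MEMORY.getHeapMemoryUsage();
        return heap.getMax() > 0 && (double) heap.getUsed() / heap.getMax() > ratio;
    }

    public static void main(String[] args) {
        System.out.println("heap above 80%? " + isAboveThreshold(0.8));
    }
}
```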
@mghildiy it's complicated. this is about off-heap/native memory being exhausted. the problem is that when memory is exhausted we cannot allocate any more. but it appears we require a memory allocation to send a reply back to the user.
fixing this is non-trivial and I'd recommend picking something else unless you are familiar with memory management.
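For illustration only, a toy server (hypothetical code, not QuestDB's) showing the general constraint: the canned 503 bytes are built up front, so the error path can answer a client without requesting new memory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Toy illustration only: the 503 response is prepared before anything can fail,
// so the error path does not need a fresh allocation.
public final class NoAllocErrorReply {
    private static final byte[] BUSY_503 =
            "HTTP/1.1 503 Service Unavailable\r\nConnection: close\r\n\r\n"
                    .getBytes(StandardCharsets.US_ASCII);

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                try (Socket client = server.accept();
                     OutputStream out = client.getOutputStream()) {
                    try {
                        handle(client); // pretend this needs a native allocation
                    } catch (OutOfMemoryError oom) {
                        out.write(BUSY_503); // reply using only pre-built bytes
                        out.flush();
                    }
                } catch (IOException ignore) {
                    // client went away; keep serving
                }
            }
        }
    }

    private static void handle(Socket client) {
        // stand-in for per-connection setup that fails when memory is exhausted
        throw new OutOfMemoryError("simulated native memory exhaustion");
    }
}
```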
Hello @jerrinot,
I hope you are doing well. I have reviewed this issue thoroughly and would like to respectfully request to be assigned to it. I am confident I can contribute effectively and follow the project’s guidelines.
Thank you for considering my request. 🙏
Best regards, Aditya
Hi, I’d like to work on this issue and submit a PR. Could you please assign it to me?
hi @1-Arijit-choudhury, @adityagupta0251, it's best to join https://slack.questdb.com/ and ask questions about the impl. or perhaps sync with each other - it's best to join forces.
Hello @jerrinot I am writing in the context of https://github.com/questdb/questdb/issues/6144. The error io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded means QuestDB's storage engine (Cairo) has enforced a memory cap on the entire process. The server is hitting QuestDB's total process memory limit (RSS), not just the Java heap. The error is triggered by new HTTP connections because the system is already at max memory from other operations. My approach is:
- Monitoring: I'll set up Prometheus and Grafana to visualize QuestDB's internal memory metrics to see exactly where the pressure is coming from.
- Reconfigure: Based on the monitoring, I'll increase the cairo.memory.rss.limit in server.conf to a safe value for the server's available RAM.
- Optimization: We'll investigate memory-intensive queries and ensure our client applications are using connection pooling to reduce connection churn.
Next steps or an improvised approach will be taken on the go with the issue. Kindly have a look at my approach and let me know if I can work on this!
Hey Junaid, I think what this needs is for memory issues to be handled within the program when OOM happens.
What this probably needs is that an off-heap memory chunk is kept specifically for handling it, and when OOM happens, the program uses that memory to perform the corrective actions that need to be taken, like sending back responses for ongoing requests, and so on. So it involves the Unsafe API (or maybe the FFM API).
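A minimal sketch of that reserve idea using the FFM API (java.lang.foreign, JDK 22+). The class and names below are illustrative, not QuestDB internals: the reserve is allocated at startup, while native memory is still available, and the OOM path only hands out the pre-filled segment.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

// Sketch of an "emergency reserve" of native memory, filled once at startup.
public final class NativeErrorReserve {
    private static final byte[] CANNED_503 =
            ("HTTP/1.1 503 Service Unavailable\r\n"
             + "Connection: close\r\n"
             + "Content-Length: 0\r\n\r\n").getBytes(StandardCharsets.US_ASCII);

    private static final Arena ARENA = Arena.ofShared();
    // Carve out the reserve before any memory limit can be hit.
    private static final MemorySegment RESERVE = ARENA.allocate(4096);

    static {
        // Copy the canned response into native memory exactly once.
        MemorySegment.copy(CANNED_503, 0, RESERVE, ValueLayout.JAVA_BYTE, 0, CANNED_503.length);
    }

    // On the OOM path, return the pre-filled segment instead of allocating.
    public static MemorySegment cannedResponse() {
        return RESERVE.asSlice(0, CANNED_503.length);
    }
}
```

The same shape would work with sun.misc.Unsafe.allocateMemory; the key point is that the reserve is carved out before the limit is reached and is never touched by the normal allocation path.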
That's an excellent point. Graceful handling of OOMs at the application level would be a fantastic resilience feature. I'll focus on the operational side to prevent the OOM from happening. Please let me know if I can work on this!