OOM exception is not handled well by the HTTP server
To reproduce
HTTP connection times out when the server experiences an OOM here:
0000-00-00T00:00:00.000000Z C Unhandled exception in worker minhttp_0
io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded [usage=7330513230, RSS_MEM_LIMIT=7330513230, size=64, memoryTag=33]
    at io.questdb.cairo.CairoException.instance(CairoException.java:375)
    at io.questdb.cairo.CairoException.nonCritical(CairoException.java:133)
    at io.questdb.std.Unsafe.checkAllocLimit(Unsafe.java:331)
    at io.questdb.std.Unsafe.malloc(Unsafe.java:248)
    at io.questdb.cutlass.http.HttpHeaderParser$BoundaryAugmenter.<init>(HttpHeaderParser.java:979)
    at io.questdb.cutlass.http.HttpHeaderParser.<init>(HttpHeaderParser.java:58)
    at io.questdb.cutlass.http.DefaultHttpHeaderParserFactory.newParser(DefaultHttpHeaderParserFactory.java:35)
    at io.questdb.cutlass.http.HttpConnectionContext.<init>(HttpConnectionContext.java:148)
    at io.questdb.cutlass.http.HttpServer$HttpContextFactory.lambda$new$0(HttpServer.java:371)
    at io.questdb.std.WeakMutableObjectPool.newInstance(WeakMutableObjectPool.java:69)
    at io.questdb.std.WeakMutableObjectPool.newInstance(WeakMutableObjectPool.java:31)
    at io.questdb.std.WeakObjectPoolBase.fill(WeakObjectPoolBase.java:92)
    at io.questdb.std.WeakMutableObjectPool.<init>(WeakMutableObjectPool.java:38)
    at io.questdb.network.IOContextFactoryImpl.lambda$new$0(IOContextFactoryImpl.java:43)
    at io.questdb.std.ThreadLocal.get(ThreadLocal.java:46)
    at io.questdb.network.IOContextFactoryImpl.setup(IOContextFactoryImpl.java:86)
    at io.questdb.network.AbstractIODispatcher.setup(AbstractIODispatcher.java:243)
    at io.questdb.mp.Worker.run(Worker.java:136)
2025-09-15T13:11:47.112274Z I server-main QuestDB is shutting down...
QuestDB version:
latest
OS, in case of Docker specify Docker and the Host OS:
Linux (arm)
File System, in case of Docker specify Host File System:
ext4
Full Name:
Vlad Ilyushechenko
Affiliation:
QuestDB
Have you followed Linux, macOS kernel configuration steps to increase Maximum open files and Maximum virtual memory areas limit?
- [x] Yes, I have
Additional context
No response
Can I pick it up?
@mghildiy go ahead. what's your idea of a sensible behaviour?
Memory usage can be tracked (MemoryMXBean). And if a limit is crossed, the application responds to other concurrent requests appropriately, maybe saving their progress. At the highest level of the application, we can catch OutOfMemoryError, do cleanup, and exit.
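A minimal sketch of what that MemoryMXBean tracking could look like (illustrative code, not QuestDB internals). Note that this bean only reports JVM heap and non-heap usage, not the native allocations in the trace above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Simple heap-pressure check via MemoryMXBean (JVM heap only).
public final class HeapPressureCheck {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    // Returns true if used heap exceeds the given fraction of max heap.
    public static boolean isAboveThreshold(double ratio) {
        MemoryUsage heap = MEMORY.getHeapMemoryUsage();
        return heap.getMax() > 0 && (double) heap.getUsed() / heap.getMax() > ratio;
    }

    public static void main(String[] args) {
        System.out.println("heap above 80%? " + isAboveThreshold(0.8));
    }
}
```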
@mghildiy it's complicated. this is about off-heap/native memory being exhausted. the problem is that when memory is exhausted we cannot allocate any more. but it appears we require a memory allocation to send a reply back to the user.
fixing this is non-trivial and I'd recommend picking something else unless you are familiar with memory management.
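For illustration only, a toy server (hypothetical code, not QuestDB's) showing the general constraint: the canned 503 bytes are built up front, so the error path can answer a client without requesting new memory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Toy illustration only: the 503 response is prepared before anything can fail,
// so the error path does not need a fresh allocation.
public final class NoAllocErrorReply {
    private static final byte[] BUSY_503 =
            "HTTP/1.1 503 Service Unavailable\r\nConnection: close\r\n\r\n"
                    .getBytes(StandardCharsets.US_ASCII);

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                try (Socket client = server.accept();
                     OutputStream out = client.getOutputStream()) {
                    try {
                        handle(client); // pretend this needs a native allocation
                    } catch (OutOfMemoryError oom) {
                        out.write(BUSY_503); // reply using only pre-built bytes
                        out.flush();
                    }
                } catch (IOException ignore) {
                    // client went away; keep serving
                }
            }
        }
    }

    private static void handle(Socket client) {
        // stand-in for per-connection setup that fails when memory is exhausted
        throw new OutOfMemoryError("simulated native memory exhaustion");
    }
}
```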
Hello @jerrinot,
I hope you are doing well. I have reviewed this issue thoroughly and would like to respectfully request to be assigned to it. I am confident I can contribute effectively and follow the project’s guidelines.
Thank you for considering my request. 🙏
Best regards, Aditya
Hi, I’d like to work on this issue and submit a PR. Could you please assign it to me?
hi @1-Arijit-choudhury, @adityagupta0251, it's best to join https://slack.questdb.com/ and ask questions about the impl. or perhaps sync with each other - it's best to join forces.
Hello @jerrinot I am writing in the context of https://github.com/questdb/questdb/issues/6144. The error io.questdb.cairo.CairoException: [-1] global RSS memory limit exceeded means QuestDB's storage engine (Cairo) has enforced a memory cap on the entire process. The server is hitting QuestDB's total process memory limit (RSS), not just the Java heap. The error is triggered by new HTTP connections because the system is already at max memory from other operations. My approach is:
- Monitoring: I'll set up Prometheus and Grafana to visualize QuestDB's internal memory metrics to see exactly where the pressure is coming from.
- Reconfigure: Based on the monitoring, I'll increase the cairo.memory.rss.limit in server.conf to a safe value for the server's available RAM.
- Optimization: We'll investigate memory-intensive queries and ensure our client applications are using connection pooling to reduce connection churn.
Next steps or an improvised approach will be taken on the go with the issue. Kindly have a look at my approach and let me know if I can work on this!
Hey Junaid, I think what this needs is for memory issues to be handled within the program when OOM happens.
What this probably needs is that an off-heap memory chunk is kept specifically for handling it, and when OOM happens, the program uses that memory to perform the corrective actions that need to be taken, like sending back responses for ongoing requests, and so on. So it involves the Unsafe API (or maybe the FFM API).
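A minimal sketch of that reserve idea using the FFM API (java.lang.foreign, JDK 22+). The class and names below are illustrative, not QuestDB internals: the reserve is allocated at startup, while native memory is still available, and the OOM path only hands out the pre-filled segment.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

// Sketch of an "emergency reserve" of native memory, filled once at startup.
public final class NativeErrorReserve {
    private static final byte[] CANNED_503 =
            ("HTTP/1.1 503 Service Unavailable\r\n"
             + "Connection: close\r\n"
             + "Content-Length: 0\r\n\r\n").getBytes(StandardCharsets.US_ASCII);

    private static final Arena ARENA = Arena.ofShared();
    // Carve out the reserve before any memory limit can be hit.
    private static final MemorySegment RESERVE = ARENA.allocate(4096);

    static {
        // Copy the canned response into native memory exactly once.
        MemorySegment.copy(CANNED_503, 0, RESERVE, ValueLayout.JAVA_BYTE, 0, CANNED_503.length);
    }

    // On the OOM path, return the pre-filled segment instead of allocating.
    public static MemorySegment cannedResponse() {
        return RESERVE.asSlice(0, CANNED_503.length);
    }
}
```

The same shape would work with sun.misc.Unsafe.allocateMemory; the key point is that the reserve is carved out before the limit is reached and is never touched by the normal allocation path.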
That's an excellent point. Graceful handling of OOMs at the application level would be a fantastic resilience feature. I'll focus on the operational side to prevent the OOM from happening. Please let me know if I can work on this!