
Random unrecoverable OOM crash

Open raphjaph opened this issue 6 months ago • 1 comments

I'm running the latest Convex (v1.24.1) self-hosted on fly.io. I set things up following this guide. The Convex backend randomly crashes with an OOM, and this has happened frequently. My app is really not that complex: I have 2 cron.ts jobs and a simple schema with basic CRUD operations. In the chart below you'll see it crashes randomly in the middle of the night, without much traffic or anything. It also cannot recover from that. If I take the volume where all the data is stored and attach it to a VM with more memory, it also crashes immediately, so I can't even recover my data. I'm probably doing something wrong but would love some help debugging it. Are there any common pitfalls, or has this happened to others before?

I can share more detailed logs as well. If Discord is the better medium for this let me know.

Thanks!

Image

raphjaph avatar May 22 '25 03:05 raphjaph

Yeah, you're likely to get more eyes from the whole community to help you debug on Discord.

For OOM debugging, you might want to try again with a larger VM, look at the log files around the time of the crash, and look for operations that run around the time of the failure (e.g. cron jobs).
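
As a rough illustration of the "look at the log files around the time of the crash" step, here is a minimal Python sketch that keeps only the log lines from the minutes leading up to a crash. It assumes each line starts with a naive ISO-8601 timestamp followed by a space, which may not match the actual backend log format; the function name `lines_near_crash` and the sample lines are hypothetical and only there for illustration.

```python
from datetime import datetime, timedelta


def lines_near_crash(log_lines, crash_time, window_minutes=10):
    """Return the lines logged in the window_minutes before crash_time.

    Assumption: every relevant line begins with a naive ISO-8601
    timestamp (e.g. 2025-05-22T02:55:00) followed by a space.
    """
    start = crash_time - timedelta(minutes=window_minutes)
    selected = []
    for line in log_lines:
        stamp, _, _message = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines without a parseable leading timestamp
        if start <= when <= crash_time:
            selected.append(line)
    return selected


if __name__ == "__main__":
    # Made-up sample lines, not real backend output.
    sample = [
        "2025-05-22T02:40:00 cron job started",
        "2025-05-22T02:55:00 memory usage climbing",
        "2025-05-22T03:10:00 process restarted",
    ]
    crash = datetime(2025, 5, 22, 3, 0, 0)
    for line in lines_near_crash(sample, crash):
        print(line)
```

Comparing the filtered lines against the cron schedule is one way to see whether a scheduled job lines up with the memory spike.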

nipunn1313 avatar May 22 '25 20:05 nipunn1313