convex-backend
Random unrecoverable OOM crash
I'm running the latest Convex (v1.24.1) self-hosted on fly.io. I set things up following this guide. The Convex backend randomly crashes with an OOM, and this has happened frequently. My app is really not that complex: I have 2 cron jobs in cron.ts and a simple schema with basic CRUD operations. In the chart below you'll see it crashes randomly in the middle of the night, without much traffic or anything. It also cannot recover from the crash. If I take the volume where all the data is stored and attach it to a VM with more memory, it also crashes immediately, so I can't even recover my data. I'm probably doing something wrong, but I would love some help debugging it. Are there any common pitfalls, or has this happened to others before?
I can share more detailed logs as well. If Discord is the better medium for this let me know.
Thanks!
Yeah, you're likely to get more eyes on this on Discord, where the whole community can help you debug.
For OOM debugging, you might want to try again with a larger VM, look at the log files around the time of the crash, and check which operations (e.g. crons) run around the time of the failure.
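If it helps with the log step, here is a rough sketch of narrowing a log dump down to the minutes around a crash. It assumes every log line starts with an ISO 8601 timestamp, which may not match what the backend or fly.io actually emits, and `linesAroundCrash` is a made-up helper name, not part of Convex.

```typescript
// Hypothetical helper: keep only the log lines whose leading timestamp
// falls within `windowMinutes` of the crash time.
// Assumption: each line begins with an ISO 8601 timestamp followed by whitespace.
function linesAroundCrash(
  lines: string[],
  crashTime: Date,
  windowMinutes: number,
): string[] {
  const windowMs = windowMinutes * 60 * 1000;
  return lines.filter((line) => {
    // Take the first whitespace-delimited token and try to parse it as a date.
    const firstToken = line.trim().split(/\s+/)[0] ?? "";
    const stamp = Date.parse(firstToken);
    // Lines without a parseable timestamp are dropped.
    if (Number.isNaN(stamp)) return false;
    return Math.abs(stamp - crashTime.getTime()) <= windowMs;
  });
}
```

Lining up the surviving lines with the cron schedule should show whether a job was running right before memory ran out.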