OOM Crashes on Juno Pod After Restart During Heavy Load
Increased traffic targeting the starknet_call method on our k8s pod pushed CPU usage to 100%, leading to request failures and block sync issues. Subsequent restarts of the pod resulted in immediate OOM errors at startup. However, after we replaced the database with a fresh one, the pod started syncing properly without any OOM issues, which suggests that the DB may have been corrupted.
k8s Logs:
terminated
Reason: OOMKilled - exit code: 137
Started at: 2024-04-19T15:14:04+05:30
Finished at: 2024-04-19T15:14:51+05:30
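For readers less familiar with these fields, here is a minimal sketch of how the numbers above can be read. It assumes the common convention that an exit code above 128 means the process was killed by signal number (code - 128), so 137 corresponds to signal 9 (SIGKILL, which is what the kernel OOM killer sends). The helper names `signal_from_exit_code` and `container_lifetime_seconds` are hypothetical and not part of Juno or Kubernetes.

```python
from datetime import datetime
from typing import Optional


def signal_from_exit_code(code: int) -> Optional[int]:
    """Return the signal number if the exit code follows the 128+N convention, else None."""
    # Assumption: the container runtime reports 128 + signal number for signal deaths.
    return code - 128 if code > 128 else None


def container_lifetime_seconds(started_at: str, finished_at: str) -> float:
    """Return how long the container ran, given two ISO 8601 timestamps with UTC offsets."""
    start = datetime.fromisoformat(started_at)
    end = datetime.fromisoformat(finished_at)
    return (end - start).total_seconds()


if __name__ == "__main__":
    # Values copied from the k8s logs above.
    print(signal_from_exit_code(137))  # 9, i.e. SIGKILL
    print(container_lifetime_seconds(
        "2024-04-19T15:14:04+05:30",
        "2024-04-19T15:14:51+05:30",
    ))  # 47.0
```

In other words, the pod was killed 47 seconds after it started, which matches the description of an OOM at startup rather than a slow leak under load.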
Possible Causes:
- Potential database corruption during restarts combined with high CPU load.
- Recent Pebble updates
//UPDATE - 06.05.2024
Pod unable to keep up with syncing, resulting in failed requests due to reaching CPU limit.
Actions taken: Added more pods, restarted pod, but no improvement.
Resolution: Removing and replacing the DB resolved the issue.
Next steps: Prioritize investigating and fixing the underlying cause.