Practice-Bot
[BUG] AWS CPU Usage Causes Bot Offline
I have yet to diagnose the cause, but roughly once a week the bot uses more than 90% of the AWS EC2 instance's CPU, which crashes the instance and takes the bot offline until it is manually rebooted. Nothing is logged, because the bot simply uses too much CPU. Removing HTTP requests seems to lengthen the time between crashes, but doing so removes key features and only delays the crash.
Crashes from the last 2 months
- Sunday 03 January, 2021 15:38:15 UTC; 98.9% CPU Usage
- Thursday 07 January, 2021 17:23:15 UTC; 99.2% CPU Usage
- Monday 18 January, 2021 10:03:15 UTC; 92.4% CPU Usage
- Monday 25 January, 2021 14:18:15 UTC; 99.2% CPU Usage
- Wednesday 03 February, 2021 08:08:15 UTC; 99.7% CPU Usage
- Tuesday 09 February, 2021 08:08:15 UTC; 99.3% CPU Usage
AWS CloudWatch CPU Usage Screenshot

Due to the privacy policy, I have not logged the exact commands executed before each crash, since doing so would require logging every command the bot executes at all times. I am therefore unsure whether a specific command error is causing these crashes. If the bot goes offline after you run a command or trigger an event, please report it here.
Downtime has notably increased. I will attempt to implement sharding to fix this.
EDIT: Running the sharded bot on the production beta. Will see if this resolves the issue.
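One way to gather evidence without logging any commands might be to have the bot record only its own CPU usage. The following is a minimal sketch using only the Python standard library; the names `cpu_fraction` and `watch_cpu`, the 90% threshold, and the 60-second interval are illustrative assumptions and not part of the bot's code.

```python
import logging
import threading
import time

logger = logging.getLogger("cpu_watch")  # hypothetical logger name


def cpu_fraction(prev_cpu, prev_wall, now_cpu, now_wall):
    """Return the fraction of one core used between two samples."""
    elapsed = now_wall - prev_wall
    if elapsed <= 0:
        return 0.0
    return (now_cpu - prev_cpu) / elapsed


def watch_cpu(threshold=0.9, interval=60.0, stop_event=None):
    """Log a warning whenever this process exceeds `threshold` of one core.

    Only CPU figures are recorded, never commands or message content.
    """
    if stop_event is None:
        stop_event = threading.Event()
    # process_time() counts user + system CPU time of the whole process.
    prev_cpu, prev_wall = time.process_time(), time.monotonic()
    # Event.wait() returns False on timeout, True once the event is set.
    while not stop_event.wait(interval):
        now_cpu, now_wall = time.process_time(), time.monotonic()
        usage = cpu_fraction(prev_cpu, prev_wall, now_cpu, now_wall)
        if usage >= threshold:
            logger.warning(
                "High CPU: %.1f%% of one core over the last %.0fs",
                usage * 100,
                now_wall - prev_wall,
            )
        prev_cpu, prev_wall = now_cpu, now_wall
```

It could be started before the bot's main loop with `threading.Thread(target=watch_cpu, daemon=True).start()`. The timestamps of any warnings could then be compared against the CloudWatch spikes to see whether usage climbs gradually or jumps at once.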
How is the bot process being executed?
My first thought was to add a service monitor so the process is restarted automatically after a crash, at least to reduce downtime while the issue is investigated.
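To illustrate the service-monitor idea, here is a minimal Python sketch of a supervisor loop that restarts a command whenever it exits with a nonzero status. The function name `supervise` and its parameters are hypothetical, and a real deployment would more likely rely on an existing process manager than on a hand-written loop.

```python
import subprocess
import sys
import time


def supervise(cmd, restart_delay=5.0, max_restarts=None):
    """Run `cmd`, restarting it whenever it exits with a nonzero status.

    Returns the number of restarts performed.
    """
    restarts = 0
    while True:
        returncode = subprocess.call(cmd)  # blocks until the child exits
        if returncode == 0:
            return restarts  # clean exit: do not restart
        if max_restarts is not None and restarts >= max_restarts:
            return restarts  # give up after the configured number of restarts
        restarts += 1
        time.sleep(restart_delay)  # brief pause before the next attempt


if __name__ == "__main__":
    # Example: supervise whatever command is passed on the command line.
    supervise(sys.argv[1:])
```

Note that a monitor like this only helps when the bot process actually exits. If the whole instance becomes unresponsive under high CPU load, the monitor stalls along with it.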
@orendon It's just a Python script that is being run in a tmux window
@kevinjycui Here is a systemd example that I made for another bot https://gist.github.com/orendon/a34d60e6fbe96e5433f60aeb28c9987c
You can also check this post for further details: https://ma.ttias.be/auto-restart-crashed-service-systemd/
That could be a nice workaround. I tried implementing it a few days ago, but the bot crashed again a few days later. Will look into this further.
@kevinjycui Did the systemd approach work? I'm willing to help with this if you consider it appropriate.
@orendon It does not seem to have worked, since the bot crashed again a few days ago. It looks like the service file got deleted, so I put it back:
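# /etc/systemd/system/practice.service
[Unit]
Description=Practice-bot

[Service]
ExecStart=/home/kevin/Practice-Bot/run.sh
# Restart only when the process exits uncleanly (nonzero exit code, signal, or timeout)
Restart=on-failure
# Wait 5 seconds before each restart attempt
RestartSec=5s

[Install]
WantedBy=default.target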