amazon-ssm-agent
ssm-agent-worker max CPU usage at boot (infrequent)
Sometimes, the ssm-agent-worker gets stuck consuming all CPU resources (e.g. 179%+ on a 2-vCPU instance) after reboot.
I'm not sure what's helpful, but I'm attaching the logs I have, except that I cut journalctl output down to the lines containing amazon-ssm
only. The instance is run in America/New_York (GMT -05:00 currently) after initial configuration, and timestamps appear to be local.
I used top to send signal 15, then signal 9, to the worker (the first did not work) and the service did not appear to notice, so I restarted the whole snap.amazon-ssm-agent.amazon-ssm-agent.service
service after a few more seconds (plus time it took to even find that name.)
This AMI is customized, of course, but ultimately derives from the current Ubuntu EC2 releases listing, us-east-1 20.04 amd64.
Attachment: logs.zip
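The manual recovery described above (spot the runaway worker, then restart the whole service) could be scripted roughly as follows. This is only a sketch: the 150% threshold, the function names `over_threshold` and `check_worker`, and the use of `pgrep -f` to locate the worker are assumptions for illustration, not something the agent provides. Note that `ps` reports CPU averaged over the process lifetime, which suits a worker that has been spinning since boot but will lag behind a spike that starts later.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; threshold and function names are assumptions.
THRESHOLD="${THRESHOLD:-150}"   # percent; 100 = one full vCPU
SERVICE="snap.amazon-ssm-agent.amazon-ssm-agent.service"

# Succeeds when CPU value $1 is numerically greater than limit $2.
over_threshold() {
    awk -v cpu="$1" -v max="$2" 'BEGIN { exit !(cpu + 0 > max + 0) }'
}

# Restarts the service if any ssm-agent-worker process exceeds THRESHOLD.
check_worker() {
    for pid in $(pgrep -f ssm-agent-worker); do
        cpu=$(ps -o %cpu= -p "$pid" | tr -d ' ')
        if [ -n "$cpu" ] && over_threshold "$cpu" "$THRESHOLD"; then
            echo "ssm-agent-worker pid $pid at ${cpu}% CPU; restarting $SERVICE"
            sudo systemctl restart "$SERVICE"
            return 0
        fi
    done
    return 1
}
```

Calling `check_worker` from a cron job or systemd timer a few minutes after boot would approximate the manual steps; the block itself only defines the functions and restarts nothing until `check_worker` is invoked.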
I'm experiencing the same symptom, although much more frequently of late. This is impacting ~20-25% of newly launched instances.
@svoeller99 You are seeing the issue on Ubuntu as well?
yes - we're running Ubuntu 18.04
I am seeing the same issue on Ubuntu 20.04
I have this same problem, really high CPU usage by amazon-ssm-agent, the machine gets totally unstable and there is no explanation.
What are we supposed to do ???
Hi @tr4g, we had the same issue; I am keeping an eye on this thread and saw your comment. Could you please let us know what OS and kernel version you are seeing the problem on? Thank you!
+1 same issue ...
Windows Server 2019 - 4vCPU - SAME ISSUE.
@tr4g @sapphirecat @svoeller99 @thecarnie @saraiyakush Thanks for reaching out. Sorry for the delay in response. Are you still seeing this issue with the latest agent?
@VishnuKarthikRavindran It was somewhat rare, happening maybe once every month or two, launching on average 1.2 instances per day. (Just infrequently enough that I never built a script to automatically handle the situation.) It hasn't happened again for me since I filed the issue, but I can't say with confidence that it's fixed.
We have continued to track the latest Ubuntu 20.04 AMI, so we should be getting both agent and kernel updates accordingly.
@VishnuKarthikRavindran for me it is still happening with version 3.0.1124.0 on Ubuntu 20.04:
snap info amazon-ssm-agent
name: amazon-ssm-agent
summary: Agent to enable remote management of your Amazon EC2 instance configuration
publisher: Amazon Web Services (aws✓)
store-url: https://snapcraft.io/amazon-ssm-agent
contact: https://aws.amazon.com/contact-us/
license: unset
description: |
  The SSM Agent runs on EC2 instances and enables you to quickly and easily
  execute remote commands or scripts against one or more instances. The agent
  uses SSM documents. When you execute a command, the agent on the instance
  processes the document and configures the instance as specified. Currently,
  the SSM Agent and Run Command enable you to quickly run Shell scripts on an
  instance using the AWS-RunShellScript SSM document.
commands:
  - amazon-ssm-agent.ssm-cli
services:
  amazon-ssm-agent: simple, enabled, inactive
snap-id: T09mpujiTnzSdSCuqNkE7YXXTWDq13tC
tracking: latest/stable/ubuntu-20.04
refresh-date: yesterday at 18:01 UTC
channels:
  latest/stable: 3.0.1124.0 2021-07-29 (4046) 26MB classic
  latest/candidate: 3.1.192.0 2021-08-19 (4662) 27MB classic
  latest/beta: ↑
  latest/edge: ↑
installed: 3.0.1124.0 (4046) 26MB classic
Hi @radykal-com, Is this issue reproducible on your end? If possible, could you please check whether you are seeing this with the latest version? Thanks
Well, it's not easy to reproduce, as it happens randomly and with very low frequency. It happened to 6 or 7 instances out of a total of 100+. When it happens, it happens from the moment the instance starts. I decided to just uninstall it from our AMIs.
Thanks @radykal-com for reaching out. We have made many improvements in the latest SSM agent versions. If you consider using the agent again, please let us know whether the issue persists with the latest one.
+1 here, Ubuntu 20.04. It happens every 10 minutes or so while only running a simple website in an nginx Docker container on a t2.micro. It locks the entire system at 100% CPU for about 5 minutes. I tried rebooting via the console and on the CLI.
This is pretty unacceptable, and I am interested in possibly receiving a refund on my 3 reserved instances. How would I start that process so I can move to a more stable cloud server?
Hi @WinterTFG, sorry to hear about that. Could you please share the repro steps with us if it is reproducible on your end?
As mentioned above, we have made many improvements in the latest SSM agent versions. If possible, could you run with the latest one? Thanks.
I'm seeing similar behaviour on the latest:
summary: Agent to enable remote management of your Amazon EC2 instance configuration
publisher: Amazon Web Services (aws✓)
store-url: https://snapcraft.io/amazon-ssm-agent
license: unset
description: |
  The SSM Agent runs on EC2 instances and enables you to quickly and easily
  execute remote commands or scripts against one or more instances. The agent
  uses SSM documents. When you execute a command, the agent on the instance
  processes the document and configures the instance as specified. Currently,
  the SSM Agent and Run Command enable you to quickly run Shell scripts on an
  instance using the AWS-RunShellScript SSM document.
commands:
  - amazon-ssm-agent.ssm-cli
services:
  amazon-ssm-agent: simple, enabled, active
snap-id: T09mpujiTnzSdSCuqNkE7YXXTWDq13tC
tracking: latest/stable/ubuntu-20.04
refresh-date: 18 days ago, at 01:03 CEST
channels:
  latest/stable: 3.0.1124.0 2021-07-29 (4046) 26MB classic
  latest/candidate: 3.1.282.0 2021-09-09 (4750) 27MB classic
  latest/beta: ↑
  latest/edge: ↑
installed: 3.0.1124.0 (4046) 26MB classic
It was stale for 156 hours, and was eating 300% CPU.
Hi @mkdotam, It looks like the installed agent version is 3.0.1124.0. Could you please check whether you are seeing this with latest version - 3.1.282.0? Thanks
Still happening using snap version 3.1.338.0. I'm running ubuntu-focal-20.04-arm64. Happened twice just today.
Hi @Whale-Observer-App, may I know how you reproduced this? Also, could you please attach the logs if possible? Thanks.
just rebooted 2nd time today, amazon-ssm-agent, revision 4046
This has started happening to me today. 50%+ consistent. New servers, windows, elastic beanstalk created the server. I created a dump file of the process if that can help. IIS 10.0 running on 64bit Windows Server 2016/2.8.0
I have the same here. For months now, ssm-agent on one of my instances has been randomly spiking CPU to the point that the instance is not even accessible anymore.
AWS Ubuntu 20.04 amazon-ssm-agent 3.0.1124.0
Add swap as the first step after creating the instance:
https://aws.amazon.com/premiumsupport/knowledge-center/ec2-memory-swap-file/
Since doing this I have never had the issue with the agent.
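For readers who want to try the swap workaround, the usual sequence for creating a swap file looks roughly like the sketch below. The path `/swapfile`, the 2048 MB size, and the `run`/`make_swap` helper names are assumptions for illustration; the linked AWS article is the authoritative procedure. By default the script only prints the commands, and it executes them only when `RUN=1` is set.

```shell
#!/bin/sh
# Sketch of creating and enabling a swap file. Path and size are assumptions.
SWAPFILE="${SWAPFILE:-/swapfile}"
SIZE_MB="${SIZE_MB:-2048}"

# Print each command instead of executing it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

make_swap() {
    run sudo dd if=/dev/zero of="$SWAPFILE" bs=1M count="$SIZE_MB"  # allocate the file
    run sudo chmod 600 "$SWAPFILE"                                   # root-only access
    run sudo mkswap "$SWAPFILE"                                      # format as swap
    run sudo swapon "$SWAPFILE"                                      # enable immediately
}
```

Running `make_swap` prints the four commands for review; `RUN=1 make_swap` executes them. To keep the swap file across reboots, an entry such as `/swapfile swap swap defaults 0 0` in `/etc/fstab` is also needed.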
Thanks for reaching out again. We were able to reproduce this issue on our end. The fix is included in the following agent release: https://github.com/aws/amazon-ssm-agent/releases/tag/3.1.426.0. Could you all please try updating to the latest one?
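On snap-based installs like the ones shown earlier in this thread, checking whether the installed agent is at least the fixed release could look like the sketch below. The helper names `version_ge`, `installed_version`, and `needs_update` are hypothetical, and the comparison relies on GNU `sort -V`; the version-parsing step assumes the usual `snap list` layout with the version in the second column.

```shell
#!/bin/sh
# Sketch: compare the installed snap version against the release named above.
FIXED_VERSION="3.1.426.0"

# Succeeds when version $1 is at least version $2 (uses GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Installed version: second column of the data row printed by `snap list`.
installed_version() {
    snap list amazon-ssm-agent | awk 'NR == 2 { print $2 }'
}

# Succeeds when the installed agent is older than FIXED_VERSION.
needs_update() {
    ! version_ge "$(installed_version)" "$FIXED_VERSION"
}
```

A possible use is `needs_update && sudo snap refresh amazon-ssm-agent`. The `snap info` output earlier in the thread shows `latest/stable` trailing `latest/candidate`, so if the stable channel has not yet reached the fixed version, `sudo snap refresh --channel=candidate amazon-ssm-agent` may be worth trying.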
Thanks! I just updated it. Given that the issue is pretty random, I can't immediately test. But I'll keep monitoring in the upcoming days.
I was dealing with this problem imagining it was some reaction to my code. However, after 3 days without much success I decided to try to send my code to another machine with the same operating system. It worked. My code stopped dying with this CPU spike coming from the ssm agent.
Ubuntu 20.04 Amazon ssm agent 4046
I don't know why this issue was closed if no solution was given. I'm experiencing it as well
Same situation today. I can't even log in to the console normally due to excessive load. The load average is over 15.
You can simply solve this problem by uninstalling the agent: sudo snap remove amazon-ssm-agent
You can find the full answer here: https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-uninstall-agent.html