amazon-ssm-agent

Feature request: make `maxBackOffInterval` configurable

Open hencrice opened this issue 2 years ago • 15 comments

In some cases, our on-premises hosts boot up without internet connectivity. Once the SSM agent enters hibernate mode, it becomes increasingly hard to get it to resume active mode, even after connectivity is restored.

Please consider making the maxBackOffInterval below configurable.

https://github.com/aws/amazon-ssm-agent/blob/44665b7ca49ae3d5e302a57d0931edea6d8e4771/agent/hibernation/hibernation.go#L54

Thanks!

hencrice avatar Dec 23 '22 18:12 hencrice

We are having the same issue. Our boxes can be installed without an internet connection. We just ran into a box that was waiting six hours to connect once the internet connection was live because of the exponential backoff. Being able to set a maxBackOffInterval would be awesome.

timharris777 avatar Jan 13 '23 19:01 timharris777

@hencrice

We have created a feature request. Please note that we have a backlog of feature requests. We'll prioritize and work on those requests as they come in.

sluggard76 avatar Jan 19 '23 18:01 sluggard76

@sluggard76, how do we track the progress of the feature request?

timharris777 avatar Jan 20 '23 14:01 timharris777

@sluggard76 can you let us know the status of this feature request? We have an edge device with occasional network failures, and all connectivity except SSM Agent is restored promptly when network connectivity is restored. The exponential backoff takes too long to retry; we need to be able to limit it somehow.

Why did you close this issue as completed if it isn't actually done? Doesn't that defeat the purpose of a public issue tracker?

strophy avatar Jun 14 '24 13:06 strophy

I purchased a support subscription with AWS and opened a ticket (ID 171897447000512) to try and determine the status of this feature request, and additionally asked the support agent to ask the development team to stop closing issues as completed when they aren't completed.

@sluggard76 do you still work at AWS? Can you please respond?

Related prematurely closed issues: https://github.com/aws/amazon-ssm-agent/issues/468 https://github.com/aws/amazon-ssm-agent/issues/479

strophy avatar Jul 22 '24 12:07 strophy

ssm-agent 3.3.808.0 was released today, after I successfully requested the fix via AWS support. Its release notes include: "Make long sleep for onprem same as long sleep for EC2, and cap sleep time at 30 minutes for OnPrem instances".

I don't have time to test it now, but it looks like this is what we have been asking for. Can anyone verify?

strophy avatar Aug 22 '24 13:08 strophy

> ssm-agent 3.3.808.0 was released today, after I successfully requested the fix via AWS support. Its release notes include: "Make long sleep for onprem same as long sleep for EC2, and cap sleep time at 30 minutes for OnPrem instances".
>
> I don't have time to test it now, but it looks like this is what we have been asking for. Can anyone verify?

I can't find any related parameter in the amazon-ssm-agent.json.template file. Does that mean the default cap has been changed from 24 hours to 30 minutes, but it is still not configurable?

mochaslave avatar Aug 23 '24 10:08 mochaslave

It looks like it was done in this commit: https://github.com/aws/amazon-ssm-agent/commit/d76f19c96be9d9c88baa14238b5ae467690ecc75

It seems to be mostly changes to the maximum durations and to how the backoff is calculated; no configurable variable is available.

strophy avatar Aug 27 '24 07:08 strophy

> It looks like it was done in this commit: d76f19c
>
> It seems to be mostly changes to the maximum durations and to how the backoff is calculated; no configurable variable is available.

Then it's not so helpful. :(

mochaslave avatar Aug 27 '24 13:08 mochaslave

I see that this is not implemented and is a valid feature request, so I am reopening it.

We will look into implementing this feature and potentially incorporating it into our roadmap.

Aperocky avatar Oct 23 '24 19:10 Aperocky

@Aperocky Is there any ETA on this? It's still actively causing downtime for a lot of our sites because we are stuck with this long backoff duration. No clue why this hasn't been a priority for the past years. It should be super simple to at least make this a configurable parameter.

tdekoning93 avatar Dec 09 '24 10:12 tdekoning93

@Aperocky @sluggard76 can you please reply? Or should I open another support ticket to ask your manager to ask you for a reply?

Not being able to configure this caused over 200 litres of water to spill on our factory floor because we couldn't access a pump controller due to this annoying issue. Your paying users have been asking for this feature since September 2022. Why is this so difficult?

strophy avatar Mar 17 '25 10:03 strophy

@strophy I'm sorry this happened - we had some priority crunch and I'm just back from parental leave. I'll take a look and hopefully this will be a quick fix and update.

Aperocky avatar Mar 25 '25 21:03 Aperocky

@Aperocky fully understand how life can get in the way of things! Thanks very much for the update, please keep us posted in this thread with your progress 🚀

strophy avatar Mar 25 '25 21:03 strophy

We are making some minor bug fixes and updates to hibernation and will allow configurable backoff intervals. This change should ship with the next agent release.

Aperocky avatar Mar 27 '25 20:03 Aperocky

This is now configurable as of release 3.3.2299.0; see https://github.com/aws/amazon-ssm-agent/blob/mainline/amazon-ssm-agent.json.template#L18

Aperocky avatar Apr 15 '25 17:04 Aperocky