apprunner-roadmap
Apprunner hangs on long running requests with error message "upstream connect error or disconnect/reset before headers. reset reason: connection termination"
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Tell us about your request
What do you want us to build?
When running an app with a simple frontend but a long-running backend, the web client suddenly hangs with the error message "upstream connect error or disconnect/reset before headers. reset reason: connection termination". This doesn't seem to affect the app itself, since the CloudWatch logs continue as if the request were still being processed. This is probably a simple timeout setting in the load balancer or the API Gateway (if there is one). Could you add an option to configure this timeout, or at least provide visibility into what it is currently set to?
Describe alternatives you've considered
Use ECS, where you can configure these parameters directly.
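One way to test the timeout hypothesis from the request above is to time each long-running call and separate proxy-level resets from errors raised by the app itself. The sketch below is only an illustration: it assumes the error arrives as an HTTP 503 whose body contains the quoted message, and the names `classify_response` and `timed_request` are made up for this example.

```python
import time
import urllib.error
import urllib.request

# Text of the proxy error quoted in this issue.
PROXY_ERROR_MARKER = "upstream connect error or disconnect/reset before headers"


def classify_response(status: int, body: str) -> str:
    """Label a response as 'ok', 'proxy_reset', or 'app_error'."""
    # Assumption: the proxy reports the reset as a 503 carrying the marker text.
    if status == 503 and PROXY_ERROR_MARKER in body:
        return "proxy_reset"
    if status >= 400:
        return "app_error"
    return "ok"


def timed_request(url: str, timeout: float = 300.0) -> tuple[str, float]:
    """Send one GET and return (label, seconds elapsed)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, body = resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries the body.
        status, body = err.code, err.read().decode("utf-8", "replace")
    return classify_response(status, body), time.monotonic() - start
```

If the `proxy_reset` results cluster around one elapsed time, that would point to a fixed timeout in front of the app; widely scattered times would suggest something else.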
I was able to fix this issue by pausing and resuming the service.
I already have this issue again today. Do we know anyone we can tag here to get some attention from AWS?
Tag #104
happening to me right now. after pausing and resuming still no fix. :/
Same issue. No clear indication of what is going on anywhere. Is this a resource issue? Is there a problem with App Runner? None of this happens anywhere else we're running these containers, what's going on?
I am experiencing the same issue! Even if I reduce the processing time, the 503 gets hit.
I have contacted AWS support about this, but so far the issue has been "Work in progress" for 8 days. I hope they get back to me about this soon.
@francoisvdv May I know if you've got any replies back from the AWS Support regarding this issue?
Sadly only a 'we are working on it and we have escalated it' but no solution or anything..
@francoisvdv still nothing?
After various back-and-forths the conclusion of AWS support was that it was a problem in the application. We did not agree with that conclusion and instead migrated away from App Runner to ECS. So sadly no solution other than not using App Runner.
Had this same issue about 1 year ago when testing out AWS AppRunner, and now experiencing the same issue again but not quite as often as 1 year ago, but still 😞 ... Will have to move back to ECS Fargate again.
Below is the reply I got from AWS Support on July 3, 2021.
Hello,
After further investigation from the service team,
App Runner uses Fargate tasks in the backend to spin up the application instances. When the application is not receiving any requests, Fargate automatically reduces the CPU allocated to the task (Idle State). Once there are new active requests, the task's CPU allocation increases so it can respond to them (Active State).
The issue is related to Fargate task not getting allocated CPU even after receiving new requests.
Backend Fargate tasks are put to sleep since they are not receiving any active requests for an extended period of time. New incoming requests might then face network timeouts leading to 503s, since the Fargate task is not able to re-allocate CPU for serving them.
Unfortunately, there is no way to mitigate this issue at that point.
The internal service team are working to find a solution. However, it may take some time.
You can track any new release information at either of the following locations[1][2].
References: [1] https://aws.amazon.com/new/ [2] https://github.com/aws/apprunner-roadmap/issues
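The support reply above says there is no service-side mitigation. A caller can still retry a 503 with exponential backoff while the instance wakes up. This is a client-side stopgap sketch, not a fix for the underlying CPU allocation problem, and `retry_on_503`, its parameters, and the delay values are assumptions made up for this example.

```python
import time
from typing import Callable


def retry_on_503(send: Callable[[], int], attempts: int = 4,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> tuple[int, int]:
    """Call send() until it returns a non-503 status or attempts run out.

    Returns (last status, number of calls made).
    """
    status = 503
    for attempt in range(1, attempts + 1):
        status = send()
        if status != 503:
            return status, attempt
        if attempt < attempts:
            # Double the wait after each failure: 0.5s, 1s, 2s, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return status, attempts
```

Here `send` is any zero-argument callable that performs the request and returns its HTTP status code. Retrying only makes sense for idempotent requests, and it will not help if the task never regains CPU.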
The active auto scaling policy for AppRunner service (screenshot from AWS UI):
I think this scaling policy ☝️ with Minimum size configured and the AWS Support response "Backend Fargate tasks are put to sleep since they are not receiving any active requests" are contradictory or misleading (if true), as I would expect to have 5 instances ready/active at all times (with CPU) to handle incoming requests.
Still getting this. When is a solution expected?? This is rendering AppRunner completely unusable. The whole point is to abstract scaling, but then the thing is incapable of scaling altogether?! What is the point? Why did you even launch this service?
Use an nginx proxy; it will solve the issue. Putting nginx in front solved it for me.
Sorry for no response here for a long time on this issue. I would like to help on this. @Mugane @mikaelcabot @francoisvdv Is it possible that you can share service ARN for an affected service?
We moved away from AppRunner because of this issue so we no longer have ARNs available.
Yes, but I'm not at my workstation, I'll update tomorrow
I am experiencing the same issue when running a NextJS app on AppRunner. It happens when I run Google PageSpeed Insights against the app. It is a shame, really, because it turns out that AppRunner does not scale well enough and there is nothing I can do about it as a user. No matter what 'scaling policy' or vCPU/RAM configuration I use, the app crashes under a simple Google PageSpeed test....
Hi @mstoyanovv, try using nginx as a proxy; it will resolve the issue.
hi @SJANAKIVENKATA, how did you use nginx with AppRunner?
Hi @mstoyanovv, just use it as a proxy and for static files. There is no need to configure a certificate, because App Runner provides HTTPS.
How would this possibly make any difference? Wouldn't it just offload the error from the initial request to the internal proxy request? That doesn't solve apprunner hanging.
@Mugane I created another AppRunner instance that hosts Nginx configured as proxy and cache of static files. It solved the issue that I had with google PageSpeed Insights. Also, when stress testing the app with Ddosify it does handle the traffic better.
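For anyone who wants to repeat the before/after comparison described above without a dedicated tool, a burst of concurrent requests can be sketched with the Python standard library. This is a rough stand-in under stated assumptions, not a replacement for Ddosify; `fetch_status` and `run_burst` are names made up for this example, and status `0` is an arbitrary marker for "no HTTP response arrived".

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status for one GET, or 0 if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses, including the 503s in this thread
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, reset, or timed out


def run_burst(fetch: Callable[[], int], total: int, concurrency: int) -> Counter:
    """Issue `total` requests with up to `concurrency` in flight; count statuses."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(fetch) for _ in range(total)]
        return Counter(f.result() for f in futures)
```

A run could look like `run_burst(lambda: fetch_status("https://<your-service-url>/"), total=200, concurrency=20)`, where the URL is a placeholder. Comparing the share of 503s with and without the nginx proxy in front would show whether the proxy actually changes the outcome.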
Hello @mstoyanovv, could you provide the service arn so that we can take a look?
We have also moved away from AppRunner because of this issue.
But going back to the response I got from AWS Support:
The issue is related to Fargate task not getting allocated CPU even after receiving new requests.
Unfortunately, there is no way to mitigate this issue at that point. The internal service team are working to find a solution. However, it may take some time.
... So has a fix been applied targeting this issue? (Asking to know if it's worth spending time on testing this again).
where can I contact you @smeera381 ?
Hello @mstoyanovv If you could share the service arn here, I can take a look.
We are having the same issue on a prod app with NextJS.
It works with APIGW, but with App Runner it does not.
We are not testing with Google but with our own NextJS website. Sometimes it hangs and we can't do anything about it.
Request latency went up at 18:30 UTC
and we started getting the same errors.
Thank you @mstoyanovv. Taking a look. Hello @atrope, please feel free to share your service arn details here and we will take a look.
arn:aws:apprunner:us-east-1:384537834093:service/genuine-project-ffub8-app/e0d044541e0c43b38894b06e88c3b36c