apprunner-roadmap
Configurable request timeout
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Tell us about your request
Currently AppRunner enforces a 30-second timeout on HTTP requests. If the timeout is hit, an HTTP 503 response is returned.
For some use cases, especially large file uploads, this is a severe limitation.
Describe alternatives you've considered
We first refactored our file upload to use S3 multi-part uploads to break the upload request into several shorter requests. But then we ran into the problem that the S3 complete_multipart_upload call can take a long time (several minutes) to complete. We are now considering having the user upload directly to S3, and then notifying the AppRunner service of the completed upload. If that avenue is unsuccessful, we will move to EC2, but that would be a shame as AppRunner has worked well for us so far.
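For readers following the multi-part approach described above, here is a minimal sketch of how an upload could be cut into parts so that each part goes out as its own short request. The function name `plan_parts` and the 8 MiB default part size are assumptions for illustration, not something from our actual code. The 5 MiB minimum part size and the 10,000-part cap are S3's documented multipart limits.

```python
import math
from typing import List, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum size for every part except the last
MAX_PARTS = 10_000                # S3 maximum number of parts per multipart upload


def plan_parts(total_size: int, part_size: int = 8 * 1024 * 1024) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) for each part of an upload.

    Hypothetical helper: it only computes byte ranges and performs no I/O.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    # Grow the part size if the file would otherwise need more than MAX_PARTS parts.
    part_size = max(part_size, math.ceil(total_size / MAX_PARTS))
    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts
```

Each tuple maps to one upload-part request, so the duration of a single request depends on the part size rather than on the whole file. It does not help with the slow complete_multipart_upload call mentioned above.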
Additional context
AWS support case ID (9433361761)
Attachments
Behavior can be easily triggered by calling something that takes longer than 30 seconds (in this case, a dummy "sleep" route)
> GET /sleep?seconds=31 HTTP/1.1
> Host: xxxx.eu-west-1.awsapprunner.com
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< content-length: 95
< content-type: text/plain
< date: Mon, 03 Jan 2022 07:39:44 GMT
< server: envoy
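For anyone who wants to reproduce this, a dummy "sleep" route along these lines should be enough. This is a sketch using only the Python standard library, not the service that produced the output above. The names `SleepHandler` and `make_server` and the port 8080 are assumptions.

```python
import math
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


class SleepHandler(BaseHTTPRequestHandler):
    """Serves GET /sleep?seconds=N by sleeping N seconds, then replying 200."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/sleep":
            self.send_error(404)
            return
        try:
            seconds = float(parse_qs(url.query).get("seconds", ["0"])[0])
        except ValueError:
            self.send_error(400, "seconds must be a number")
            return
        if not math.isfinite(seconds) or seconds < 0:
            self.send_error(400, "seconds must be finite and non-negative")
            return
        time.sleep(seconds)
        body = f"slept {seconds:g} seconds\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def make_server(host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return ThreadingHTTPServer((host, port), SleepHandler)
```

Deployed behind App Runner, a request such as `/sleep?seconds=31` would be expected to hit the timeout shown above, while the same request sent straight to the container should return 200.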
In addition to 👍'ing, it seems that this issue is the root cause of other high-demand issues in this repo. Referencing them here in an effort to consolidate: #13 #23 #86
This feature request has been open for over a year now (referring to #13). No official comment, no timeline, nothing. Come on.
WIP
Thanks for your patience. What would be a reasonable maximum request time out value that you would like to get supported in App Runner?
Thanks for your patience. What would be a reasonable maximum request time out value that you would like to get supported in App Runner?
Ideally it could go as long as an ALB can. However most HTTP calls I work with are 2 minutes or less (usually much shorter).
Thanks for your patience. What would be a reasonable maximum request time out value that you would like to get supported in App Runner?
Is this something that could still be configurable? Similar to what @jvisker said, ideally this would match the min/max/default timeout of ALB or CloudFront or similar (1/60/30 respectively, IIRC).
For WebSockets and Server-Sent Events, you would also want the connection to remain open as long as data is being sent across that connection. In my first pass using AppRunner and SSE, I noticed that the connection timeout was not being reset even when data was in fact being sent within the timeout period. (Happy to open another issue for that if need be, though I was hoping it would resolve itself once this, #13 and/or #23 are resolved.)
+1 for configurable timeout. Maximum value should be at least 5 minutes
Can you tell us more about your use case?
At the moment we use Fargate for legacy applications that can't use lambda due to lambda/apigw limits (timeout/request size etc). We'd like to move them to App Runner. The applications deal with e.g. large file uploads, complex db queries and slow connections.
Hello everyone, Thank you for your feedback and patience on this issue. We have increased the request read timeout in App Runner from 30 seconds to 120 seconds.
We will keep this issue open to continue our conversation as we work through increasing the timeout limits and making it configurable for you. Appreciate all your feedback!
Documentation link: https://docs.aws.amazon.com/apprunner/latest/dg/develop.html#develop.considerations
@snnles This is great, but we're still seeing consistent 30s timeouts on our App Runner instances. Is this a slow rollout, or is there something we need to do to explicitly configure this?
Have you done any deployments since this feature launch? This requires a new deployment on your service after the feature launch.
For my use case, I plan on using App Runner as the front end of a web app that loads a rather large CSV file from user input and continuously sends data in batches to SageMaker for ML inference. When the SageMaker endpoint completes inference on a given batch, the App Runner service collects more data from the CSV and sends it to SageMaker. Because of this, I'd need App Runner to reset the request read timeout every time a SageMaker batch completes and the App Runner service begins running again. Does App Runner currently do this? If not, is this capability in the works?
App Runner provides vCPU during request processing. Each request has a maximum timeout of 2 minutes (https://docs.aws.amazon.com/apprunner/latest/dg/develop.html). If your batch processing can finish within the request timeout, it can be done within a single App Runner request. Otherwise it should be orchestrated as multiple requests to the App Runner service.
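One way to read "orchestrated as multiple requests" is a cursor-based loop: each request processes as many batches as fit within a time budget, then returns a cursor the caller uses to resume. The sketch below is only an illustration of that idea. `process_slice`, `run_all`, and the 90-second budget are hypothetical names and values, and a plain function call stands in for each HTTP request.

```python
import time
from typing import Any, Callable, List, Optional, Sequence, Tuple


def process_slice(
    batches: Sequence[Any],
    cursor: int,
    handle: Callable[[Any], Any],
    budget_seconds: float = 90.0,          # assumed margin under the 2-minute limit
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[Any], Optional[int]]:
    """Process batches from `cursor` until the budget is spent.

    Returns (results, next_cursor); next_cursor is None once everything is done.
    At least one batch is handled per call, so the caller always makes progress.
    """
    deadline = clock() + budget_seconds
    results = []
    while cursor < len(batches):
        results.append(handle(batches[cursor]))
        cursor += 1
        if clock() >= deadline:
            break
    next_cursor = cursor if cursor < len(batches) else None
    return results, next_cursor


def run_all(
    batches: Sequence[Any],
    handle: Callable[[Any], Any],
    budget_seconds: float = 90.0,
    clock: Callable[[], float] = time.monotonic,
) -> List[Any]:
    """Caller-side loop: each process_slice call stands in for one request."""
    cursor: Optional[int] = 0
    collected: List[Any] = []
    while cursor is not None:
        results, cursor = process_slice(batches, cursor, handle, budget_seconds, clock)
        collected.extend(results)
    return collected
```

The budget is only checked between batches, so a single batch that takes longer than the request timeout would still fail.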
Are there any plans to increase this limit or make it configurable?
Sam here... I have an API that accepts large file uploads with processing time, and this takes over 2 minutes sometimes. Any update here to make it configurable?
Cross-posting my comment from the Support web sockets issue, as I think it's relevant to the same underlying problem about request timeout.
App Runner currently supports 2 mins of maximum request timeout
Is this the case even with data actively being transmitted? Heroku's approach is:
"An application has an initial 30 second window to respond with a single byte back to the client. However, each byte transmitted thereafter (either received from the client or sent by your application) resets a rolling 55 second window. If no data is sent during the 55 second window, the connection will be terminated."
This is great since provided everything is healthy, the client and server can confirm with keepalive pings, keeping the connection open.
If this could be delivered, it would offer us the most flexibility to build to our use cases whether that be websockets or long processing requests, etc.
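To make the rolling-window behavior quoted above concrete, here is a small model of it. It only simulates the rule as the quote describes it and says nothing about how Heroku or App Runner implement their timeouts. The function name `termination_time` is hypothetical, and treating a byte that arrives exactly at the deadline as still in time is an assumption.

```python
from typing import Sequence

INITIAL_WINDOW = 30.0   # seconds allowed before the first byte (from the quote)
ROLLING_WINDOW = 55.0   # idle seconds allowed after each byte (from the quote)


def termination_time(
    byte_times: Sequence[float],
    initial: float = INITIAL_WINDOW,
    rolling: float = ROLLING_WINDOW,
) -> float:
    """Return when the connection would be cut, in seconds since request start.

    `byte_times` are the sorted moments at which a byte is sent or received.
    The result assumes no further traffic after the last listed byte.
    """
    deadline = initial
    for t in byte_times:
        if t > deadline:
            break              # this byte arrives after the connection was cut
        deadline = t + rolling  # every byte in time resets the rolling window
    return deadline
```

For example, bytes at 10 s, 60 s and 110 s keep the connection alive until 165 s, while a first byte at 40 s never arrives because the connection was already cut at 30 s.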
If this could be implemented in one's own code, would a potential workaround be to write a while loop with a try/except clause that continuously checks whether the given process has completed? Basically, as long as the Python code is running, the server won't time out?
Just for reference: Google Cloud Run, which is the main competitor here, allows 60 minutes; Azure Container Instances has no limit; and even AWS Lambda allows 15 minutes.
Cross-posting my comment from the Support web sockets issue, as I think it's relevant to the same underlying problem about request timeout.
App Runner currently supports 2 mins of maximum request timeout
Is this the case even with data actively being transmitted? Heroku's approach is: "An application has an initial 30 second window to respond with a single byte back to the client. However, each byte transmitted thereafter (either received from the client or sent by your application) resets a rolling 55 second window. If no data is sent during the 55 second window, the connection will be terminated." This is great since provided everything is healthy, the client and server can confirm with keepalive pings, keeping the connection open.
If this could be delivered, it would offer us the most flexibility to build to our use cases whether that be websockets or long processing requests, etc.
Heavy +1. It seems this approach would resolve not only this issue, but would help enable #13 and #189 as well.
Are there any plans to increase this limit or make it configurable? @snnles @amitgupta85
The concept that all HTTP calls are short lived is a bit of an architecture ivory tower. I understand the need to protect sockets on load balancers with unhealthy clients and services but at a minimum, the ability to send along a header for certain known long running processes, or the above approach around having traffic reset the window seems completely reasonable and necessary.
Can you tell us more about your use case?
Running a query that takes even more than the updated timeout, 120 seconds, on a Metabase instance that is deployed on AWS App Runner.
Can you tell us more about your use case?
Data import/processing via a front-end system request. Some third-party systems work by having the user wait for a time that depends on their input size. When this input is considerable, requests take longer than 2 minutes (waiting for database, backend, etc.). For example, a system like this is REDCap.
I also understand we need limits, but could this match the ALB 4000-second limit as well?
Similar issue here: bulk updates via a UI, or file imports, that run over 2 minutes start failing.
I agree, the request timeout is way too short. We have some use cases where we need to run long queries or perform some synchronous tasks in one web request. 900 seconds would be ideal, 300 seconds at minimum.
@backnol-aws @snnles @jsheld @lazarben @scuw19 @amitgupta85 @akshayram-wolverine
I would think having some parity with Lambda isn't too much to ask.
Exactly my thinking as well, suggesting 900 seconds. Would make moving between Lambda and App Runner via containers very easy!