Issue with concurrent requests on AWS Fargate
Describe the Bug
I am encountering an issue where concurrent requests are processed sequentially rather than simultaneously when the service is deployed on AWS Fargate. I suspect the problem is that boto3 runs synchronously and its calls are blocking.
API Details
- API Used: /chat/completions
- Model Used: all of them
To Reproduce
Steps to reproduce the behavior:
- Deploy the service on AWS Fargate following the standard setup procedures.
- Send multiple concurrent requests (e.g., 10 concurrent requests) to the API; a sketch of such a test script follows this list.
- Observe that the requests are processed sequentially instead of concurrently.
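A rough sketch of the kind of test script that shows the behavior (the base URL, endpoint path, API key, and model ID below are placeholders for your own deployment):

```python
import asyncio
import time

import httpx

BASE_URL = "http://localhost:8000"  # placeholder: replace with your Fargate endpoint
API_KEY = "bedrock"                 # placeholder API key

PAYLOAD = {
    "model": "anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder model ID
    "messages": [{"role": "user", "content": "Write a short poem."}],
    "stream": False,
}


async def one_request(client: httpx.AsyncClient, i: int) -> None:
    start = time.perf_counter()
    resp = await client.post(
        f"{BASE_URL}/api/v1/chat/completions",
        json=PAYLOAD,
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    print(f"request {i}: status={resp.status_code}, took {time.perf_counter() - start:.1f}s")


async def main() -> None:
    async with httpx.AsyncClient(timeout=120) as client:
        # Fire 10 requests at once; if they finish one after another,
        # the server is handling them sequentially.
        await asyncio.gather(*(one_request(client, i) for i in range(10)))


asyncio.run(main())
```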
Expected Behavior
I expected that when sending multiple concurrent requests to the API, all requests would be handled simultaneously, or at least as many as the server can handle.
Concurrency and asynchronous calls are natively supported by FastAPI. I did a quick test with 2 concurrent requests (with long responses) and I can see both streaming in parallel, though I didn't test via code.
You can probably try the following:
- Try fewer requests (like 2 requests) first and see if the issue still exists.
- Try testing locally (the code can run locally).
- Try increasing the capacity of Fargate (by default it has only 1 core, so I would expect it may not support larger numbers of concurrent requests) and retest.
Hi @daixba, I forgot to mention that I'm not streaming the response. With streaming, it works better, but it is still not perfect (I monitor the health-check endpoint, and it times out from time to time).
But without streaming, the API waits for each request to finish before it can handle other requests.
> Concurrency and asynchronous calls are natively supported by FastAPI

I agree; this is why I think the problem is with boto3.
@daixba when I run boto3 with asyncio, it works as expected: https://github.com/aws-samples/bedrock-access-gateway/pull/23
So can this be solved? My high-concurrency requests stop working as soon as they hit the non-streaming path.
This is not a problem with Fargate's capacity; it's because we're running blocking code in the event loop.
Let me explain: in FastAPI, an async handler runs directly on the event loop, while a sync handler is wrapped in a worker thread and its result is awaited on the loop.
Therefore, when there is blocking code in an async handler, it will block the whole server.
Usually, we all understand the following code:
```python
import time
import asyncio

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    await asyncio.sleep(1000)  # Won't block
    time.sleep(1000)           # Will block
    return {"message": "Hello World"}
```
Yet
```python
import time
import asyncio

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def root():
    # Won't block! FastAPI runs sync handlers in a thread pool,
    # so the event loop stays free to serve other requests.
    time.sleep(1000)
    return {"message": "Hello World"}
```
@QingyeSC If you're in a hurry, you can build the image yourself from https://github.com/aws-samples/bedrock-access-gateway/pull/23. If you want to self-host, I forked my own version: https://github.com/Wh1isper/bedway
Also ran into this issue. I fixed it by subclassing the BedrockModel class as AsyncBedrockModel in a separate module and adding aioboto3 support, keeping the syntax similar and leaving the main code untouched so that pulling from upstream stays easy when needed.
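Roughly the shape of it, as a sketch only (shown here as a standalone class with a simplified signature rather than the real BedrockModel subclass, since the repo's interface differs):

```python
# Sketch only: names and signatures are illustrative, not the repo's actual interface.
import aioboto3


class AsyncBedrockModel:
    """Async variant that uses aioboto3 instead of the blocking boto3 client."""

    def __init__(self, region: str = "us-east-1"):
        self._session = aioboto3.Session()
        self._region = region

    async def converse(self, model_id: str, messages: list[dict]) -> dict:
        # aioboto3 clients are async context managers; awaiting the call
        # yields control back to the event loop instead of blocking it.
        async with self._session.client(
            "bedrock-runtime", region_name=self._region
        ) as client:
            return await client.converse(modelId=model_id, messages=messages)
```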
Hopefully pull #23 gets approved though 👍
Sorry it has taken so long to address this issue.
Based on my tests, performance is now improved. The project now makes async calls to the Converse API, so we don't need aioboto3 here. Check commit 0ead770069a47a3342e68096a29e815f08567687 for more details.
Simply redeploy or update the container image to give it a try!
Please let me know if you have any feedback.