Rate limit for ListInferenceProfiles API
Describe the bug
When using a function URL (streamResponse) with a Lambda deployment, I encounter this error:

```
Unable to list models: An error occurred (ThrottlingException) when calling the ListInferenceProfiles operation (reached max retries: 1): Too many requests, please wait before trying again. You have sent too many requests. Wait before trying again.
```
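The throttled call can be exercised directly with boto3. Below is a minimal sketch of one mitigation, raising the retry budget and enabling adaptive client-side rate limiting; the client construction, settings, and caching helper here are my assumptions for illustration, not the gateway's actual code:

```python
import boto3
from botocore.config import Config

# Assumption: the gateway builds its Bedrock control-plane client roughly
# like this. "reached max retries: 1" in the error suggests a very small
# retry budget; adaptive mode adds client-side backoff on ThrottlingException.
retry_config = Config(retries={"max_attempts": 8, "mode": "adaptive"})
bedrock = boto3.client("bedrock", config=retry_config)

# The throttled operation from the report. Caching the result at module
# scope means warm Lambda invocations never repeat the call.
_profiles_cache = None

def list_profiles():
    global _profiles_cache
    if _profiles_cache is None:
        resp = bedrock.list_inference_profiles()
        _profiles_cache = resp["inferenceProfileSummaries"]
    return _profiles_cache
```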
Please complete the following information:
- [x] Which API you used: /chat/completions
- [x] Which model you used: any
To Reproduce
Build with aws-lambda-adapter:
```dockerfile
FROM public.ecr.aws/docker/library/python:3.12.0-slim
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.0 /lambda-adapter /opt/extensions/lambda-adapter
WORKDIR /app
COPY ./requirements.txt /app/requirements.txt
RUN pip install -i https://mirrors.aliyun.com/pypi/simple/ --no-cache-dir --upgrade -r /app/requirements.txt
COPY ./api /app/api
CMD ["uvicorn", "api.app:app", "--port", "8080", "--reload"]
```
Screenshots
Lambda Web Adapter will repeatedly send HTTP GET requests to your web app during cold start to check whether the app is ready. By default, the GET request is sent to the '/' path. You can change it to Bedrock Access Gateway's health check path '/health' by adding an environment variable to your function:

AWS_LWA_READINESS_CHECK_PATH: /health
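For the container-image deployment in the repro above, one way to set it is directly in the Dockerfile (it can equally be set in the Lambda function's environment configuration):

```dockerfile
# Point the adapter's cold-start readiness probe at the gateway's
# health check path instead of the default '/'.
ENV AWS_LWA_READINESS_CHECK_PATH=/health
```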
Before adopting aws-lambda-adapter, I had already implemented the health check the adapter requires, so that shouldn't be the cause of this issue.
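For reference, the kind of endpoint that satisfies the readiness probe without touching Bedrock looks like the sketch below, assuming a FastAPI app like the one served by the repro's uvicorn command; the reporter's actual handler is not shown:

```python
from fastapi import FastAPI

app = FastAPI()

# Returns immediately and makes no AWS calls, so repeated readiness
# probes from the Lambda Web Adapter cannot trigger Bedrock throttling.
@app.get("/health")
async def health():
    return {"status": "OK"}
```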