Add configurable warmup delay
In https://github.com/brefphp/bref/pull/734 a feature was merged that adds a delay to the warmup function in order to keep multiple Lambdas warm.
The desired outcome is that when you send n concurrent warmup requests, you end up with n warm Lambda containers.
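For illustration, the mechanism looks roughly like this (a sketch in TypeScript rather than Bref's actual PHP implementation; the event shape and names are hypothetical): while the handler sleeps through the warmup delay, its container cannot accept another invocation, so each additional concurrent warmup request has to land on a different, possibly freshly started, container.

```typescript
// Illustrative sketch only - not Bref's real code (Bref is PHP); names are hypothetical.
const WARMUP_DELAY_MS = 10; // hardcoded in the merged PR; this issue asks to make it configurable

export async function handler(event: { warmer?: boolean }): Promise<string> {
    if (event.warmer) {
        // Stay busy for a moment so concurrent warmup invocations can't reuse this container.
        await new Promise((resolve) => setTimeout(resolve, WARMUP_DELAY_MS));
        return 'warmed';
    }
    // ... normal request handling ...
    return 'ok';
}
```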
It's clunky, and it could be argued it isn't needed in production. However, we are working on an API Platform application. One of its cacheability strong suits is that collections ( GET /api/users ) should only return identifiers, with the client then making individual requests for the actual items.
So when I request a list of 100 users our frontend gets to make 101 requests.
This causes a lot of concurrent requests and is (initially) slow ~~as fuck~~ when you are using Bref in a cold development environment.
Originally https://github.com/brefphp/bref/pull/734 had the option to set an env var to define the warmup delay, essentially allowing people to configure their concurrent warmup. However, @mnapoli understandably decided to hardcode it to 10ms, because Bref likes to limit the number of options.
There has been no follow-up issue from @peppeocchi so it's very possible the 10ms has been enough for him. Or he didn't bother and found another solution.
I'm here to get the conversation going again 😅
Can we get a configurable warmup delay?
Thanks for all the details!
As always, I like to understand first why we need this 😄 You explained the context very well, but is 10ms not working for you? Would any other number work better? (I chose 10ms somewhat arbitrarily, because there was no recommendation from experience that another number would work better.)
If we definitely can't find a solution that works for everyone, we can of course add an option for that. But I like to make sure we are really certain it's worth it :p
Honestly we didn't do any benchmarking to see where we land with the current 10ms. I'll try to get you some numbers so we can make an informed decision 👍
@RobinHoutevelts @mnapoli to be honest I didn't follow up on the issue because the 10ms allowed me to always have at least 2 warm containers, with a concurrency of 5. Also, the app that needs warmup is a less frequently accessed microservice that deals with payments (so it's slow anyway), and I didn't bother looking for other solutions.

But I can tell you that if I had to keep a customer-facing app warm, the 10ms wouldn't be enough (I'm talking about keeping 50 Lambdas warm at all times). To get to those numbers you would have to set your warmer function to call it about 300 times every 5 minutes. I did some tests at the time, and while I don't have the numbers in front of me, I think I wasn't able to get more than 30 warm Lambdas no matter what concurrency the warmup function was set to. For other apps I am bypassing the default delay by calling an API Gateway endpoint that has the delay I want (this is of course a more expensive solution).
The reason is very simple: your warmup function calls your Lambda function X times. Issuing each call takes about 5ms before the next one goes out (the warmup invocations themselves are async, of course - but even if issuing a call took only 1ms, by the time 10 Lambdas have been warmed the first one is already "free" again to handle more incoming executions), so in practice it only warms up a very low number of containers. With small numbers it's less noticeable, but when you need lots of them you will be disappointed with the result.
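To make the timing concrete, here is a rough sketch of a warmer that issues invocations one after the other (hypothetical code using the AWS SDK for JavaScript v3, not the actual warmer plugin). Roughly speaking, the number of containers kept busy at any moment is the invocation rate times how long each warmup call keeps a container occupied (the configured delay plus invocation overhead), which is why a longer delay lets the same warmer keep more containers warm.

```typescript
// Hypothetical warmer loop (AWS SDK for JavaScript v3) - a sketch of the behaviour
// described above, not the actual plugin code. Each invocation is asynchronous on
// the Lambda side ("Event" type), but issuing it still takes a few milliseconds,
// so with a 10 ms warmup delay the first containers are free again almost immediately.
import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});

export async function warm(functionName: string, concurrency: number): Promise<void> {
    for (let i = 0; i < concurrency; i++) {
        await lambda.send(new InvokeCommand({
            FunctionName: functionName,
            InvocationType: 'Event', // fire-and-forget: don't wait for the warmup response
            Payload: Buffer.from(JSON.stringify({ warmer: true })), // hypothetical payload shape
        }));
    }
}
```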
I believe I was getting up to 50 warm lambdas with the 25ms I suggested on the other PR.
I think this should be a configurable option; it can default to 10ms, but to be cost-effective it must be customisable.
To give an example: say the desired capacity is 50 warm functions. At the moment you might get to that number with a concurrency of 300 (you would probably need to increase the delay to 25ms). If you increase the delay to 75ms, you might only need to call it 200 times (saving 100 executions) while keeping the same base cost. It might even be more cost-effective to set the delay to 150ms and the warmer to a concurrency of 80; this is more likely to get you the 50 warm functions.
At the end of the day, having a configuration for the delay might just get you where you need to be. If you need to always have 100 warm Lambdas, you can either buy the very expensive "Provisioned Concurrency" option, which adds a base cost of about $300/month (plus execution costs), or use a delay of 450ms, which adds about $15/month (for 200 executions every 5 minutes of a 1GB function running for up to 500ms). Then it's always a balance of what you really need. For example, one of our busiest microservices (a Python function handling ~100 req/s), set with a concurrency of 60 and a delay of 90ms, ensures that we always have 35 warm functions (it takes about 1s to invoke the Lambda function 60 times for warmup).
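For what it's worth, the ~$15/month figure checks out as a back-of-the-envelope calculation (assuming standard Lambda pricing of roughly $0.0000166667 per GB-second and $0.20 per million requests; actual prices vary by region and over time):

```typescript
// Rough sanity check of the "~$15/month" estimate above.
const invocationsPerMonth = 200 * (60 / 5) * 24 * 30;         // 200 calls every 5 min ≈ 1,728,000
const gbSeconds = invocationsPerMonth * 1 * 0.5;              // 1 GB function, ~0.5 s each ≈ 864,000
const computeCost = gbSeconds * 0.0000166667;                 // ≈ $14.40
const requestCost = (invocationsPerMonth / 1_000_000) * 0.20; // ≈ $0.35
console.log((computeCost + requestCost).toFixed(2));          // ≈ 14.75
```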
One thing is sure: you will never be able to get to 100 warm Lambdas with only a 10ms delay.
Thank you for those details, I finally understand the problem!
And if I understand correctly, the problem stems from the inability of the "warmup script" (the thing that invokes the function in parallel) to actually send requests in parallel?
Stupid question: can we fix that at the root, i.e. write a script/program/whatever that manages to invoke a Lambda X times truly in parallel? Is it impossible to send 100 HTTP requests in less than 10ms?
(if it was doable, that sounds to me like the best solution: lowest costs, lowest waste of resources)
@mnapoli I believe it's not doable, or at least I am not aware of any way of doing it from a single function. The warmer plugin uses the AWS SDK to invoke the Lambda it warms up. Maybe by creating 100 different warmer functions triggered by the same CloudWatch event? I'm not even sure it would work, and to be honest it would be ugly and probably more costly. With HTTP requests there are many things that can affect the connection, so even if you can fire off 100 requests in 10ms, you can't be sure that all 100 requests will be received within those 10ms.
@RobinHoutevelts the other option, if no customisable delay gets added to Bref, is to make HTTP requests to your Lambda (via API Gateway) against an endpoint you control, where you add whatever delay you need. This will cost a bit more, though, and you'll have to write your own warmer plugin/function that makes the HTTP requests instead of invoking the Lambda directly.
In order to keep X Lambdas warm, you need to make X invocations, and the first response must arrive only after you have sent the last request. In other words, the Lambdas need to still be working on their responses while you invoke new ones.
What you could do:
- Create a new Lambda that sends X (let's say 100) HTTP requests at once (for-loop / Promise.all, something like that; see the sketch below). For simplicity, let's say it takes 500ms to send all these calls,
- Create an endpoint on Bref (for example /api/warmup) that takes 1000ms to respond. This way, you have enough time to send all those HTTP calls,
- Create a CloudWatch Event to invoke our Lambda once every Y ms, so you control how often those 100 HTTP calls are sent.
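A minimal sketch of such a warmer (hypothetical URL and names, assuming a Node.js 18+ runtime where `fetch` is available; the /api/warmup endpoint is assumed to simply sleep for ~1000ms before responding):

```typescript
// Hypothetical warmer Lambda: fires N HTTP requests at once against an endpoint
// that deliberately takes ~1000 ms to respond, so every request pins a separate
// container while the rest are still being sent.
const WARMUP_URL = 'https://example.execute-api.eu-west-1.amazonaws.com/api/warmup'; // hypothetical
const CONCURRENCY = 100;

export async function handler(): Promise<void> {
    const requests = Array.from({ length: CONCURRENCY }, () =>
        fetch(WARMUP_URL).catch((err) => console.error('warmup request failed', err)),
    );
    await Promise.all(requests);
}
```

The CloudWatch Event (EventBridge) schedule then only has to trigger this one function every few minutes.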
This is based on an analysis we did two years ago for a similar use case.
Generally, we landed on keeping a single Lambda warm: keeping multiple ones warm resulted in very little gain for way higher complexity and cost. I definitely recommend doing your own benchmark. Overall, if you have a use case where every ms counts, just get provisioned concurrency :)
Hope it helps.
EDIT:
> So when I request a list of 100 users our frontend gets to make 101 requests.
This honestly seems like a design flaw in your application. A frontend sending 100 concurrent requests is questionable.