[🧹 CHORE]: Autoscale. First look, bugs, proposals
No duplicates 🥲.
- [X] I have searched for a similar issue.
What should be improved or cleaned up?
Starting with 2024.3.0 we have autoscale workers! It's a useful feature and, of course, I immediately went to test it out. After a little discussion in Discord (https://discord.com/channels/538114875570913290/1314816983090593803), we came to the conclusion that some things still need to be improved.
1. allocate_timeout is redundant at the moment.
Before autoscale, allocate_timeout was responsible for the startup timeout of the worker. (https://docs.roadrunner.dev/docs/error-codes/allocate-timeout)
Now allocate_timeout is also used as a debounce when spawning new workers in autoscale. I.e. before firing EventNoFreeWorkers, the pool waits for allocate_timeout and only then adds workers.
The obvious problem is that these should be different options in the configuration, since the timeout for creating a new worker and the delay between creating new workers in the pool are different values. The default allocate_timeout is 60s, which might be okay for worker startup, but not as the delay before allocating new dynamic workers in the pool: it's too long. For example, if all workers are in working status and a new lightweight request arrives, the user will wait allocate_timeout (60 seconds) before the pool spawns new workers for that request.
It is suggested that allocate_timeout be split into two options:
- `allocate_timeout`, exactly what it was before.
- `dynamic_allocator.debounce_timeout`, the waiting time when all the workers are in `working` status before the `EventNoFreeWorkers` event. `debounce_timeout` is a working title, it may be different.
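For illustration, a sketch of how the split might look in `.rr.yaml`. This assumes the 2024.3 `dynamic_allocator` options `max_workers` and `spawn_rate`; `debounce_timeout` is only the proposal from this issue, not an existing option:

```yaml
pool:
  num_workers: 10
  # Worker startup timeout, exactly what it was before.
  allocate_timeout: 60s
  dynamic_allocator:
    max_workers: 25
    spawn_rate: 5
    # Proposed (working title): how long the pool waits while all
    # workers are busy before spawning new dynamic workers.
    debounce_timeout: 50ms
```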
Questions for the community:
- Name of `debounce_timeout`?
- Any suggestions and comments.
2. Sometimes we need to spawn new workers before EventNoFreeWorkers
If our workers have a long warmup, like needing to open a big SQLite database, load an AI model, etc., we want to spawn new workers in advance. We're ready for the overhead, as long as it's delay-free for the user.
In this case, we want to spawn new workers before EventNoFreeWorkers fires, for example when there are fewer than 2 free workers (status ready).
It is suggested to add a new option dynamic_allocator.min_ready_workers (working title). If we have min_ready_workers: 2 and the pool has fewer than 2 workers in ready status, the pool fires EventMinReadyWorkers and spawns new workers according to the configuration. Of course, the EventMinReadyWorkers event should respect the debounce_timeout.
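A hypothetical `.rr.yaml` fragment for this proposal; `min_ready_workers` does not exist today, it is the option suggested here, and the other option names assume the 2024.3 `dynamic_allocator` config:

```yaml
pool:
  num_workers: 10
  dynamic_allocator:
    max_workers: 25
    spawn_rate: 5
    # Proposed (working title): keep at least this many workers in
    # "ready" status; spawn ahead of demand when we drop below it,
    # which helps workers with an expensive warmup (SQLite, AI models).
    min_ready_workers: 2
```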
Questions for the community:
- Name of `min_ready_workers`?
- Do we need a new event like `EventMinReadyWorkers`, or just fire `EventNoFreeWorkers`?
- Any suggestions and comments.
Bugs:
- https://github.com/roadrunner-server/roadrunner/issues/2092
- #2111
Hey @trin4ik 👋🏻
Thank you for the valuable feedback! I agree about separating allocate_timeout into 2 options: the good old allocate_timeout + a separate option for the dynamic pool timeout.
2. MinReadyWorkers is also an interesting idea. I need to double-check that these new options won't impact performance, because, you know, why use RR if it'd be slower than FPM.
What is the point in waiting at all? If no workers are available, I would argue you should create a new one immediately and only use the allocate_timeout logic once the maximum number of dynamic workers is reached.
An application may work just fine when all workers are fully loaded and requests wait a few hundred milliseconds. Furthermore, there is no single condition called "no workers available": many threads can wait for a worker, and if we allocate workers for them immediately, you may see a huge spike of 100 (or max_workers) workers allocated at the same time (the maximum for the dynamic allocator).
"No workers available" is just the condition where no worker is idle. Isn't that the point of dynamically scaling workers? To allocate additional workers when they are all busy?
If you set your max_workers to 100, you would expect to be able to handle 100 workers, so what's the problem? This is what FPM does and it has very similar worker scaling if you use pm = dynamic mode (https://www.php.net/manual/en/install.fpm.configuration.php). Maybe there is a small delay with FPM for performance reasons, but it's definitely not measured in seconds.
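For reference, FPM's dynamic mode from the linked manual page is configured roughly like this (the values here are illustrative, not recommendations):

```ini
; php-fpm pool config, dynamic process management
pm = dynamic
pm.max_children = 100     ; hard cap, analogous to max_workers
pm.start_servers = 10
pm.min_spare_servers = 5  ; spawn ahead when idle workers drop below this
pm.max_spare_servers = 20 ; reap idle workers above this
```

Note that `pm.min_spare_servers` plays a role similar to the `min_ready_workers` proposal above: scaling is driven by the number of idle workers, not by a timeout.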
I don't know about the internals of how this works, but waiting more than 50-100ms for a worker seems to defeat the purpose of auto-scaling.
Maybe you could consider "scale-in-delay" as a parameter. So it will wait a minimum of x ms between starting new workers. This way you get an immediate response for the first dynamic worker but avoid the spike you mentioned.
> I don't know about the internals of how this works, but waiting more than 50-100ms for a worker seems to defeat the purpose of auto-scaling.

Yes, which is why it's suggested that we split the timeouts. allocate_timeout is too large for debouncing the start of new workers, but I don't see the point of removing the debounce completely.
Well, the point is that there is no reason to wait at all. If you have saturated your worker pool but your CPU is doing nothing because you're waiting for IO (or whatever - but something typical for web applications), there is no reason to wait. If a delay is necessary for some technical reason, it should be set very low by default.
> Well, the point is that there is no reason to wait at all. If you have saturated your worker pool but your CPU is doing nothing because you're waiting for IO (or whatever - but something typical for web applications), there is no reason to wait. If a delay is necessary for some technical reason, it should be set very low by default.

Yes, it's true, the delay should be very low, and that's exactly what this issue is about.
Zero delay would lead to overhead, it seems to me.
Example: a high-load service with many requests, default worker pool of 10, spawn_rate 5. Every time all workers are busy for at least 1ms, RR will create +5 workers. Waiting at least 50ms is a good solution, at least that's what I'd like to see in my production. The other thing is that allocate_timeout is certainly not suitable for this.
But why is it a problem to create 5 more workers if all 10 are busy? For this example to make sense, you'd have to receive 15 requests in < 50ms. That's a very tight gap, I'd say, if the traffic pattern won't continue at all (and if it does continue, you'd still require a lot of workers). But I understand your point. I think maybe you can get the best of both worlds with two configs:
Something like:

```yaml
worker_max_spawn_rate: 2
worker_spawn_delay: 50ms
```
So every 50ms you will be allowed to create up to 2 more workers, if still necessary. This way you'd immediately go from 10 to 12 in your example, but not from 12 to 14 before 50ms had passed.
Edit: Okay it seems we already have spawn rate. So yea, I guess we agree. I didn't read the docs at all.
I don't want to limit spawning workers by a timeout; I want to spawn workers (and allocate resources, which can be significant) only after a timeout like 50ms. If after 50ms the workers are still busy, then new workers should be spawned.
That's my case. The other case, which I also described, raises the issue of workers' long start time, when it's a good idea to start creating new workers before all the current ones are busy.
We really like the dynamic scaling feature and are using it in production with the Symfony framework running on Kubernetes.
Currently it's really hard to find an ideal value for allocate_timeout for the following reasons:
- too low a value causes the first boot of the application and workers to fail, since the Symfony cache takes time to warm up, especially when the cluster is under high load
- too low a value makes requests fail during a load spike longer than the allocate_timeout duration when workers are already scaled to the max
- too high a value forces clients to wait for the duration of allocate_timeout even during smaller and temporary spikes
Edit: hopefully better wording
I think that for HTTP request processing it makes sense to think in allocate_timeout terms. For jobs from queues, though, I think it is more useful to think in terms of how many messages are left to process. For example, if you measure and know that your consumer can process about 10 msg/s and you set a budget of processing time, then you could set spawn_rate = 10, meaning that you spawn one worker per 10 messages in the pipeline. Thus, you keep up with the messages as they get consumed from the queue.
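The back-of-the-envelope math above can be written out explicitly. This is a sketch of the comment's reasoning, not a RoadRunner API; the function name and parameters are invented for illustration:

```go
package main

import "fmt"

// workersNeeded estimates how many workers keep up with a queue backlog:
// each worker "covers" perWorkerRate * budgetSeconds messages, and we
// round up so the whole backlog is covered. With perWorkerRate = 10 msg/s
// and a 1s budget, one worker per 10 queued messages matches the
// spawn_rate = 10 intuition from the comment above.
func workersNeeded(backlog, perWorkerRate, budgetSeconds int) int {
	capacityPerWorker := perWorkerRate * budgetSeconds
	// Integer ceiling division.
	return (backlog + capacityPerWorker - 1) / capacityPerWorker
}

func main() {
	fmt.Println(workersNeeded(95, 10, 1))  // 10 workers for 95 queued msgs
	fmt.Println(workersNeeded(101, 10, 1)) // 11 workers for 101 queued msgs
}
```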