mars
Use queues instead of direct allocation in workers
Currently, a Mars worker allocates CPUs for tasks via DispatchActor, which allocates them exclusively: when a CPU is allocated to a task, it is removed from the pool. This mechanism works well when calculation starts immediately. However, Mars currently loads data that cannot be placed in shared memory into process memory before calculation actually starts. The allocated CPU sits idle while this loading takes place, which wastes CPU resources.
We may adopt a new CPU allocation strategy that introduces queues for slots instead of allocating them directly. When a task is queued, it starts loading data into a calculation process if needed, and waits for the process to be ready before execution actually starts. Queued tasks that finish data preparation are executed first.
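The queueing idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Mars implementation: the `SlotQueue` class and its methods `enqueue`, `mark_ready` and `release` are hypothetical names, data loading is reduced to a readiness flag, and ties between ready tasks are broken by arrival order, which the proposal does not specify.

```python
from collections import OrderedDict


class SlotQueue:
    """Hypothetical queue placed in front of a fixed number of CPU slots."""

    def __init__(self, n_slots):
        self.free_slots = n_slots
        # task_id -> data_ready flag, kept in arrival order
        self.waiting = OrderedDict()
        self.running = set()

    def enqueue(self, task_id, needs_load=True):
        # Queuing does not take a slot, so data loading can begin
        # without holding a CPU.
        self.waiting[task_id] = not needs_load
        return self._dispatch()

    def mark_ready(self, task_id):
        # Called once the task's data has been loaded.
        if task_id in self.waiting:
            self.waiting[task_id] = True
        return self._dispatch()

    def release(self, task_id):
        # Called when a running task finishes and frees its slot.
        self.running.remove(task_id)
        self.free_slots += 1
        return self._dispatch()

    def _dispatch(self):
        # Start ready tasks, in arrival order, while slots are free.
        started = []
        ready_ids = [t for t, ready in self.waiting.items() if ready]
        for task_id in ready_ids:
            if self.free_slots == 0:
                break
            del self.waiting[task_id]
            self.free_slots -= 1
            self.running.add(task_id)
            started.append(task_id)
        return started


if __name__ == "__main__":
    queue = SlotQueue(n_slots=1)
    queue.enqueue("a")            # still loading, nothing starts
    queue.enqueue("b")            # still loading, nothing starts
    print(queue.mark_ready("b"))  # "b" finished loading first
    print(queue.mark_ready("a"))  # no free slot left for "a" yet
    print(queue.release("b"))     # the slot frees up for "a"
```

In this sketch a slot is taken only when a task is ready to run, so a task that is still loading data never blocks a task whose data is already prepared. How such a queue should interact with the quota mechanism is left open here, as it is in the proposal.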
The possible effects of this strategy on the existing quota mechanism should be considered. Whether the mechanism actually helps in real-world tasks should also be studied.