Helix icon indicating copy to clipboard operation
Helix copied to clipboard

Recursively recalculate TOPs that may be affected by another TOP's recalculation

Open renatomassaro opened this issue 7 years ago • 3 comments

Imagine the following scenario:

A is downloading a file from B. C is uploading a file to A. D is uploading a file to C. We have something like: D -> C -> A <- B.

Now image A's download completed, or got cancelled. It means A has more available DLK, and possibly C could allocate more ULK resources to it. If this happens, D's upload may also be modified.

Current TOP (implemented at #322) only deals with a process' gateway and remote, so in this case, a change on B would lead to a recalculation of A and B, not C and D. This is what this issue is supposed to solve.

It may seem complex, but the solution is quite simple.

  • When a TOP recalculation happens, get a list of all processes that had their allocation changed.
  • If these processes have a remote (or local, in case the recalque is happening at a remote one), request their corresponding TOP be recalculated.

That's it. To avoid infinite loops, one must keep track of all recalculations that already took place, so before requesting a TOP to be recalculated, make sure that TOP has not already been recalculated as a result of the modification on that original process.

Keeping track of this state will probably have to be done on a separate GenServer, since the TOP "chain reaction" would probably have the shape of a binary tree, and each branch would not be able to know which TOPs were recalculated on the opposite branch. Unless we use a central state store (GenServer).

renatomassaro avatar Nov 06 '17 00:11 renatomassaro

Note: the "recursive" in the title is half-right. Except for the recalculation of a process' direct servers (local and remote), all other TOPs recalculation happen asynchronously, using our event-driven architecture. So it's more like an "indirect recursion".

If we were using a synchronous recalculation, we could use a recursive function that would be able to accumulate which TOPs were recalculated without having to store it on a central state, but that's not an option since this happens asynchronously, in parallel.

renatomassaro avatar Nov 06 '17 00:11 renatomassaro

Note2: It may seem odd that one small change in a process' allocation could lead to hundreds of TOPs being recalculated, but:

  1. That's sort of inevitable if we want to have an accurate and efficient resource utilization.
  2. That's actually quite fast. Except for the DB IO, recalculating a TOP with ~100 processes takes ~1ms.

Since it's asynchronous, in the case of a TOP recalculation of a massively busy server, with each affected server having hundreds of processes, it would probably take at most a couple minutes to have all of them updated.

Furthermore, there's always room for optimization. Current TOP implementation makes almost no effort on optimizing recalculation, memoizing stuff, etc.

renatomassaro avatar Nov 06 '17 00:11 renatomassaro

#343 is tangentially related to this issue.

We must only emit TOPRecalcadoEvent once BOTH the local and remote (if any) servers have gone through the recalque step, otherwise the process may lack the required information.

(That's because the Allocator only reserves the resources, the actual usage is infered by the amount reserved and limited by the local and remote server. If only the local server went through the allocation step, and the remote is undergoing the allocation on some other thread, the process won't have the full context and may yield a wrong estimation. That's only on the frontend/estimate cache though, but still a bug nonetheless).

Implications:

  • When performing the recalque of a server due to a process change, this recalque must synchronously wait for both the local and remote servers, and only them emit the TOPRecalcadoEvent.

(Updating the process on the DB may be async as long as the process returned by the event has accumulate both the local and remote allocations).

Also related is the fact that a server's "next-to-be-completed" process (TOPBringMeToLifeEvent) is emitted during the allocation. Since the allocation is partial (and the actual duration may differ based on another server's allocation), this is the wrong place to do such thing!! For now we've got a hack there, but implementing the recursive TOP allocation proposed here would fix this.

renatomassaro avatar Nov 26 '17 03:11 renatomassaro