Why is machine loading rounded?
https://github.com/NixOS/hydra/blob/master/src/hydra-queue-runner/dispatcher.cc#L131-L132
In the comparison for machine selection:
float ta = std::round(a.currentJobs / a.machine->speedFactor);
float tb = std::round(b.currentJobs / b.machine->speedFactor);
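This doesn't seem necessary, and it leads to incorrect results when the machines have a high speedFactor: any machine running fewer than speedFactor / 2 jobs rounds to a load of 0, so a lightly loaded machine and an idle one compare as equal.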
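To make the effect concrete, here is a minimal sketch of the two ways of computing the load. `MachineLoad`, `roundedLoad` and `rawLoad` are hypothetical names used only for illustration; they are not taken from dispatcher.cc, and the real comparison may apply further tie-breakers that are not modelled here.

```cpp
#include <cmath>

// Hypothetical stand-in for the per-machine state used in the comparison.
struct MachineLoad {
    unsigned int currentJobs;
    float speedFactor;
};

// Load as computed in the lines quoted above, with rounding.
float roundedLoad(const MachineLoad & m) {
    return std::round(m.currentJobs / m.speedFactor);
}

// The same load without rounding.
float rawLoad(const MachineLoad & m) {
    return m.currentJobs / m.speedFactor;
}
```

With speedFactor = 8, a machine running 3 jobs and an idle machine both get a rounded load of 0, while the unrounded loads (0.375 and 0) still tell them apart.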
@domenkozar might have figured it out:
14:48 <domenkozar> jophish: yes, one side effect of the roundf is that same machine will be reused until speedfactor is reached
14:48 <domenkozar> so it might have a consequence of less S3 downloads
Yes, IIRC that was the reason.
(Looks like the C++ rewrite of build-remote got rid of the rounding BTW, which may be by accident.)
Would it be more appropriate to divide by max jobs?
I've patched our local Hydra to be a bit fairer in case anyone's interested :)
https://github.com/expipiplus1/hydra/commit/73e835b2aeea563994df8c4853c361752105f109
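One possible reading of the "divide by max jobs" idea is to compare machines by the fraction of their job slots in use. The sketch below is only that reading, not necessarily what the linked commit does; `MachineSlots`, `utilisation` and `lessLoaded` are hypothetical names, and the availability of a per-machine maximum job count is an assumption.

```cpp
// Hypothetical per-machine state: jobs running now and the slot limit.
struct MachineSlots {
    unsigned int currentJobs;
    unsigned int maxJobs;
};

// Fraction of the machine's slots that are in use.
float utilisation(const MachineSlots & m) {
    // Treat a machine with no slots as fully loaded to avoid dividing by zero.
    if (m.maxJobs == 0) return 1.0f;
    return static_cast<float>(m.currentJobs) / static_cast<float>(m.maxJobs);
}

// Orders machines so that the one with the lower utilisation comes first.
bool lessLoaded(const MachineSlots & a, const MachineSlots & b) {
    return utilisation(a) < utilisation(b);
}
```

Under this ordering a machine running 2 of 8 jobs is preferred over one running 1 of 2, because it has the larger share of free capacity.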
I have simply removed the std::round on my local Hydra and it has been working great for me for a couple of months. The workload is distributed much more evenly now.
That said, my build machines don't do many S3 downloads as I compile everything from source, so I don't have to worry about that.
It is back to being floats!
> Would it be more appropriate to divide by max jobs?
> I've patched our local Hydra to be a bit fairer in case anyone's interested :)
This is interesting and perhaps should be pursued anyways.
> It is back to being floats!
I don't understand. Wasn't this issue about the rounding, which causes the load to be distributed unevenly across the remote machines?
Based on the current code:
https://github.com/NixOS/hydra/blob/b3e0d9a8b78d55e5fea394839524f5a24d694230/src/hydra-queue-runner/dispatcher.cc#L234-L235
... isn't rounding still being performed, or what am I missing?