r5
Randomize origin order in regional analyses
In regional analyses, we currently handle origins in row order, starting at the upper left corner. If, as in the Netherlands, the upper left corner is mostly water, the regional analysis proceeds much faster at the beginning than it does later on. Progress during that early phase gives a misleadingly short impression of the job's total run time.
I've reimplemented the broker to track completed tasks using bitsets. The challenge with randomization of order is that it requires materializing the sequence, which uses more memory. The advantage of more accurate run time prediction is probably not worth the extra complexity of materializing this order list.
I think you could do a random walk over the bitset and just wrap around at the end. So say you jump forward a random amount between 0 and the length of the bitset, then get the next unset bit and enqueue that task.
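A minimal sketch of that random-jump idea, assuming the broker tracks handed-out tasks in a `java.util.BitSet`. The class name `RandomJumpTaskPicker` and its fields are hypothetical and not taken from the r5 code base.

```java
import java.util.BitSet;
import java.util.Random;

/** Hypothetical sketch: picks tasks by jumping a random distance through a bitset. */
class RandomJumpTaskPicker {
    private final BitSet handedOut; // bit i is set once task i has been handed out
    private final int nTasks;
    private final Random random;
    private int cursor = 0;         // position of the most recently chosen task
    private int remaining;          // number of tasks not yet handed out

    RandomJumpTaskPicker(int nTasks, long seed) {
        this.nTasks = nTasks;
        this.handedOut = new BitSet(nTasks);
        this.random = new Random(seed);
        this.remaining = nTasks;
    }

    /** Returns the index of the next task, or -1 once every task has been handed out. */
    int nextTask() {
        if (remaining == 0) return -1;
        // Jump forward a random amount, wrapping around at the end of the bitset.
        int jump = random.nextInt(nTasks);
        cursor = (int) (((long) cursor + jump) % nTasks);
        // Take the next unset bit at or after the landing position.
        int task = handedOut.nextClearBit(cursor);
        if (task >= nTasks) {
            // Nothing left between the landing position and the end: wrap to the start.
            task = handedOut.nextClearBit(0);
        }
        handedOut.set(task);
        remaining--;
        cursor = task;
        return task;
    }
}
```

This needs no materialized order list, only the bitset itself. The resulting order is not uniformly random, though: a task that follows a long run of already-set bits is more likely to be picked than an isolated one, which probably does not matter for run time estimation.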
Thinking ahead to when partial results will be displayed, it could be useful to start at the center of a rectangular grid and spiral outward.
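One way to sketch the center-out ordering is a square spiral that skips cells falling outside the grid. `SpiralOrder` and its flattened `y * width + x` indexing are assumptions made for illustration, not existing r5 code.

```java
/** Hypothetical sketch: orders the cells of a rectangular grid in a square spiral from the center. */
class SpiralOrder {
    /** Returns flattened indexes (y * width + x) in spiral order, starting at the grid center. */
    static int[] spiral(int width, int height) {
        int[] order = new int[width * height];
        int count = 0;
        int x = width / 2, y = height / 2; // start at (roughly) the center cell
        int dx = 1, dy = 0;                // initial direction: +x
        int segmentLength = 1;             // cells to walk before the next turn
        int stepsInSegment = 0;
        int segmentsAtThisLength = 0;      // segment length grows after every second turn
        while (count < order.length) {
            // The spiral covers the whole plane; record only cells inside the grid.
            if (x >= 0 && x < width && y >= 0 && y < height) {
                order[count++] = y * width + x;
            }
            x += dx;
            y += dy;
            stepsInSegment++;
            if (stepsInSegment == segmentLength) {
                stepsInSegment = 0;
                // Turn 90 degrees: (dx, dy) becomes (-dy, dx).
                int tmp = dx;
                dx = -dy;
                dy = tmp;
                segmentsAtThisLength++;
                if (segmentsAtThisLength == 2) {
                    segmentsAtThisLength = 0;
                    segmentLength++;
                }
            }
        }
        return order;
    }
}
```

This version materializes the whole order for clarity. The same few state variables could instead be kept in an iterator that yields one origin at a time, which would avoid holding the full sequence in memory.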
Having workers work on blocks of origins adjacent to each other is probably also more cache-efficient, as the origins being handled in parallel on the same machine will be using a lot of the same streets and transit routes. The origins could still be distributed randomly to different workers, but the individual blocks of tasks handed to any particular worker could be geographically contiguous.
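A rough sketch of that combination: split the grid into square tiles, shuffle the tiles, and hand each worker a whole tile at a time. The names `ShuffledTiles`, `tileSize`, and the `{x0, y0, w, h}` tile layout are hypothetical choices for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: random order across tiles, geographic contiguity within each tile. */
class ShuffledTiles {
    /** Returns tiles as {x0, y0, w, h} in shuffled order; edge tiles may be smaller. */
    static List<int[]> shuffledTiles(int width, int height, int tileSize, long seed) {
        List<int[]> tiles = new ArrayList<>();
        for (int y0 = 0; y0 < height; y0 += tileSize) {
            for (int x0 = 0; x0 < width; x0 += tileSize) {
                int w = Math.min(tileSize, width - x0);
                int h = Math.min(tileSize, height - y0);
                tiles.add(new int[] {x0, y0, w, h});
            }
        }
        // Only the tile list is shuffled, so it is much shorter than a full origin list.
        Collections.shuffle(tiles, new Random(seed));
        return tiles;
    }

    /** Expands one tile into flattened origin indexes (y * gridWidth + x). */
    static int[] originsIn(int[] tile, int gridWidth) {
        int x0 = tile[0], y0 = tile[1], w = tile[2], h = tile[3];
        int[] origins = new int[w * h];
        int i = 0;
        for (int y = y0; y < y0 + h; y++) {
            for (int x = x0; x < x0 + w; x++) {
                origins[i++] = y * gridWidth + x;
            }
        }
        return origins;
    }
}
```

Because only tiles are shuffled, the materialized sequence has roughly `width * height / tileSize²` entries rather than one per origin. Water-heavy and land-heavy tiles are still mixed over the course of the job, so progress should be more even than with row order.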