Bias max_offset in pareto distribution

Open morgo opened this issue 1 year ago • 0 comments

We use sysbench to test Spirit (an online schema change tool).

Spirit has an optimization where it can ignore modifications to keys above a certain known point. Unfortunately, this is difficult to test with sysbench: many of our workloads look roughly pareto, but it is the highest keys that are modified most often, whereas the current function favors the first keys in the table.

What I would like to propose is an option for the pareto distribution function, which is currently defined as:

uint32_t sb_rand_pareto(uint32_t a, uint32_t b) /* starting-value, max-offset */
{
  return a + (uint32_t) ((b - a + 1) *
                         pow(sb_rand_uniform_double(), pareto_power));
}

For our use case we would instead like something like the following, which subtracts the offset from b so that values cluster near the upper bound:

uint32_t sb_rand_pareto(uint32_t a, uint32_t b) /* starting-value, max-offset */
{
  return b - (uint32_t) ((b - a + 1) *
                         pow(sb_rand_uniform_double(), pareto_power));
}

Aug 07 '24 17:08 morgo