
Fair sampling with very low sample rates

Open sophiebits opened this issue 7 years ago • 5 comments

It may be desirable to run the sampling profiler with a very low sample rate. Instead of sampling every 1ms on 0.1% of users, more representative results could be achieved with one sample every 1s on 100% of users. (Facebook samples stacks from each machine in the server fleet once every 10 seconds to derive a representative dataset.)

However, long sampling intervals are likely to give a biased sample if not implemented “fairly”. For example, if profiling begins at pageload with a 1s sampling interval, you would end up never capturing a sample of any initialization that happens within the first second, leading to misleading, skewed results. This can happen even with shorter intervals, especially if the page runs timer- or frame-based work on a regular cadence.

However, it’s possible to create a fair sample even with an extremely long sampling interval by drawing from an exponential probability distribution: https://en.wikipedia.org/wiki/Exponential_distribution. Each time you take a sample, instead of setting a timer for the fixed interval again, you generate a random variate (interval * -ln(Math.random()), I believe) and set a timer for that long. After n * interval has passed, you will still have collected n samples on average, but they will have been taken at memoryless, unpredictable times instead of at a regular, predictable (and therefore biased) interval.
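Concretely, the interval generator could look like this (just a sketch; the function name is made up):

```javascript
// Hypothetical sketch: draw one fair sampling interval from an
// exponential distribution with the given mean (in ms).
function nextExponentialInterval(meanIntervalMs) {
  // Inverse-CDF sampling: -mean * ln(U) for U uniform in (0, 1].
  // Using 1 - Math.random() avoids Math.log(0).
  return -meanIntervalMs * Math.log(1 - Math.random());
}
```

Over many draws the intervals average out to the mean, but each individual delay is unpredictable, which is exactly what removes the phase bias.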

Can we consider adding an element to the spec that mandates that browsers use this method, either by default or as opt-in?

sophiebits avatar Jun 06 '18 18:06 sophiebits

Do we want to request support for this specific use case, or do we want a more general solution for custom profiling intervals? This requirement could also be addressed by allowing the callers of startProfiling to provide a function that generates sampling intervals, or by allowing startProfiling (or some other API) to return a single stack when called from a helper thread (leaving timing up to the application, for better or worse).

vdjeric avatar Jun 06 '18 22:06 vdjeric

I’m not aware of any cases where the exponential distribution is subpar (including as compared to a constant interval), so I’d advocate for it as the default (likely only) option. Even in a case where you intentionally want to bias your samples (say, if you want higher sampling for the first 10 seconds) I believe you would want to apply the exponential distribution on top of it to avoid the same problems I outlined above, and it would be appropriate for the API itself to do so.

Making this fully flexible strikes me as overcustomizable, but I’d be interested to know if other people have use cases for it. Your second suggestion strikes me as more in the spirit of extensible web so it may be worth adding that (perhaps in conjunction with the higher-level API) on principle alone.

sophiebits avatar Jun 06 '18 22:06 sophiebits

I think you could solve this using the API in the explainer. You could just run the profiler every 1ms, take the results and resample the output data. This could even give you better results since the profiler will be 'hot'. Or you can also start profiling instances with the next interval:

function takeNextExponentialSample() {
  const nextSample = interval * -Math.log(Math.random());
  const instance = performance.startProfiling({interval: nextSample});
  setTimeout(async () => {
    const result = await instance.stopProfiling();
    result.samples[0]; // Ignore extra samples
    takeNextExponentialSample(); // Goto start
  }, nextSample + gracePeriod);
}
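The resampling approach mentioned above could look roughly like this (a sketch only; the `{timestamp, stack}` trace shape and the helper name are assumptions, not the explainer's API):

```javascript
// Hypothetical sketch: downsample a dense (e.g. 1ms) trace to
// exponentially spaced samples after the fact.
// `trace` is assumed to be an array of {timestamp, stack} entries,
// sorted by timestamp.
function resampleExponential(trace, meanInterval) {
  const picked = [];
  if (trace.length === 0) return picked;
  // First target time: exponential offset from the start of the trace.
  let t = trace[0].timestamp - meanInterval * Math.log(Math.random());
  let i = 0;
  const end = trace[trace.length - 1].timestamp;
  while (t <= end) {
    // Advance to the last dense sample at or before the target time.
    while (i < trace.length - 1 && trace[i + 1].timestamp <= t) i++;
    picked.push(trace[i]);
    t += -meanInterval * Math.log(Math.random());
  }
  return picked;
}
```

This keeps the profiler "hot" at a fixed 1ms rate while still yielding a fair, exponentially spaced dataset for aggregation.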

One caveat: you want to avoid repeatedly forcing the browser in and out of the partial de-opt needed to enable profiling (generating unwind tables, keeping a frame pointer, etc.). You might want to keep a global profiler instance running at a very long interval, say 100 seconds, as a hint to the browser that this tab is profiling and that it should leave the JS profilable:

// Keep the JS code in a profiling partial de-opt state
const globalInstance = performance.startProfiling({interval: 100 * 1000});

For low sampling intervals, an exponential distribution is harmful. If your profiler can only consistently keep up at 1ms and your exponential distribution rolls a 200-microsecond interval, you're not going to get what you expect, since it's outside the precision capability of the profiler (much of which is OS-dependent, w.r.t. high-resolution interrupts/setitimer and real-time behavior).

For high sampling intervals the resampling and cascading profilers might be sufficient.

bgirard avatar Jun 06 '18 22:06 bgirard

Or you can also start profiling instances with the next interval

That's fair but looks inefficient and isn't close to a "pit of success". In my mind the point of high-level APIs is to do this sort of thinking so individual users don't have to.

You're not going to get what you expect since it's outside the precision capability of the profiler

Good point, thanks. We could automatically transition between them depending on the specified interval, or perhaps somehow compensate for the minimum time by adjusting future rolls downward.
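Compensating for a minimum resolution could work by clamping each roll and crediting the clamped excess against future rolls (a sketch of one possible scheme; the names are made up):

```javascript
// Hypothetical sketch: draw exponential intervals while respecting a
// minimum timer resolution, carrying a "debt" so the long-run mean
// rate stays honest.
function makeIntervalSource(meanInterval, minInterval) {
  let debt = 0; // time owed from previously clamped rolls
  return function nextInterval() {
    let roll = -meanInterval * Math.log(1 - Math.random()) - debt;
    debt = 0;
    if (roll < minInterval) {
      debt = minInterval - roll; // repay by shortening future rolls
      roll = minInterval;
    }
    return roll;
  };
}
```

Every interval is at least `minInterval`, yet the average interval still converges to the requested mean, because clamped time is subtracted from subsequent draws.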

sophiebits avatar Jun 07 '18 18:06 sophiebits

For example, if profiling begins at pageload with a 1s sample rate, you would end up never capturing a sample of any initialization that happens within the first second, leading to misleading, skewed results.

I believe this problem is worth highlighting, and may be worthwhile to address by itself even if the exponential distribution doesn't (yet) make it in. Without this, the resulting visualisations would likely be surprising and suffer from severe gaps in coverage.

Staggering the first interval might suffice to avoid most systematic blind spots that would otherwise be created (assuming aggregation over many clients): it ensures coverage within the first interval, and every interval after it is offset by the same random amount.
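A staggered start is cheap to sketch (the helper name is made up; times are in ms):

```javascript
// Hypothetical sketch: compute staggered sample timestamps. The first
// sample is offset uniformly within one interval; spacing is fixed
// thereafter, so across many clients every phase gets covered.
function staggeredSampleTimes(interval, count) {
  const offset = Math.random() * interval; // uniform in [0, interval)
  const times = [];
  for (let i = 0; i < count; i++) times.push(offset + i * interval);
  return times;
}
```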

As an example, Wikimedia's php-excimer also works this way and we didn't feel a need to make this configurable. We don't (yet) fuzz sample intervals on an on-going basis as we found the staggered start to be sufficient.

To clarify, I don't oppose exponential distribution, rather I'd like to explicitly call out additional support for at least ensuring a staggered starting position.

Krinkle avatar Sep 30 '21 18:09 Krinkle