
Scaling / Accumulating benchmarks / Source iterator

Open rcoup opened this issue 6 years ago • 2 comments

My use case: profiling a transform process against different input data sets (which have different sizes / structures). The goal is to compare per-item performance.

So (conceptually):

class Transformer:
  def iter_features(self):
    # Pull rows from the source and yield one transformed feature per row.
    for row in self.source.query():
      feature = self.transform(row)
      yield feature

Simplistically, I could test its performance this way:

@pytest.mark.parametrize("source_dataset", ["A", "B", "C"])
def test_features_perf(source_dataset, benchmark):
  transformer = Transformer(source_dataset)
  # iter_features() is a generator, so consume it to time the full run.
  benchmark(lambda: list(transformer.iter_features()))

But this returns the overall processing time. If dataset A has 50K rows and dataset B has 100 rows, you can't easily compare them. Unless I could provide a scale factor / divisor (e.g. source.num_rows) to normalise the output statistics?
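One partial workaround along those lines is benchmark.extra_info, a plain dict that pytest-benchmark saves next to the stats in its JSON output. A sketch, assuming the source exposes a row count (source.num_rows here is illustrative):

import pytest

@pytest.mark.parametrize("source_dataset", ["A", "B", "C"])
def test_features_perf(source_dataset, benchmark):
  transformer = Transformer(source_dataset)
  # Record the dataset size alongside the stats so per-row time can be
  # derived later from the saved JSON (num_rows is assumed to exist).
  benchmark.extra_info["num_rows"] = transformer.source.num_rows
  benchmark(lambda: list(transformer.iter_features()))

The reported numbers are still whole-dataset times; dividing by num_rows has to happen when reading the saved JSON afterwards (or via one of the reporting hooks).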

Alternatively, since I really want transforms/second, I could benchmark the inner Transformer.transform():

@pytest.mark.parametrize("source_dataset", ["A", "B", "C"])
def test_features_perf(source_dataset, benchmark):
  transformer = Transformer(source_dataset)
  # Wrap the benchmark around each transform() call as the iterator runs.
  benchmark.weave(transformer.transform, lazy=True)
  for f in transformer.iter_features():
    pass

But then the first row is individually benchmarked for N rounds, and the second row triggers an error because it's already done the benchmark. If each row was measured once, and each row contributed to the total, then I'd get the results I want. (Yes, rounds wouldn't be calibrated to the timer, but that's ok in this scenario).
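One way to get close to that with the current API (a sketch, not necessarily how the library intends it to be used) is benchmark.pedantic with iterations=1 and one round per row; the setup callable runs before every round and its return value is used as the call arguments:

import pytest

@pytest.mark.parametrize("source_dataset", ["A", "B", "C"])
def test_transform_perf(source_dataset, benchmark):
  transformer = Transformer(source_dataset)
  rows = iter(transformer.source.query())

  def per_round_setup():
    # Called before each round: hand the next source row to transform().
    return (next(rows),), {}

  benchmark.pedantic(
    transformer.transform,
    setup=per_round_setup,
    rounds=transformer.source.num_rows,  # one round per row; num_rows is assumed to exist
    iterations=1,  # pedantic requires iterations=1 when setup is used
  )

Each row is then timed exactly once and every row feeds the same stats, at the cost of per-round timer overhead.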

Any other suggestions/approaches/ideas I'm missing here?

Maybe-related: #52

rcoup commented Aug 14 '19 11:08

Would you like to have some sort of benchmark.pedantic_weave? That would allow you to set up the benchmark to only run one time, but it won't really work if you call it multiple times.

Though I think averaging the results for transformer.transform would be more useful, e.g.: number of features = the number of rounds, iterations = 1. This could be implemented, but is it general enough?

Having some way to apply scaling would also solve it, but that's more complicated to implement. You could also use the existing hooks, or edit the JSON to scale the data.
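For the hooks/JSON route, a rough conftest.py sketch (assuming num_rows was recorded in extra_info as above, and only touching the simple timing fields) could use the pytest_benchmark_update_json hook:

# conftest.py
def pytest_benchmark_update_json(config, benchmarks, output_json):
  # Rescale whole-dataset timings to per-row timings in the saved JSON.
  for bench in output_json["benchmarks"]:
    num_rows = bench.get("extra_info", {}).get("num_rows")
    if not num_rows:
      continue
    stats = bench["stats"]
    for key in ("min", "max", "mean", "median", "stddev", "iqr"):
      if key in stats:
        stats[key] /= num_rows

This only changes what gets saved (and later compared); the table printed at the end of the run still shows the unscaled numbers.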

ionelmc commented Aug 15 '19 12:08

> Though I think averaging the results for transformer.transform would be more useful, e.g.: number of features = the number of rounds, iterations = 1. This could be implemented, but is it general enough?

Yeah, this was my original concept — just using whatever cycles my iteration/loop has, then treating that as the number of rounds wrt the rest of the benchmarking.

rcoup commented Aug 15 '19 14:08