pyperformance

Add more pathlib benchmarks

Open • zmievsa opened this issue 2 years ago • 4 comments

A small extension on current benchmarks related to pathlib.

My current implementation is very barebones and dumb because I am quite new to writing pyperformance benchmarks and benchmarks in general. Any and all opinions on how to improve my work are welcome!

zmievsa • Jan 15 '23 17:01

I have no comment one way or the other about these changes, but we should rename the benchmark to pathlib2 if we do this so that comparisons won't be misleading.

mdboom • Feb 07 '23 15:02

@mdboom could you help me understand what you mean by "so that comparisons won't be misleading"?

zmievsa • Feb 09 '23 04:02

When a benchmark is changed significantly like this, baseline results recorded before the change are no longer meaningful to compare against, and comparing against them could lead someone to make the wrong decision.

mdboom • Feb 09 '23 15:02

Sorry for taking so long to look at this.

Thanks for putting this together. I'm already using it locally, with some modifications :)

IIUC, pyperformance isn't designed for function-by-function benchmarking. I think we might want to merge some of these test cases. Overall, I think I'm aiming for:

  • pathlib: previous code for this benchmark per @mdboom's comment
  • pathlib_construct: cover PurePath(), Path(), joinpath(), os.fspath(path)
  • pathlib_normalize: cover drive, root, anchor, parts, name, suffix, suffixes, stem, with_name(), with_stem(), with_suffix(), relative_to(), is_relative_to(), parent, parents, is_reserved(), match()
  • pathlib_string: cover str(path), as_posix(), bytes(path), as_uri(), repr(path)
  • pathlib_compare: cover a == b, hash(a), a < b
  • pathlib_fs: cover absolute(), stat(), open(), touch(), mkdir(), unlink(), rmdir(), exists(), is_dir(), is_file()
  • pathlib_fs_walk: cover iterdir(), glob(), rglob(), walk()

(pyperformance folks, please correct me if I'm doing this wrong)

This PR could add pathlib_construct and pathlib_normalize.
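As a rough illustration of merging several accessors into one test case, a pathlib_normalize benchmark might look something like the sketch below. It is only a sketch: the function name `bench_pathlib_normalize`, the `SAMPLE_PATHS` inputs, and the choice of which accessors to call are all assumptions, not code from the PR. The function takes a loop count and returns elapsed seconds, which is the shape pyperf's `Runner.bench_time_func()` expects, so it could be registered that way.

```python
import time
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical inputs, for illustration only. A real benchmark would want a
# larger and more varied set of paths.
SAMPLE_PATHS = [
    (PurePosixPath, "/usr/lib/python3/site-packages/pkg/module.py"),
    (PurePosixPath, "docs/build/index.tar.gz"),
    (PureWindowsPath, "C:/Users/me/notes.txt"),
    (PureWindowsPath, "src/main.c"),
]


def bench_pathlib_normalize(loops, samples=SAMPLE_PATHS):
    """Run `loops` passes over a group of parsing accessors; return seconds.

    Requires Python 3.9+ for is_relative_to().
    """
    range_it = range(loops)
    t0 = time.perf_counter()
    for _ in range_it:
        for cls, raw in samples:
            # Build a fresh object on every pass: pathlib caches some parsed
            # state on the instance, so reusing one object would mostly time
            # attribute lookups. This does mean construction cost is included.
            p = cls(raw)
            p.drive, p.root, p.anchor, p.parts
            p.name, p.suffix, p.suffixes, p.stem
            parent = p.parent
            list(p.parents)
            p.with_name("renamed.txt")
            p.with_suffix(".bak")
            p.is_relative_to(parent)
            p.relative_to(parent)
            p.match("*.py")
    return time.perf_counter() - t0
```

Grouping the calls this way gives one timing per family of operations rather than one per method, at the cost of not showing which individual method regressed.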

We need more realistic test cases for path construction. We need a variety of paths: POSIX and Windows, absolute and relative, short and (some) long. I'd imagine that paths become rare quite rapidly as they get longer, so most of our paths should be fewer than 5 components long, with very few longer than 10. It would be good to generate some realistic-looking file extensions too (the benchmark already does something like this, but for concrete paths).
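One way such inputs could be generated is sketched below. Everything in it is an assumption made for illustration: the word list, the extension list, the 50/50 splits between flavours and between absolute and relative paths, and the `p_stop` value that controls how quickly longer paths become rare. With `p_stop=0.4`, roughly 87% of paths have fewer than 5 components and well under 1% have more than 10. A fixed seed keeps the inputs identical from run to run, which matters for comparing results.

```python
import random
from pathlib import PurePosixPath, PureWindowsPath

# Illustrative vocabulary only; these names and extensions are assumptions.
WORDS = ["src", "lib", "docs", "tests", "data", "build", "utils", "core", "main", "index"]
EXTENSIONS = ["", ".py", ".txt", ".md", ".json", ".tar.gz"]


def component_count(rng, p_stop=0.4, max_len=15):
    """Geometric-style draw: each extra component is less likely than the last."""
    n = 1
    while n < max_len and rng.random() > p_stop:
        n += 1
    return n


def make_paths(count, seed=0):
    """Return (path_class, path_string) pairs mixing flavours and anchors."""
    rng = random.Random(seed)  # fixed seed so every run sees the same inputs
    out = []
    for _ in range(count):
        parts = [rng.choice(WORDS) for _ in range(component_count(rng))]
        parts[-1] += rng.choice(EXTENSIONS)  # the final component may get an extension
        if rng.random() < 0.5:
            cls, sep, anchor = PurePosixPath, "/", "/"
        else:
            cls, sep, anchor = PureWindowsPath, "\\", "C:\\"
        prefix = anchor if rng.random() < 0.5 else ""  # absolute or relative
        out.append((cls, prefix + sep.join(parts)))
    return out
```

Returning strings alongside the class, rather than ready-made path objects, leaves construction itself to the benchmark being timed.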

barneygale • Feb 10 '23 18:02