core Use partition/rpartition instead of split/rsplit

Use partition/rpartition instead of split/rsplit

Open akx opened this issue 1 year ago • 5 comments

Proposed change

I spotted a couple of places (and then a couple more . . .) where uses of .split and .partition could be replaced with the more efficient but not particularly well-known partition function (and same for rsplit/rpartition).

partition/rpartition always returns a 3-tuple so checks for how many pieces are returned aren't necessary anymore. This simplifies a bunch of code (a bit).
It's also approximately 20% faster than split in a simple case (and yes, it's a microbenchmark but every little counts in my books); see timeit comparison below
constructing tuples is, to my best knowledge, more memory-efficient than lists, so a modest memory usage improvement could also be in the cards.

I understand that this is a bit on the big side as far as the number of touched files/components goes, but it should nevertheless be easy to review:

The changes are, in most cases, a single line per file. (Looks like opening this triggered quite some codeowners for review...)
- In some places, I've pulled a value that's being .split() twice (or similar) into a local variable, for performance and readability.
The changes are split into three commits based on the changed idioms.

If you like, I have no problem splitting this into smaller PRs either (though I'd like some guidance on what the scope of each PR should then be).

Benchmarking getting the first or last bit of a delimited string

$ python timeit-compare.py -n 10000000 -s 's = "a.b"' 's.partition(".")[0]' 's.partition(".")[-1]' 's.split(".", 1)[0]' 's.split(".", 1)[-1]' 's.rpartition(".r")[0]' 's.rpartition(".")[-1]' 's.rsplit(".", 1)[0]' 's.rsplit(".", 1)[-1]'
s.partition(".")[0]   : 10000000 loop(s), best of 5: 0.079 usec per loop
s.partition(".")[-1]  : 10000000 loop(s), best of 5: 0.081 usec per loop
s.split(".", 1)[0]    : 10000000 loop(s), best of 5: 0.094 usec per loop
s.split(".", 1)[-1]   : 10000000 loop(s), best of 5: 0.092 usec per loop
s.rpartition(".r")[0] : 10000000 loop(s), best of 5: 0.071 usec per loop
s.rpartition(".")[-1] : 10000000 loop(s), best of 5: 0.084 usec per loop
s.rsplit(".", 1)[0]   : 10000000 loop(s), best of 5: 0.091 usec per loop
s.rsplit(".", 1)[-1]  : 10000000 loop(s), best of 5: 0.095 usec per loop

Type of change

[ ] Dependency upgrade
[ ] Bugfix (non-breaking change which fixes an issue)
[ ] New integration (thank you!)
[ ] New feature (which adds functionality to an existing integration)
[ ] Deprecation (breaking change to happen in the future)
[ ] Breaking change (fix/feature causing existing functionality to break)
[x] Code quality improvements to existing code or addition of tests

Checklist

[ ] The code change is tested and works locally.
[ ] Local tests pass. Your PR cannot be merged unless tests pass
[x] There is no commented out code in this PR.
[x] I have followed the development checklist
[x] The code has been formatted using Black (black --fast homeassistant tests)
[ ] Tests have been added to verify that the new code works.
- No new tests; behavior should not have changed, and current tests should cover all of this (hopefully...)

To help with the load of incoming pull requests:

[x] I have reviewed two other open pull requests in this repository.

Nov 03 '22 17:11 akx

While I'm not usually one to turn down performance improvements, this PR changes quite files that don't have test coverage in places that are not called frequently. We don't want the risk of a regression from untested code without a tangible performance improvement in a hot codepath.

We could accept a change to areas you have identified as called frequently, but based on a quick profiler run that doesn't seem to be any of the ones in this PR.

split

Nov 03 '22 17:11 bdraco

@bdraco Thanks for the comments!

While I'm not usually one to turn down performance improvements, this PR changes quite files that don't have test coverage in places that are not called frequently.

Fair. How can you tell which files changed here are or are not covered, as Codecov is having a tough time with this repo? (I.e. is there a way I could tell too?)

We don't want the risk of a regression from untested code without a tangible performance improvement in a hot codepath.

I understand, but on the other hand if the same .split(...)[...] pattern works in one place, why wouldn't it work elsewhere?

based on a quick profiler run that doesn't seem to be any of the ones in this PR.

Can you share the methodology for that?

Nov 03 '22 18:11 akx

@bdraco Thanks for the comments!

While I'm not usually one to turn down performance improvements, this PR changes quite files that don't have test coverage in places that are not called frequently.

Fair. How can you tell which files changed here are or are not covered, as Codecov is having a tough time with this repo? (I.e. is there a way I could tell too?)

You can manually check a each integration with: pytest --cov=homeassistant/components/wiz/ --cov-report term-missing -- tests/components/wiz/

We don't want the risk of a regression from untested code without a tangible performance improvement in a hot codepath.

I understand, but on the other hand if the same .split(...)[...] pattern works in one place, why wouldn't it work elsewhere?

The usage is not the exact same everywhere and there is always a risk of regression with making any change. If the code isn't tested with test coverage it needs to be tested manually before we can accept changes.

based on a quick profiler run that doesn't seem to be any of the ones in this PR.

Can you share the methodology for that?

Run the profiler integration for 60s https://www.home-assistant.io/integrations/profiler/#service-profilerstart

Its going to generate a cProfile and convert it to a callgrind.out.xxx file for you which can be opened with qcachegrind

Nov 03 '22 18:11 bdraco

@bdraco

Thanks for the pointers, I appreciate it.

I rebased this PR and removed changes to files that match any .coveragerc omit stanza (see full list at https://gist.github.com/akx/ffa23728c32d76f800b7a8218216bc08). That should leave only changes to files that should be covered by existing tests.

Nov 03 '22 19:11 akx

@bdraco

Thanks for the pointers, I appreciate it.

I rebased this PR and removed changes to files that match any .coveragerc omit stanza (see full list at gist.github.com/akx/ffa23728c32d76f800b7a8218216bc08). That should leave only changes to files that should be covered by existing tests.

Thats is an improvement, but unfortunately just because its not in .coveragerc doesn't mean the line is covered as the code doesn't have 100% coverage. Additionally you've got at least 25 reviewers that this change affects who potentially need to review this PR so its better to break up a change like this into smaller PRs, instead I'd urge you to focus on the the changes that will have the highest impact.

Sometimes that means only changing very few lines that are executed the most frequently as that change can have more impact and doesn't cost a whole lot of reviewer time.

Nov 03 '22 20:11 bdraco

Before you spend any more time on this, I want to point out this PR is not likely to get merged in its current form.

You'll have much better luck with smaller PRs targeting specific integrations so the codeowners of that integration have a chance to review it instead of everyone at once. (please don't go open 100s of PRs for this)

https://developers.home-assistant.io/docs/creating_component_code_review#5-make-your-pull-request-as-small-as-possible

Nov 07 '22 18:11 bdraco

The change is a good one, but like bdraco said, let's do smaller PRs.

Nov 27 '22 20:11 balloob

core core copied to clipboard

Use partition/rpartition instead of split/rsplit

Proposed change

Benchmarking getting the first or last bit of a delimited string

Type of change

Checklist

core
core copied to clipboard