Add option to disable keep-alive for Enphase Envoy connections
Proposed change
Add a configuration option to the Enphase Envoy integration that disables keep-alive on the httpx connections to the Envoy. The option is off by default; the user can enable it.
Observations with Envoy firmware 3.18.10 show that fetching data from the endpoint /api/v1/production/inverters is prone to failure when the connection pool is enabled. So far this is the only firmware version for which this has been reported.
Observations from network packet traces:
- The communication uses httpx 0.27.0 with digest authentication.
- When a GET on the endpoint is requested, the Envoy returns a 401 with the digest information included.
- HTTPX then repeats the GET request on the same connection, now with the digest included.
- The Envoy replies to the message with TCP FIN,ACK and closes the connection using RST.
- HTTPX signals a RemoteProtocolError.
- HTTPX repeats the GET; a new connection is built, but the digest is not included.
- The same sequence repeats up to 4 times, until the tenacity limit is reached.
Testing shows that setting the client to max_keepalive_connections=0 solves the issue.
A test against a recent Envoy standard non-metered firmware, 8.2.4264, with keep-alive disabled shows no issues.
It is not clear whether this is an httpx issue; the question has been raised. Because the firmware is quite old, fixes to httpx may not apply or may never be made, and in any case they will take time. For now, adding the option solves the issue for HA users at short notice.
Type of change
- [ ] Dependency upgrade
- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New integration (thank you!)
- [ ] New feature (which adds functionality to an existing integration)
- [ ] Deprecation (breaking change to happen in the future)
- [ ] Breaking change (fix/feature causing existing functionality to break)
- [ ] Code quality improvements to existing code or addition of tests
Additional information
- This PR fixes or closes issue: fixes #126162
- This PR is related to issue:
- Link to documentation pull request: https://github.com/home-assistant/home-assistant.io/pull/35083
Checklist
- [x] The code change is tested and works locally.
- [x] Local tests pass. Your PR cannot be merged unless tests pass
- [x] There is no commented out code in this PR.
- [x] I have followed the development checklist
- [x] I have followed the perfect PR recommendations
- [x] The code has been formatted using Ruff (ruff format homeassistant tests)
- [x] Tests have been added to verify that the new code works.
If user exposed functionality or configuration variables are added/changed:
- [ ] Documentation added/updated for www.home-assistant.io
If the code communicates with devices, web services, or third-party tools:
- [ ] The manifest file has all fields filled out correctly.
  Updated and included derived files by running: python3 -m script.hassfest.
- [ ] New or updated dependencies have been added to requirements_all.txt.
  Updated by running python3 -m script.gen_requirements_all.
- [ ] For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
To help with the load of incoming pull requests:
- [ ] I have reviewed two other open pull requests in this repository.
Hey there @bdraco, @cgarwood, @joostlek, mind taking a look at this pull request as it has been labeled with an integration (enphase_envoy) you are listed as a code owner for? Thanks!
Code owner commands
Code owners of enphase_envoy can trigger bot actions by commenting:
- @home-assistant close: Closes the pull request.
- @home-assistant rename Awesome new title: Renames the pull request.
- @home-assistant reopen: Reopen the pull request.
- @home-assistant unassign enphase_envoy: Removes the current integration label and assignees on the pull request, add the integration domain after the command.
- @home-assistant add-label needs-more-information: Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
- @home-assistant remove-label needs-more-information: Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.
Please take a look at the requested changes, and use the Ready for review button when you are done, thanks :+1:
I wonder if we should do this by default given how bad the firmware on these devices can be.
Do we have a minimum known-good firmware version, or are all firmwares newer than 3.x subject to this issue?
> Do we have a minimum known good firmware version or are they all the firmwares newer than 3.x subject to this issue?
Not really. This was for 3.18.10. If I recall correctly, yours is something like 3.9.x and does not show the issue described in this report? So it may be mixed. There are no further clear reports, but of course there are often reports of poorly understood communication issues across many firmware versions. This may be an underlying cause for some of those as well.
> I wonder if we should do this by default given how bad the firmware on these devices can be.
I didn't want to make that big a change without more run time. Now we have the option available and can recommend trying it for other communication issues as well. Based on that experience, we may then make it the default in a future release.