Modbus over TCP causes memory leak

Open zamojski opened this issue 1 year ago • 11 comments

The problem

The Modbus integration causes a memory leak. I have 14 energy meters connected via Modbus TCP (65 entities in total), and every day my production HA installation crashes because it runs out of memory and swap. The same happens on a clean Home Assistant Container installation, which is where the attached logs come from. Nothing is added to this clean installation except the System Monitor and Modbus integrations. I attach screenshots of memory usage from both HA installations. Both installations run the latest HA version, and the bug isn't tied to a particular version.

[Screenshots: memory usage graphs]

What version of Home Assistant Core has the issue?

core-2024.9.3

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

Modbus

Link to integration documentation on our website

No response

Diagnostics information

home-assistant_modbus_2024-10-02T09-08-31.340Z.log

Example YAML snippet

modbus:
  - name: "Miernik Energii Rozdzielnica"
    type: tcp
    host: "192.168.1.252"
    port: 4196
    sensors:
      # Pompa CO Góra (central heating pump, upstairs)
      - unique_id: pompa_co_gora_napiecie
        name: Pompa CO Góra Napięcie
        device_address: 1
        address: 0
        scan_interval: 2
        input_type: input
        data_type: float32
        device_class: voltage
        precision: 1
        state_class: measurement
        unit_of_measurement: V
      - unique_id: pompa_co_gora_natezenie_pradu
        name: Pompa CO Góra Natężenie prądu
        device_address: 1
        address: 8
        scan_interval: 2
        input_type: input
        data_type: float32
        device_class: current
        precision: 2
        state_class: measurement
        unit_of_measurement: A
      - unique_id: pompa_co_gora_moc
        name: Pompa CO Góra Moc
        device_address: 1
        address: 18
        scan_interval: 2
        input_type: input
        data_type: float32
        device_class: power
        precision: 1
        state_class: measurement
        unit_of_measurement: W
      - unique_id: pompa_co_gora_power_factor
        name: Pompa CO Góra Power factor
        device_address: 1
        address: 42
        scan_interval: 2
        input_type: input
        data_type: float32
        device_class: power_factor
        precision: 2
        state_class: measurement
      - unique_id: pompa_co_gora_suma_dostarczonej_energii
        name: Pompa CO Góra Suma dostarczonej energii
        device_address: 1
        address: 256
        scan_interval: 2
        input_type: input
        data_type: float32
        device_class: energy
        precision: 2
        state_class: total_increasing
        unit_of_measurement: kWh

Anything in the logs that might be useful for us?

No response

Additional information

I'm using a Waveshare 4-CH RS485 TO POE ETH (B) gateway to convert Modbus RTU to Modbus TCP.

zamojski avatar Oct 02 '24 09:10 zamojski

Hey there @bdraco, mind taking a look at this issue as it has been labeled with an integration (profiler) you are listed as a code owner for? Thanks!

home-assistant[bot] avatar Oct 02 '24 09:10 home-assistant[bot]

joostlek avatar Oct 02 '24 09:10 joostlek

As such, the best course of action is to try to reproduce it in pymodbus or to just understand what pymodbus is doing.

I cannot do this as I don't know python...

Regarding evidence of it being a memory leak, you should display units in your graphs.

The screenshots above show memory usage in percent.

zamojski avatar Oct 02 '24 13:10 zamojski
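
For anyone who wants to try the pymodbus-side check suggested above, here is a minimal sketch of how memory growth around a polling loop could be measured with Python's standard `tracemalloc` module. The `measure_growth`, `leaky_poll` and `clean_poll` names are hypothetical and not part of Home Assistant or pymodbus; the two poll functions are stand-ins for a real Modbus read, which would have to be substituted by hand.

```python
import tracemalloc
from typing import Callable


def measure_growth(poll: Callable[[], object], iterations: int = 1000) -> int:
    """Run poll() repeatedly and return the net change in traced bytes."""
    tracemalloc.start()
    try:
        poll()  # warm-up, so one-time allocations are not counted as growth
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            poll()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Hypothetical stand-ins for a real Modbus read (assumption, not pymodbus code).
_leaked: list = []


def leaky_poll() -> None:
    # Keeps every simulated "response" alive, so memory grows with each call.
    _leaked.append(bytes(1024))


def clean_poll() -> None:
    # The simulated response is discarded right away, so nothing accumulates.
    bytes(1024)


if __name__ == "__main__":
    print("leaky:", measure_growth(leaky_poll), "bytes")
    print("clean:", measure_growth(clean_poll), "bytes")
```

If the same steady growth shows up when `poll` is a bare pymodbus read outside Home Assistant, that would point at the library rather than HA Core. If it does not, the integration layer becomes the more likely suspect.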

@joostlek I wasn't able to run py-spy top --pid 67; it fails with "Error: Unsupported version of Python: 3.12.0".

The callgrind files are attached, but start_log_objects was interrupted when the container crashed due to lack of memory. I will try again tomorrow. profile.zip

zamojski avatar Oct 02 '24 19:10 zamojski

Attaching logs from profiler.start_log_objects. error_log.zip

zamojski avatar Oct 03 '24 14:10 zamojski

@joostlek @bdraco Can you please take a look at the attachments? I don't know if they're even correct and useful.

zamojski avatar Oct 04 '24 11:10 zamojski

Can anyone help?

zamojski avatar Oct 08 '24 09:10 zamojski

At least to check whether the issue is in HA Core or in pymodbus?

zamojski avatar Oct 11 '24 13:10 zamojski

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

Still an issue.

zamojski avatar Oct 25 '24 17:10 zamojski

I didn't analyze the callgrind dump, but in the profiler log I see a lot of warnings that scan_interval is lower than 5 seconds, which may cause Home Assistant stability issues. I don't know if there is a connection, but if I were you I wouldn't ignore those warnings.

crug80 avatar Oct 29 '24 10:10 crug80
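
To put the scan_interval warning in perspective, here is a rough back-of-the-envelope sketch of the request load. It assumes that every entity issues its own read request once per scan interval and that all 65 entities share the same interval; neither is confirmed in this thread. The `requests_per_second` helper is a hypothetical name.

```python
def requests_per_second(entity_count: int, scan_interval_s: float) -> float:
    """Average read requests per second if each entity polls once per interval."""
    if scan_interval_s <= 0:
        raise ValueError("scan_interval_s must be positive")
    return entity_count / scan_interval_s


if __name__ == "__main__":
    # 65 entities is the figure from the first post; 2 s and 5 s are the
    # scan_interval values discussed in this thread.
    for interval in (2, 5):
        rate = requests_per_second(65, interval)
        print(f"scan_interval={interval}s -> {rate} requests/s")
```

Under those assumptions, 65 entities at a 2-second interval come to 32.5 requests per second, and to 13 per second at 5 seconds, all funneled through one gateway.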

In the meantime I changed it to 5 seconds, hoping it would help, but it didn't: the out-of-memory crashes still happen, just less frequently...

zamojski avatar Oct 29 '24 10:10 zamojski

In the meantime I changed it to 5 seconds, hoping it would help, but it didn't: the out-of-memory crashes still happen, just less frequently...

Why did you say "it"?! More than one of your sensors has that problem. You should change them all.

crug80 avatar Oct 29 '24 10:10 crug80

I did it for all sensors, no worries 🙂 "It" stands for scan_interval.

zamojski avatar Oct 29 '24 11:10 zamojski

Just another idea: can you post the Modbus spec of the device you are integrating? I want to check some details to verify my suspicion.

crug80 avatar Oct 29 '24 11:10 crug80

In the real world I use 1x Earu EA777 (3-phase) and 11x DDS661 (1-phase) energy meters. I read values from all these meters, which are chained together on the 4-CH_RS485_TO_POE_ETH_(B) gateway (PoE version).

The example in the first post comes from a DDS661. Here's a sensor config for the EA777:

      # Płyta indukcyjna (induction hob)
        # L1
      - unique_id: plyta_indukcyjna_1_napiecie
        name: Płyta indukcyjna 1 Napięcie
        device_address: 15
        address: 0
        scan_interval: 5
        input_type: input
        data_type: int16
        device_class: voltage
        precision: 1
        scale: 0.1
        state_class: measurement
        unit_of_measurement: V
      - unique_id: plyta_indukcyjna_1_natezenie_pradu
        name: Płyta indukcyjna 1 Natężenie prądu
        device_address: 15
        address: 3
        scan_interval: 5
        input_type: input
        data_type: int16
        device_class: current
        precision: 2
        scale: 0.01
        state_class: measurement
        unit_of_measurement: A
      - unique_id: plyta_indukcyjna_1_moc
        name: Płyta indukcyjna 1 Moc
        device_address: 15
        address: 8
        scan_interval: 5
        input_type: input
        data_type: int16
        device_class: power
        precision: 1
        state_class: measurement
        unit_of_measurement: W
      - unique_id: plyta_indukcyjna_1_power_factor
        name: Płyta indukcyjna 1 Power factor
        device_address: 15
        address: 20
        scan_interval: 5
        input_type: input
        data_type: int16
        device_class: power_factor
        precision: 2
        scale: 0.01
        state_class: measurement
        # L2
      - unique_id: plyta_indukcyjna_2_napiecie
        name: Płyta indukcyjna 2 Napięcie
        device_address: 15
        address: 1
        scan_interval: 5
        input_type: input
        data_type: int16
        device_class: voltage
        precision: 1
        scale: 0.1
        state_class: measurement
        unit_of_measurement: V
      - unique_id: plyta_indukcyjna_2_natezenie_pradu
        name: Płyta indukcyjna 2 Natężenie prądu
        device_address: 15
        address: 4
        scan_interval: 5
        input_type: input
        data_type: int16
        device_class: current
        precision: 2
        scale: 0.01
        state_class: measurement
        unit_of_measurement: A
      - unique_id: plyta_indukcyjna_2_moc
        name: Płyta indukcyjna 2 Moc
        device_address: 15
        address: 9
        scan_interval: 5
        input_type: input
        data_type: int16
        device_class: power
        precision: 1
        state_class: measurement
        unit_of_measurement: W
      - unique_id: plyta_indukcyjna_2_power_factor
        name: Płyta indukcyjna 2 Power factor
        device_address: 15
        address: 21
        scan_interval: 5
        input_type: input
        data_type: int16
        device_class: power_factor
        precision: 2
        scale: 0.01
        state_class: measurement
        # Total
      - unique_id: plyta_indukcyjna_suma_dostarczonej_energii
        name: Płyta indukcyjna Suma dostarczonej energii
        device_address: 15
        address: 29
        scan_interval: 5
        input_type: input
        data_type: int32
        device_class: energy
        precision: 2
        scale: 0.01
        state_class: total_increasing
        unit_of_measurement: kWh

DDS661 English User's Manual.pdf EA777 English User's Manual.pdf

zamojski avatar Oct 29 '24 11:10 zamojski

Unfortunately the YAML matches the spec, so my suspicion was wrong. My only further suggestion is to break the problem down into smaller pieces, for example by focusing on the sensors of one device at a time and leaving out sensors that get no response, such as those on device 1.

crug80 avatar Oct 29 '24 13:10 crug80
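
The "one device at a time" approach could be prepared by grouping the sensor definitions by their device_address, as in the sketch below. It works on plain Python dictionaries that mirror a few of the YAML entries posted in this thread. The `split_by_device` helper is a hypothetical name, and loading or writing the actual YAML file is left out.

```python
from collections import defaultdict


def split_by_device(sensors: list) -> dict:
    """Group sensor definitions by their Modbus device_address."""
    groups = defaultdict(list)
    for sensor in sensors:
        groups[sensor["device_address"]].append(sensor)
    return dict(groups)


# A few entries copied from the configs in this thread, reduced to the
# fields that matter for grouping.
SENSORS = [
    {"unique_id": "pompa_co_gora_napiecie", "device_address": 1, "address": 0},
    {"unique_id": "pompa_co_gora_moc", "device_address": 1, "address": 18},
    {"unique_id": "plyta_indukcyjna_1_napiecie", "device_address": 15, "address": 0},
]

if __name__ == "__main__":
    for device, group in split_by_device(SENSORS).items():
        ids = ", ".join(sensor["unique_id"] for sensor in group)
        print(f"device {device}: {ids}")
```

Each group can then be enabled on its own for a test run while memory usage is watched, which narrows down whether a single meter or the overall polling volume drives the growth.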

The issue still persists

zamojski avatar Nov 12 '24 21:11 zamojski

I tried using a single energy meter device type at a time, but I didn't observe any difference in memory usage.

zamojski avatar Nov 15 '24 08:11 zamojski

Still exists

zamojski avatar Nov 29 '24 11:11 zamojski

I just updated to 2024.11.3 and I have a problem very similar to this. It started giving me problems starting with version 2024.11.2.

jouking avatar Dec 01 '24 12:12 jouking

It seems nobody cares about one of the major smart home communication protocols...

zamojski avatar Dec 19 '24 12:12 zamojski

Still an issue

zamojski avatar Jan 02 '25 14:01 zamojski

Still an issue

The new HA release 2025.1 includes an updated version of the pymodbus library. If possible, can you check whether the leak is still present?

crug80 avatar Jan 03 '25 22:01 crug80

On 2025.1 I didn't notice any improvement; memory still leaks as before.

zamojski avatar Jan 06 '25 19:01 zamojski
