modbus plugin can get out of sync WRT transaction IDs
Relevant telegraf.conf
[[inputs.modbus]]
name = "pv_inverter"
slave_id = 1
timeout = "5s"
controller = "tcp://192.168.66.3:502"
configuration_type = "metric"
optimization = "max_insert"
[inputs.modbus.workarounds]
pause_after_connect = "2000ms"
[[inputs.modbus.metric]]
slave_id = 1
measurement = "pv_inverter" # Huawei Sun2000
# cf https://github.com/wlcrs/huawei_solar/wiki
fields = [
{ address=32064, name="P_in", type="INT32", scale=1.0 },
{ address=32069, name="L1_U", type="UINT16", scale=0.1 },
{ address=32070, name="L2_U", type="UINT16", scale=0.1 },
{ address=32071, name="L3_U", type="UINT16", scale=0.1 },
{ address=32072, name="L1_I", type="INT32", scale=0.001 },
{ address=32074, name="L2_I", type="INT32", scale=0.001 },
{ address=32076, name="L3_I", type="INT32", scale=0.001 },
{ address=32080, name="P_out", type="INT32", scale=1.0 },
{ address=32082, name="Q_out", type="INT32", scale=1.0 },
{ address=32084, name="Power_Factor", type="INT16", scale=0.001 },
{ address=32085, name="Frequency", type="UINT16", scale=0.01 },
{ address=32086, name="Efficiency", type="UINT16", scale=0.01 },
{ address=32087, name="Temperature", type="INT16", scale=0.1 },
]
[inputs.modbus.metric.tags]
null="maybe"
[[inputs.modbus.metric]]
slave_id = 1
measurement = "pv_inverter" # Huawei Sun2000
# cf https://github.com/wlcrs/huawei_solar/wiki
fields = [
{ address=32000, name="state_1", type="UINT16" },
{ address=32002, name="state_2", type="UINT16" },
{ address=32003, name="state_3", type="UINT16" },
{ address=32008, name="alarm_1", type="UINT16" },
{ address=32009, name="alarm_2", type="UINT16" },
{ address=32010, name="alarm_3", type="UINT16" },
{ address=32016, name="PV1_U", type="INT16", scale=0.1 },
{ address=32017, name="PV1_I", type="INT16", scale=0.01 },
{ address=32018, name="PV2_U", type="INT16", scale=0.1 },
{ address=32019, name="PV2_I", type="INT16", scale=0.01 },
{ address=32078, name="Peak_Power_today", type="INT32", scale=1.0 },
{ address=32088, name="R_iso", type="UINT16", scale=1000.0 },
{ address=32089, name="device_status", type="UINT16" },
{ address=32090, name="fault_code", type="UINT16" },
{ address=32106, name="energy_total", type="UINT32", scale=10.0},
]
Logs from Telegraf
Nov 18 07:02:05 rockpro telegraf[1449366]: 2023-11-18T06:02:05Z E! [inputs.modbus] Error in plugin: slave 1: read tcp 192.168.57.7:56970->192.168.66.3:502: i/o timeout
Nov 18 07:02:10 rockpro telegraf[1449366]: 2023-11-18T06:02:10Z E! [inputs.modbus] Error in plugin: slave 1: modbus: response transaction id '7616' does not match request '7617'
Nov 18 07:02:20 rockpro telegraf[1449366]: 2023-11-18T06:02:20Z E! [inputs.modbus] Error in plugin: slave 1: modbus: response transaction id '7617' does not match request '7618'
Nov 18 07:02:30 rockpro telegraf[1449366]: 2023-11-18T06:02:30Z E! [inputs.modbus] Error in plugin: slave 1: modbus: response transaction id '7618' does not match request '7619'
Nov 18 07:02:40 rockpro telegraf[1449366]: 2023-11-18T06:02:40Z E! [inputs.modbus] Error in plugin: slave 1: modbus: response transaction id '7619' does not match request '7620'
Nov 18 07:02:50 rockpro telegraf[1449366]: 2023-11-18T06:02:50Z E! [inputs.modbus] Error in plugin: slave 1: modbus: response transaction id '7620' does not match request '7621'
[these lines repeat until telegraf is restarted]
System info
telegraf 1.28.5-1, Debian 12.2
Docker
No response
Steps to reproduce
The issue occurs occasionally; telegraf runs fine for weeks on end, but if this error is occuring, it does not recover by itself.
Expected behavior
I guess closing/reopening the Modbus connection after an i/o error, or at least resetting the internal counters for the transaction IDs, would be a solution.
Actual behavior
After this desynchronization the connection never recovers, until telegraf is restarted.
Additional info
The device being monitored is a Huawei Sun2000 solar inverter, whose modbus implementation is a bit flakey.
Not every "i/o timeout" message in the logs leads to this behaviour.
@cmock this seems to be an error in the upstream library. Can you please report the error there and add a link to this issue!?
done: https://github.com/grid-x/modbus/issues/78