WeeWX-MQTTSubscribe
WeeWX-MQTTSubscribe copied to clipboard
MQTT problems cause WeeWX consuming all the CPU
Observations:
- The WeeWX process consumes more than 95 percent of the CPU.
- The syslog is full of complaints of the MQTTSubscribe extension as shown below.
- Restarting WeeWX or the whole computer does not help.
- The computer becomes very slow.
- This always happens when the MQTT broker is overloaded and responses delayed or not in order.
Mar 2 19:01:31 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779978.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:31 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779980.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779983.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779985.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779988.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779990.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779993.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779995.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677779998.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780000.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:32 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780003.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:33 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780005.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:34 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780008.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:36 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780010.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:36 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780013.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:37 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780015.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:38 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780018.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:38 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780020.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:38 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780023.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780025.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780028.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780030.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780033.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780035.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780038.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780040.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780043.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780045.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780048.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780050.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780053.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:39 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780055.000000 which is prior to previous packet 1677780068.000000
Mar 2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780057.000000 which is prior to previous packet 1677780101.000000
Mar 2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780059.000000 which is prior to previous packet 1677780101.000000
Mar 2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780062.000000 which is prior to previous packet 1677780101.000000
Mar 2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780064.000000 which is prior to previous packet 1677780101.000000
Mar 2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780067.000000 which is prior to previous packet 1677780101.000000
Mar 2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780069.000000 which is prior to previous packet 1677780101.000000
Mar 2 19:01:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780072.000000 which is prior to previous packet 1677780101.000000
Mar 2 19:01:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780074.000000 which is prior to previous packet 1677780101.000000
Mar 2 19:01:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780077.000000 which is prior to previous packet 1677780101.000000
Mar 2 19:01:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780079.000000 which is prior to previous packet 1677780101.000000
Mar 2 19:02:07 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780082.000000 which is prior to previous packet 1677780117.000000
Mar 2 19:02:08 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780084.000000 which is prior to previous packet 1677780117.000000
Mar 2 19:02:10 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780087.000000 which is prior to previous packet 1677780117.000000
Mar 2 19:02:42 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780089.000000 which is prior to previous packet 1677780153.000000
Mar 2 19:02:43 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780092.000000 which is prior to previous packet 1677780153.000000
Mar 2 19:02:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780094.000000 which is prior to previous packet 1677780153.000000
Mar 2 19:02:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780097.000000 which is prior to previous packet 1677780153.000000
Mar 2 19:02:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780099.000000 which is prior to previous packet 1677780153.000000
Mar 2 19:02:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780102.000000 which is prior to previous packet 1677780153.000000
Mar 2 19:02:44 LokalWiki weewx[219308] ERROR user.MQTTSubscribe: (Service) Ignoring packet has dateTime of 1677780104.000000 which is prior to previous packet 1677780153.000000
You could try setting ignore_start_time = True. This will result in MQTTSubscribe processing the incoming data that has a time ‘prior to previous packet’.
Some additional information can found here, https://github.com/bellrichm/WeeWX-MQTTSubscribe/wiki/Date-Time-processing#mqttsubscribeservice-processing
Because WeeWX/MQTTSubscribe id being ‘flooded’ with multiple incoming messages per second, I’m not sure this will solve the performance issue.
Yes, I could set ignore_start_time = True, but that is not what I meant. Indeed, in this case, the base thing to do is to fix the overload situation of the MQTT broker. I did so. So no problem at my side any more.
What I meant is this: The MQTTSubscribe service should be able to handle situations when packets come in irregularly because of external conditions.
Sorry, I’m not following what you are requesting. Setting the various time configuration settings is for the exact situation where the time in the MQTT payload ‘cannot be trusted’.
This issue has been marked stale because it has been waiting for more information for 4 weeks.
The time in the MQTT payload can be trusted. Because of some temporary overload situation the packets did not come in order any more or they came delayed. Such temporary situation should not result in an unrecoverable error condition.
The error message is because the driver is emitting loop packets out of order. What driver are you using?
I agree its not good for WeeWX to consume the CPU, as of now I don’t know of a way for MQTTSubscribe to safe guard this. I’ll keep this open for a bit and attempt to to recreate the condition.
Current plan. If the driver emits X ‘stale’ packets in a row OR Y total ‘stale’ packets, MQTTSubscribe will shut itself down. X and Y will be configurable.
Below I have reformatted the log snippet to make what is happening more apparent. Using the log time of 19:01:39 as an example. In this one second WeeWX has dispatched 13 new_loop_packet events. Each of these events has been raised because the driver has generated a loop packet. Each of these loop packet have a time difference of 2 to 3 seconds. The times are all in the past. That is a lot of CPU cycles being used outside of MQTTSubscribe. Because the loop packet is ‘in the past’, the main thread of MQTTSubscribe logs it and ignores it. This would add some additional CPU processing, but minimal. There are no errors in the snippet with errors dealing with the incoming MQTT messages. So it appears that this thread is processing as normal (yes this a guess based on the information available). And unless the broker is flooding the topic with messages, it would not be consuming an unusual amount of CPU.
I realized that I can’t gracefully shutdown MQTTSubscribe. All that MQTTSubscribe can do is raise an exception which will shutdown WeeWX. I’ll think a bit more, but right now I don’t think there is much MQTTSubscribe can do.
Log time Packet time Previous packet time
Mar 2 19:01:31 1677779978 18:59:38 1677780068 19:01:08
Mar 2 19:01:31 1677779980 18:59:40 1677780068
Mar 2 19:01:32 1677779983 18:59:43 1677780068
Mar 2 19:01:32 1677779985 18:59:45 1677780068
Mar 2 19:01:32 1677779988 18:59:48 1677780068
Mar 2 19:01:32 1677779990 18:59:50 1677780068
Mar 2 19:01:32 1677779993 18:59:53 1677780068
Mar 2 19:01:32 1677779995 18:59:55 1677780068
Mar 2 19:01:32 1677779998 18:59:58 1677780068
Mar 2 19:01:32 1677780000 19:00:00 1677780068
Mar 2 19:01:32 1677780003 19:00:03 1677780068
Mar 2 19:01:33 1677780005 19:00:05 1677780068
Mar 2 19:01:34 1677780008 19:00:08 1677780068
Mar 2 19:01:36 1677780010 19:00:10 1677780068
Mar 2 19:01:36 1677780013 19:00:13 1677780068
Mar 2 19:01:37 1677780015 19:00:15 1677780068
Mar 2 19:01:38 1677780018 19:00:18 1677780068
Mar 2 19:01:38 1677780020 19:00:20 1677780068
Mar 2 19:01:38 1677780023 19:00:23 1677780068
Mar 2 19:01:39 1677780025 19:00:25 1677780068
Mar 2 19:01:39 1677780028 19:00:28 1677780068
Mar 2 19:01:39 1677780030 19:00:30 1677780068
Mar 2 19:01:39 1677780033 19:00:33 1677780068
Mar 2 19:01:39 1677780035 19:00:35 1677780068
Mar 2 19:01:39 1677780038 19:00:38 1677780068
Mar 2 19:01:39 1677780040 19:00:40 1677780068
Mar 2 19:01:39 1677780043 19:00:43 1677780068
Mar 2 19:01:39 1677780045 19:00:45 1677780068
Mar 2 19:01:39 1677780048 19:00:48 1677780068
Mar 2 19:01:39 1677780050 19:00:50 1677780068
Mar 2 19:01:39 1677780053 19:00:53 1677780068
Mar 2 19:01:39 1677780055 19:00:55 1677780068 19:01:08
Mar 2 19:01:43 1677780057 19:00:57 1677780101 19:01:41
Mar 2 19:01:43 1677780059 19:00:59 1677780101
Mar 2 19:01:43 1677780062 19:01:02 1677780101
Mar 2 19:01:43 1677780064 19:01:04 1677780101
Mar 2 19:01:43 1677780067 19:01:07 1677780101
Mar 2 19:01:43 1677780069 19:01:09 1677780101
Mar 2 19:01:43 1677780072 19:01:12 1677780101
Mar 2 19:01:44 1677780074 19:01:14 1677780101
Mar 2 19:01:44 1677780077 19:01:17 1677780101
Mar 2 19:01:44 1677780079 19:01:19 1677780101 19:01:41
Mar 2 19:02:07 1677780082 19:01:22 1677780117 19:01:57
Mar 2 19:02:08 1677780084 19:01:24 1677780117
Mar 2 19:02:10 1677780087 19:01:27 1677780117 19:01:57
Mar 2 19:02:42 1677780089 19:01:29 1677780153 19:02:33
Mar 2 19:02:43 1677780092 19:01:32 1677780153
Mar 2 19:02:44 1677780094 19:01:34 1677780153
Mar 2 19:02:44 1677780097 19:01:37 1677780153
Mar 2 19:02:44 1677780099 19:01:39 1677780153
Mar 2 19:02:44 1677780102 19:01:42 1677780153
Mar 2 19:02:44 1677780104 19:01:44 1677780153
What driver are you using? What did you do to workaround the problem?
I use the driver WeatherLinkLiveUDP. It receives LOOP messages every 2 to 3 seconds. I have a WeatherLinkLive 6100 device from Davis.
There was no workaround, there was a solution. The base reason was a problem of the RAM of the computer. It resulted in too small memory available. Changing the RAM module solved it.
The problem of too much log messages of the same kind I had myself, when I wrote some code. In that case I set a flag after creating the log message that prevents sending the same log message for some time. Something like this:
if self.last_log_time<(time.time()-300):
logerr(...)
self.last_log_time = time.time()
So the error is logged at most once every archive interval. You could add a counter to log how much occurrences of the error condition took place since the last log message.
Thanks for the information. I think throttling the error messages is probably the way to go, but I am still thinking about shutting MQTTSubscribe down in these situations.
I think the driver/service should be able to recover if the error condition vanishes. Shutting it down would prevent this.
If the driver/service is shut down in case of error, this would require the user to monitor such events manually to do something about it.
I think the driver/service should be able to recover if the error condition vanishes. Shutting it down would prevent this.
If the driver/service is shut down in case of error, this would require the user to monitor such events manually to do something about it.
Makes sense. I will look into some form of throttling the number of messages. Thanks for your thoughts. rich