teslamateapi icon indicating copy to clipboard operation
teslamateapi copied to clipboard

lost mqtt connexion?

Open vbarrier opened this issue 3 years ago • 3 comments

Hi! thank you for this great addon :)

However, I have a problem since I use it :(. It seems that the backend loose the connection with the Mosquitto server or doesn't not refresh the cached data.

I'm polling the api/v1/cars/*/status every 1 minutes to populate a tracking database.

vbarrier avatar May 10 '22 14:05 vbarrier

I can confirm that of the docker container fix the problem. Any insight on what happen is happening?

vbarrier avatar May 11 '22 08:05 vbarrier

Hi @vbarrier, What version of TeslaMateApi and Mosquitto are you using in your setup? Has this issue appeared after you have done some upgrade? You you see some suspicious entries in the container logs?

I can confirm that of the docker container fix the problem. Any insight on what happen is happening?

What you mean with docker container fix?

tobiasehlert avatar May 11 '22 09:05 tobiasehlert

Hi I'm using the latest version of TeslMateApi (1.14) and MQTT version is 2.

It seems that the api lost connexion or doesn't take any more mqtt updates..

1652680291: Client teslamateapi-Lv85 closed its connection.
1652680293: New connection from 172.18.0.11:53518 on port 1883.
1652680293: New client connected from 172.18.0.11:53518 as teslamateapi-Jp2A (p2, c1, k2).

About docker container, I mean, that a restart of the docker container fix the issue for couple hours..

vbarrier avatar May 16 '22 07:05 vbarrier

Hi @vbarrier,

have you done some more debugging for this issue? I currently experiencing the exact same behavior.

virusbrain avatar Sep 19 '22 14:09 virusbrain

Hi, still facing the bug I ended up restarting the container every 4 hours. :(

vbarrier avatar Sep 19 '22 15:09 vbarrier

I looks like connections should be re-established... https://github.com/tobiasehlert/teslamateapi/blob/27d204457d385c9d8d8cbe64b38445785896a0f4/src/v1_TeslaMateAPICarsStatus.go#L134-L135

but also.. it looks like there are some some handlers that could be better utilized at a minimum for the purpose of logging... for example.... https://pkg.go.dev/github.com/eclipse/paho.mqtt.golang#ClientOptions.SetConnectionLostHandler

... again.. might be worth adding some logging.. then we may be able to better determine the nature of the failure.

LelandSindt avatar Sep 19 '22 21:09 LelandSindt

I have forked the project and tried to add some debugging output as suggested. This version is running now inside a docker container. We will see if the problem occurs again, then I will keep you informed. If you look in my forked sourcecode you may see that I am not a go developer ;-).

virusbrain avatar Sep 21 '22 13:09 virusbrain

I have forked the project and tried to add some debugging output as suggested. This version is running now inside a docker container. We will see if the problem occurs again, then I will keep you informed. If you look in my forked sourcecode you may see that I am not a go developer ;-).

The problem occurred again. For me it looks like there is no MQTT issue. I cannot see any log entries regarding lost connections or reconnecting. But if I call the status api end point I get:

teslamateapi_1 | 2022/09/21 16:27:00.611535 [debug] TeslaMateAPICarsStatusV1 - (/api/v1/cars/1/status) returned data: teslamateapi_1 | 2022/09/21 16:27:00.611642 [debug] {"data":{"car":{"car_id":1,"car_name":"XXXXXX"},"status":{"display_name":"","state":"suspended", [...]

In the TeslaMate UI I see the state as "online" not "suspended".

I have tried to restart the my MQTT-Server (mosquitto in this case) and got the following messages in the logfile:

teslamateapi_1 | 2022/09/21 16:25:55.177510 [error] MQTT connection lost: %s teslamateapi_1 | 2022/09/21 16:25:55.177578 [info] mqtt reconnecting... teslamateapi_1 | 2022/09/21 16:25:56.209363 [info] mqtt reconnecting...

So the "extended logging" seems to work. Maybe there is a problem with the cache?

virusbrain avatar Sep 21 '22 14:09 virusbrain

I have tried to restart the my MQTT-Server (mosquitto in this case) and got the following messages in the logfile:

teslamateapi_1 | 2022/09/21 16:25:55.177510 [error] MQTT connection lost: %s teslamateapi_1 | 2022/09/21 16:25:55.177578 [info] mqtt reconnecting... teslamateapi_1 | 2022/09/21 16:25:56.209363 [info] mqtt reconnecting...

So the "extended logging" seems to work. Maybe there is a problem with the cache?

Another example: teslamateapi_1 | 2022/09/27 01:30:03.468099 [error] MQTT connection lost: pingresp not received, disconnecting teslamateapi_1 | 2022/09/27 01:30:03.468264 [info] mqtt reconnecting...

Maybe we have to "resubscribe" the topics after a reconnect? Does anybody know?

virusbrain avatar Sep 27 '22 06:09 virusbrain

I also think this is a cache problem somewhere. I'm querying the teslamate-api every minutes and after sometime not working anymore. The restart thing remove all cache.. I will try to have a mirror setup as I don't want to break my current production setup.

vbarrier avatar Sep 28 '22 18:09 vbarrier

I have another point found and currently do some tests with this patch implemented: https://github.com/virusbrain/teslamateapi/commit/3ce7126fecb50de3b9d03eb92a1aedbe136238b2

Test is currently ongoing and as I am not a MQTT expert I am not sure which side effects this patch would produce... But if it will help I may be able to make the code a bit more fancy and create a pull request for this.

virusbrain avatar Sep 28 '22 20:09 virusbrain

https://github.com/virusbrain/teslamateapi/commit/3ce7126fecb50de3b9d03eb92a1aedbe136238b2#r85320141

If I read your response correctly, I should be able to re-create this scenario by restarting the MQTT service, is that correct?

LelandSindt avatar Sep 29 '22 13:09 LelandSindt

virusbrain@3ce7126#r85320141

If I read your response correctly, I should be able to re-create this scenario by restarting the MQTT service, is that correct?

This should be the case.

As already mentioned I do currently running a version with opts.SetCleanSession(false) set. Because I guess the subscriptions get lost when a reconnect is necessary. While googeling I found this which - I think - goes in the same direction of this bug: https://github.com/emitter-io/go/pull/18

virusbrain avatar Sep 29 '22 13:09 virusbrain

As already mentioned I do currently running a version with opts.SetCleanSession(false) set. Because I guess the subscriptions get lost when a reconnect is necessary. While googeling I found this which - I think - goes in the same direction of this bug: emitter-io/go#18

Just FYI: With SetCleanSession(false) my Docker container runs for about 72h without any problems. All data seems to be refreshed correctly.

So from my point of view we do have two possibilities:

  1. Set SetCleanSession to false (as I have done with https://github.com/virusbrain/teslamateapi/commit/3ce7126fecb50de3b9d03eb92a1aedbe136238b2)
  2. Find a way to resubscribe to the mqtt topcis on a reconnect.

virusbrain avatar Oct 01 '22 14:10 virusbrain

This is really good information, thank you for taking the time to troubleshoot and document.

I will see what I can do to make time this next week to dig in... I am inclined to say we should certainly be re-subscribing on re-connect... I would need to read up more on SetCleanSession.

LelandSindt avatar Oct 01 '22 14:10 LelandSindt

After reviewing/testing some... it looks like opts.SetCleanSession(false) is the simplest solution. I would like to do a little more testing (likely tomorrow while my wife drives to/from work) and some more reading to confirm/support the change.

LelandSindt avatar Oct 05 '22 02:10 LelandSindt

also considering returning 500 when mqtt is not connected...

https://github.com/LelandSindt/teslamateapi/commit/8ecda10c8766cd4c4e97e48ffffe6579033a217a

LelandSindt avatar Oct 05 '22 03:10 LelandSindt

Test went well this morning. I was able to interrupt the connection between teslamateapi and mosquitto see the connection loss, re-connection and new data flow. I think that the 500 on connection loss provides value but I am open to feedback.

opts.SetCleanSession(false) only fixes situations where the connection is lost but mosquitto maintains state. If mosquitto is restarted and looses state, when teslamateapi re-connects its subscriptions will have been lost and no new updates will be sent... :thinking:

LelandSindt avatar Oct 05 '22 13:10 LelandSindt

Test went well this morning. I was able to interrupt the connection between teslamateapi and mosquitto see the connection loss, re-connection and new data flow. I think that the 500 on connection loss provides value but I am open to feedback.

opts.SetCleanSession(false) only fixes situations where the connection is lost but mosquitto maintains state. If mosquitto is restarted and looses state, when teslamateapi re-connects its subscriptions will have been lost and no new updates will be sent... 🤔

Aren't we able to simply resubscribe to the mqtt topics after a reconnect occured? According to my understandig we just have to refactor the subscription part (https://github.com/LelandSindt/teslamateapi/blob/96d53a66dd30574a1f1d15a4d86f77766d9b5018/src/v1_TeslaMateAPICarsStatus.go#L151-L156) somehow that it can subscribe to the topics in a reconnecting-handler. Or do I miss something?

virusbrain avatar Oct 05 '22 14:10 virusbrain

I started down the re-subscribe path last night, but could not quite grok how it would best be done cleanly.

... I will have another look at it tonight.

LelandSindt avatar Oct 05 '22 15:10 LelandSindt

After reviewing this.. https://github.com/eclipse/paho.mqtt.golang/issues/22

I made these changes... https://github.com/LelandSindt/teslamateapi/commit/59784a727807646fefd5ef28e7338f0dd47c71f5 they look good in simple/synthetic tests, I will test again in the morning.

LelandSindt avatar Oct 06 '22 02:10 LelandSindt

After reviewing this.. eclipse/paho.mqtt.golang#22

I made these changes... LelandSindt@59784a7 they look good in simple/synthetic tests, I will test again in the morning.

Yeah, on the first look this looks promising! I will setup a docker container to test this too.

virusbrain avatar Oct 06 '22 08:10 virusbrain

Thank you for your time & testing

I will do this too this evening

vbarrier avatar Oct 06 '22 10:10 vbarrier

watch "curl -s localhost:8080/api/v1/cars/1/status | jq '.data.status.odometer' "

kubectl --namespace mosquitto delete pod mosquitto-56dc565797-m6mv8

[GIN] 2022/10/06 - 07:22:07 | 200 |    8.492773ms |       127.0.0.1 | GET      "/api/v1/cars/1/status"     
2022/10/06 07:22:07.572515 [error] MQTT connection lost: EOF                                                                                                                                                              
2022/10/06 07:22:07.572517 [info] mqtt reconnecting...                                                                                                                                                                    
2022/10/06 07:22:07.572543 [info] mqtt connecting...                                                         
2022/10/06 07:22:08.581626 [info] mqtt reconnecting... 
2022/10/06 07:22:08.581692 [info] mqtt connecting...
2022/10/06 07:22:09.149964 [notice] TeslaMateAPICarsStatusV1 mqtt is disconnected.. can not return status for car without mqtt!
2022/10/06 07:22:09.149994 [info] TeslaMateAPICarsStatusV1 - (/api/v1/cars/1/status) executed successfully.
[GIN] 2022/10/06 - 07:22:09 | 500 |     103.194µs |       127.0.0.1 | GET      "/api/v1/cars/1/status"
2022/10/06 07:22:10.583033 [info] mqtt reconnecting... 
2022/10/06 07:22:10.583079 [info] mqtt connecting...
...
2022/10/06 07:22:21.414245 [notice] TeslaMateAPICarsStatusV1 mqtt is disconnected.. can not return status for car without mqtt!                                                                                           
2022/10/06 07:22:21.414260 [info] TeslaMateAPICarsStatusV1 - (/api/v1/cars/1/status) executed successfully.                                                                                                               
[GIN] 2022/10/06 - 07:22:21 | 500 |      52.859µs |       127.0.0.1 | GET      "/api/v1/cars/1/status"                                                                                                                    
2022/10/06 07:22:22.585712 [info] mqtt reconnecting...                                                                                                                                                                    
2022/10/06 07:22:22.585733 [info] mqtt connecting...                                                                                                                                                                      
2022/10/06 07:22:22.654997 [info] mqtt connected...                                                                                                                                                                       
2022/10/06 07:22:22.659715 [info] subscribed to: teslamate/cars/#                                                                                                                                                         
2022/10/06 07:22:23.470383 [info] TeslaMateAPICarsStatusV1 - (/api/v1/cars/1/status) executed successfully.                                                                                                               
[GIN] 2022/10/06 - 07:22:23 | 200 |   23.586126ms |       127.0.0.1 | GET      "/api/v1/cars/1/status"                                                                                                                    
2022/10/06 07:22:25.489866 [info] TeslaMateAPICarsStatusV1 - (/api/v1/cars/1/status) executed successfully.

After K8s started a new mosquitto pod teslamateapi re-connected/re-subscribed and curl | jq ... displayed updated odometer readings as my wife drove to work this morning... (I miss driving my car to work.. but I don't miss driving to work... )

looks good to me, y'all have any feedback or concerns?

LelandSindt avatar Oct 06 '22 12:10 LelandSindt

@virusbrain do you want the honour of issuing the PR? --- if not let me know, I can put it together right quick.

LelandSindt avatar Oct 06 '22 21:10 LelandSindt

@virusbrain do you want the honour of issuing the PR? --- if not let me know, I can put it together right quick.

I would really be happy issuing the PR. I try to cleanup the MQTT_ClientID stuff and then open the PR. I think I have some time this evening.

virusbrain avatar Oct 07 '22 05:10 virusbrain

specific to MQTT_CLIENTID I should have mentioned I cleaned that up last night... https://github.com/tobiasehlert/teslamateapi/commit/e649d38d40cc40bf1b932ebf8f96fb2fd13eeca2

Let me know if you have any questions/issues pulling my commits/changes in!

LelandSindt avatar Oct 07 '22 12:10 LelandSindt