After mqtt client is reconnected, it is unable to continue publishing message

Open spencerfeng opened this issue 2 years ago • 16 comments

Hi there,

I am using the latest version of mqttjs (5.1.3) and this issue happens in old versions as well.

In my app, I receive the offline event even while the mosquitto MQTT broker is up and running. After the mqttjs client goes through the offline -> closed -> reconnect -> connected sequence, it can no longer publish messages.
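
For anyone trying to capture the same offline -> close -> reconnect -> connect sequence with timestamps, a small tracing helper might look like the sketch below. The event names are ones MQTT.js emits; `traceLifecycle` and `sink` are made-up names for illustration, and this is only a debugging aid, not a fix.

```javascript
// Sketch: record MQTT.js client lifecycle events with timestamps.
// `traceLifecycle` and `sink` are hypothetical names, not part of MQTT.js.
const LIFECYCLE_EVENTS = ["connect", "reconnect", "close", "offline", "end", "error"];

function traceLifecycle(client, sink = console.log) {
  const handlers = LIFECYCLE_EVENTS.map((name) => {
    const handler = (arg) => {
      // Only "error" carries an argument worth printing here.
      const detail = name === "error" && arg ? ` ${arg.message}` : "";
      sink(`${new Date().toISOString()} mqtt:${name}${detail}`);
    };
    client.on(name, handler);
    return [name, handler];
  });
  // Return a function that detaches every listener again.
  return () => handlers.forEach(([name, handler]) => client.removeListener(name, handler));
}
```

Assumed usage: call `const stop = traceLifecycle(client)` right after `mqtt.connect(...)`, and call `stop()` once tracing is no longer needed.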

Is there a workaround?

app log: image

mqtt broker log: image

spencerfeng avatar Oct 29 '23 09:10 spencerfeng

Can you provide a script that reproduces the issue?

robertsLando avatar Oct 29 '23 10:10 robertsLando

Hey @robertsLando, we are having the same issue, this time in the frontend of our React application. The error in Chrome only shows that it failed at createWebsocket with an array buffer (sorry, I have not made a screenshot of that). In our case, the only way to get it working again is to close and re-open the Chrome browser.

I think both reactjs and nodejs now have the same issue. Can I draw more attention to this issue? Thx.

Screenshot 2023-11-23 at 8 14 42 pm Screenshot 2023-11-23 at 8 14 58 pm

jaketakula avatar Nov 23 '23 23:11 jaketakula

@jaketakula I need to see more details about the error and how to reproduce it. Also what version are you using?

robertsLando avatar Nov 24 '23 07:11 robertsLando

@jaketakula I need to see more details about the error and how to reproduce it. Also what version are you using?

  • the mqttjs version being used is 5.1.3

  • this issue happens randomly in the Chrome browser, roughly once every 1 or 2 weeks. So I think you can easily set up a ping-pong demo app using 5.1.3 and any MQTT broker you like, then keep it running for several days, and you should see the reconnection-drop issue.

jaketakula avatar Jan 09 '24 10:01 jaketakula

this issue happens randomly in the Chrome browser, roughly once every 1 or 2 weeks.

you mean 1/2 weeks with the browser opened or what?

robertsLando avatar Jan 10 '24 08:01 robertsLando

The browser never sends data to the broker; it stays open and only receives messages from the broker.

jaketakula avatar Jan 11 '24 10:01 jaketakula

This could be fixed by #1779. Could someone give 5.3.5 a try?

robertsLando avatar Jan 23 '24 14:01 robertsLando

Thank you very much. I will bump the version now. Will update you later on.

jaketakula avatar Jan 24 '24 01:01 jaketakula

@jaketakula Thanks! Any news?

robertsLando avatar Jan 24 '24 07:01 robertsLando

It usually takes 1 or 2 weeks to see the issue, so please be patient. Thx.

jaketakula avatar Jan 24 '24 07:01 jaketakula

Any news on this? We have a similar issue. We did upgrade to the latest version, but we want to make sure it won't happen again before we deploy (IoT devices).

overflowz avatar Feb 09 '24 20:02 overflowz

I didn't check on my own as I never faced this issue. I dunno if @jaketakula has news (but I think he would have written if the bug had happened). BTW, recent changes fixed a very old bug in reconnect/keep alive that could have caused it.

robertsLando avatar Feb 10 '24 09:02 robertsLando

we deployed 10 IoT devices and slowly they're getting disconnected one by one without reconnecting. So the issue still persists.

Here's the config I'm using:

    this.#conn = mqtt.connect(url, {
      cert: CERT_PATH,
      key: KEY_PATH,
      ca: CA_PATH,
      protocolId: "MQTT",
      protocolVersion: 5,
      encoding: "binary",
      clean: false,
      clientId: 'specific-IoT-id',
      keepalive: 60,
      reconnectPeriod: 1000,
      connectTimeout: 30000,
      reschedulePings: false,
      queueQoSZero: true,
      resubscribe: true,
      manualConnect: false,
    });

I also tried to add a timeout on publish (since sending QoS 1 messages waits until delivery happens), but it fails miserably. Pseudocode:

await Promise.race([publish, timer]);
if timeout then client.reconnect();

But after I call the reconnect() method, every publish request throws a "client disconnecting" error and the client can't recover from it.
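
For readers trying the same approach, the timeout race from the pseudocode might be sketched as below. It assumes the promise-based `publishAsync` of MQTT.js v5; `publishWithTimeout` is a hypothetical helper name. It only bounds how long the caller waits and does not address the stalled connection itself.

```javascript
// Sketch: bound how long a publish may stay pending.
// `publishWithTimeout` is a hypothetical helper, not an MQTT.js API.
function publishWithTimeout(client, topic, payload, options = {}, timeoutMs = 10000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`publish to "${topic}" timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  // publishAsync settles once the publish completes (for QoS 1, after the PUBACK).
  return Promise.race([client.publishAsync(topic, payload, options), timeout])
    .finally(() => clearTimeout(timer)); // never leave the timer running
}
```

Note that losing the race does not cancel the publish: the message stays pending inside the client, so the caller still has to decide what to do with the connection after a timeout.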

EDIT: can you suggest what I should do? A workaround will also do because we're in a rush right now.

~~EDIT2: I forgot to run npm install :man_facepalming: I'll test again and get back as soon as I have news.~~

Yup, same thing.

overflowz avatar Feb 14 '24 14:02 overflowz

@overflowz Please open a new bug issue and follow the steps in order to also attach DEBUG logs

robertsLando avatar Feb 14 '24 15:02 robertsLando

The problem is, it's hard to reproduce and you gotta wait for days or even weeks for it to trigger (this is for the reconnect). As for the reconnect() -- will do later today.

overflowz avatar Feb 14 '24 15:02 overflowz

I understand that, but it's hard for me to know what's going on here without more info. What you could do is also patch the log function in the client and print logs to a file so you don't lose them when this happens.
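
A minimal sketch of that idea, assuming the `log` client option described in the MQTT.js README (a custom log function that replaces the default `debug` output). `createFileLogger` and the file path are hypothetical names for illustration.

```javascript
const fs = require("node:fs");
const util = require("node:util");

// Sketch: a log function that appends each debug line to a file.
// `createFileLogger` is a hypothetical helper, not part of MQTT.js.
function createFileLogger(filePath) {
  return (...args) => {
    // util.format expands printf-style placeholders such as %s and %d.
    const line = `${new Date().toISOString()} ${util.format(...args)}\n`;
    // Synchronous append, so the last lines are on disk even if the process hangs.
    fs.appendFileSync(filePath, line);
  };
}

// Assumed usage (path is an example):
// const client = mqtt.connect(url, { ...options, log: createFileLogger("./mqtt-debug.log") });
```

The synchronous append keeps the sketch simple but blocks the event loop on every log line, so on a busy device a buffered stream plus log rotation may be a better trade-off.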

robertsLando avatar Feb 14 '24 16:02 robertsLando

I'm not sure if it's related but I'm facing a similar problem.

Context

My architecture looks like this:

device -- Ethernet connection --> manager -- 5G connection --> MQTT broker
                                     |
                                     L______ 5G connection --> HTTP server

What happens is that every X hours the 5G modem goes down for a short period of time. When this happens, I can see in the manager's logs that when it tries to communicate via HTTP it receives an EHOSTUNREACH error, which disappears when the 5G connection is back.

The problem is that when the connection is back, client.connected flips to true but the messages are not being sent.

My sender function looks something like this:

async function sendToBroker(topic, message) {
  if (!this.client.connected) {
    console.warn("Client is not connected, storing message");

    this.storage.store({ topic, message });

    return;
  }

  console.debug("Sending message to broker", message);
  await this.client.publishAsync(topic, message);
}

that is called like this:


async function send(ctx) {
  // Await the call so that the timeout error propagates to the caller
  await ctx.call(
    "mqtt.sendToBroker",
    { topic: "topic", message: "message" },
    { timeout: 10000 } // Throw an error if it takes more than 10 seconds
  );
}

In the logs, I can see "Sending message to broker" followed by a timeout error saying that sendToBroker() did not resolve in 10 seconds. I'm assuming that it gets stuck at the await this.client.publishAsync(topic, message); line.

Here's how I create my client:

this.client = mqtt.connect(url, {
  username: token,
  cert: fs.readFileSync(certPath, "utf8"),
  rejectUnauthorized: false,
});

Question

Any idea why this is happening? Is there a way to check if the connection is really up?

Might be vaguely related to: https://github.com/mqttjs/MQTT.js/issues/1825
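
One possible way to probe whether the link is really up, sketched under assumptions: since `client.connected` was true while publishes stalled, the probe sends a QoS 1 message and treats a missing acknowledgement as a dead link, then force-closes the client and builds a new one. `probeOrRecreate`, `createClient`, and the `health/probe` topic are hypothetical; `publishAsync` and `end(true)` are the MQTT.js v5 calls this relies on. It is a workaround idea, not a confirmed fix.

```javascript
// Sketch: treat a QoS 1 publish that gets no acknowledgement in time as a dead link,
// then force-close the client and build a fresh one.
// `probeOrRecreate`, `createClient` and the probe topic are hypothetical.
async function probeOrRecreate(client, createClient, { topic = "health/probe", timeoutMs = 5000 } = {}) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve("stalled"), timeoutMs);
  });
  // A rejected publish (for example "client disconnecting") also counts as stalled.
  const probe = client
    .publishAsync(topic, "ping", { qos: 1 })
    .then(() => "alive", () => "stalled");
  const state = await Promise.race([probe, deadline]);
  clearTimeout(timer);
  if (state === "alive") return client;
  // end(true) closes right away instead of waiting for in-flight messages.
  client.end(true);
  return createClient();
}
```

Assumed usage: run `this.client = await probeOrRecreate(this.client, () => mqtt.connect(url, options))` periodically or after a network change. The broker must allow publishing to the probe topic, and messages still in flight on the old client are dropped unless they are stored elsewhere.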

AndreMaz avatar Apr 26 '24 08:04 AndreMaz

@AndreMaz Could you create a full script that I can use to reproduce the issue? By checking the other issues it seems this happens only when working with tls?

robertsLando avatar Apr 26 '24 10:04 robertsLando

@robertsLando It will be difficult as it's proprietary code, but I'll try to create a repro example

By checking the other issues it seems this happens only when working with tls?

I've been using TLS since the beginning but can't say for sure if it's the source of the problem

AndreMaz avatar Apr 26 '24 11:04 AndreMaz

Can confirm, we're also using TLS and haven't tried otherwise.

overflowz avatar Apr 26 '24 11:04 overflowz

I don't need your source code, just a script that reproduces the issue: a simple one that connects to a broker with TLS (you can use the HiveMQ public one, https://www.hivemq.com/mqtt/public-mqtt-broker/) and then tries to reproduce the disconnect to see if the problem happens. I tried last time without success, and if I cannot reproduce it on my side it's hard to fix.

robertsLando avatar Apr 26 '24 18:04 robertsLando

Yep yep, I know that you don't want it. I've created a simple script that mimics the data flow, but I can't reproduce the issue at home. I've been switching my laptop between WiFi, Ethernet, and a 5G hotspot and so far no luck, i.e., I did not see the timeout that I mentioned.

AndreMaz avatar Apr 27 '24 07:04 AndreMaz

If it helps, since we updated to version 5.3.5 we're facing this issue less frequently, but it's still there. I believe these were the relevant changes that could've affected it: #1779

But the latest release again has fixes for a possible race condition: #1848. It's hard for us to keep updating many IoT devices, though, since we have to do an npm install where the connection is very unstable.

These issues are really hard for us to test in a production environment, and it also costs us a lot. Is there any version that is considered stable that we can pick for now until these issues get fixed?

thank you!

overflowz avatar Apr 27 '24 09:04 overflowz

I also think that this is a regression that was introduced in the latest releases (don't know which one tho)

I'll have to check my previous releases and test them out. Will keep you guys updated

AndreMaz avatar Apr 27 '24 10:04 AndreMaz

I'm sorry for the issues, guys, and I would like to help you, but it's hard to guess what the root cause could be here. We should first try to find an easy way to reproduce the issue somehow.

robertsLando avatar Apr 27 '24 15:04 robertsLando

I'm sorry for the issues, guys, and I would like to help you, but it's hard to guess what the root cause could be here. We should first try to find an easy way to reproduce the issue somehow.

No worries at all! We're devs too and we understand the frustration :-) Just to be clear, I didn't mean it as a "fix it asap", I do appreciate the work the maintainers are doing, truly. I was asking if there are any old version(s) I could try so I won't get pressured by the company to fix a problem that is hard to explain why it does not work sometimes xD

Regardless, I really do wish to help somehow too, but it's really hard to reproduce :( We're currently running the code on 50 devices and it might happen once a week, two weeks or even months per one device, it's really unpredictable and random.

overflowz avatar Apr 27 '24 19:04 overflowz

I'm the only maintainer here, unfortunately. I started because I use this package in almost all my projects and I wanted to help keep it maintained (as it was almost dead).

Based on the first comment of this issue, it seems this was also happening with older versions, so I dunno. I'm sorry :(

robertsLando avatar Apr 29 '24 07:04 robertsLando

Hey @robertsLando no need to apologize. Huge kudos to you for what you're doing 💪

I'll have to check my previous releases and test them out.

Just checked, I went from 5.3.3 -> 5.5.2 -> 5.5.4

I'm rolling back to 5.3.3 and going to let it run for a while. Will keep you updated

AndreMaz avatar Apr 29 '24 07:04 AndreMaz

@AndreMaz Thanks!

robertsLando avatar Apr 29 '24 07:04 robertsLando

Hey @robertsLando no need to apologize. Huge kudos to you for what you're doing 💪

I'll have to check my previous releases and test them out.

Just checked, I went from 5.3.3 -> 5.5.2 -> 5.5.4

I'm rolling back to 5.3.3 and going to let it run for a while. Will keep you updated

if it helps, we were using 5.3.3 previously and the issue was appearing more frequently than now (about 3-4 times a week).

overflowz avatar Apr 29 '24 08:04 overflowz