thin-edge.io Connecting to MQTT broker spins when invalid authentication config

Describe the bug

When using MQTT server authentication and invalid CA certificate was provided in tedge-config, opening connection spins on the error to verify server certificate:

2024-02-19T10:39:23.582843165Z ERROR mqtt_channel::connection: MQTT: failed to connect to broker at 'localhost:8883': TLS: I/O: invalid peer certificate: BadSignature

generating a lot of logs and wasting a lot of CPU.

To Reproduce

Create file /etc/tedge/mosquitto-conf/~secure-listener.conf [^1] with following contents:

listener 8883
allow_anonymous true
require_certificate false
cafile   /etc/mosquitto/ca_certificates/ca.crt
certfile /etc/mosquitto/ca_certificates/server.crt
keyfile  /etc/mosquitto/ca_certificates/server.key

Run gen-certs.sh, which will generate and place a server certificate into /etc/mosquitto/ca_certificates tests/RobotFramework/tests/mqtt/gen_certs.sh

Generate a different CA certificate which will be used by tedge to try to verify broker certificate and use it as mqtt.client.auth.ca_file

openssl req \
    -new \
    -x509 \
    -days 7 \
    -extensions v3_ca \
    -nodes \
    -subj "/C=US/ST=Denial/L=Springfield/O=Dis/CN=ca" \
    -keyout ca.key \
    -out ca.crt

tedge config set mqtt.client.auth.ca_file $(realpath ca.crt)

Ensure all thin-edge is disconnected, or tedge services are off, and restart mosquitto.
Try to start a tedge service, e.g. tedge-agent
A lot of messages like 2024-02-19T10:39:23.54226273Z ERROR mqtt_channel::connection: MQTT: failed to connect to broker at 'localhost:8883': TLS: I/O: invalid peer certificate: BadSignature should be generated very quickly.

Expected behavior

On failures to connect, a reasonable delay should be used, e.g. 1s after every attempt.

Additional context

I could just do a quick PR adding a delay when this particular error case, but I see that we have a match where we decide which errors should be given a timeout and which should not:

https://github.com/thin-edge/thin-edge.io/blob/64def4e4c5ed8e600d3ab090725ad3af57f1579f/crates/common/mqtt_channel/src/connection.rs#L308-L315

So I have a question: what's the reason that we apply a delay only to some of the connect errors? Should any other cases be added other than BadSignature case which this ticket is about? Why not apply the delay to all errors when trying to connect?

[^1]: ~ is prepended to the filename so this file is loaded after the one with per_listener_settings true directive, without it mosquitto fails to start with an error. I guess we could perhaps use the xx-filename.conf convention for mosquitto configs to more precisely control the load order of configuration files

Feb 19 '24 11:02 Bravo555

So I have a question: what's the reason that we apply a delay only to some of the connect errors? Should any other cases be added other than BadSignature case which this ticket is about? Why not apply the delay to all errors when trying to connect?

There is no good reason for that beyond that a delay has been introduced for likely transient errors (broker not started or restarting / network hiccup).

In case of a TlsError or worse a StateError, which is unlikely a transient error, we should consider to give up with a fatal error.

Feb 19 '24 17:02 didier-wenzek

In case of a TlsError or worse a StateError, which is unlikely a transient error, we should consider to give up with a fatal error.

I agree, the most important thing is to either exit with a fatal error, or apply a reasonable timeout and retry, and never do the third thing, i.e. retry immediately, because it results in spinning. So for any unknown errors, we need to decide whether we retry after delay, or exit. IMO the best would be to exit on all errors, unless it's some transient error, then we can apply a delay and retry.

Feb 22 '24 11:02 Bravo555

QA has thoroughly checked the bug and here are the results:

[ ] Test for ticket exists in the test suite.
[x] QA has tested the bug and it's not reproducable anymore

Apr 24 '24 06:04 gligorisaev

thin-edge.io thin-edge.io copied to clipboard

Connecting to MQTT broker spins when invalid authentication config

thin-edge.io
thin-edge.io copied to clipboard