Double message triggering encoding fallback randomly
This happens very randomly, but I managed to see it twice in a short time. Encoding fallback is enabled on this network.
- User sends a message in UTF-8 on IRC
- Bridge sends it correctly once (ä = ä)
- Bridge sends it again, this time with an unnecessary latin1 -> UTF-8 conversion (ä = À)
What's happening here? Some kind of race condition? Could there be two threads in the encoding function at the same time? Does node-irc claim to be thread safe? Needs to be studied.
At a guess, node-irc is occasionally decoding a message differently. The IRC bridge has many client connections at once, so it's normal for potentially hundreds of copies of the same message to arrive, but they should be deduplicated based on the contents of the message. If one copy is occasionally decoded wrongly, its contents no longer match the others and it slips past deduplication, which would explain this bug.
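To illustrate the guess above, here is a minimal sketch of content-based deduplication and how a single differently decoded copy gets past it. `ContentDeduper`, `decodeUtf8` and `decodeLatin1` are hypothetical names made up for this example; this is not the bridge's or node-irc's actual code, and the real dedupe key may well include more than sender and text.

```typescript
// Hypothetical content-based deduplicator, NOT the bridge's real implementation.
class ContentDeduper {
  private seen = new Set<string>();

  // Returns true the first time a (sender, text) pair is seen, false afterwards.
  accept(sender: string, text: string): boolean {
    const key = `${sender}\u0000${text}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

// Decode the raw bytes as UTF-8 (the correct interpretation here).
function decodeUtf8(bytes: Uint8Array): string {
  return new TextDecoder("utf-8").decode(bytes);
}

// Decode the raw bytes as ISO-8859-1: every byte becomes the code point
// with the same value, so multi-byte UTF-8 sequences turn into mojibake.
function decodeLatin1(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

// The same bytes arrive on three client connections.
const wire = new TextEncoder().encode("lämpö");
const deduper = new ContentDeduper();
const results = [
  deduper.accept("nick", decodeUtf8(wire)),   // first copy: forwarded
  deduper.accept("nick", decodeUtf8(wire)),   // identical copy: dropped
  deduper.accept("nick", decodeLatin1(wire)), // mis-decoded copy: looks new, forwarded again
];
console.log(results); // [ true, false, true ]
```

If this is what happens, the duplicate would be expected to arrive very shortly after the correct message, which is consistent with the doubled lines reported in this issue.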
Possibly related: https://github.com/matrix-org/synapse/issues/3365
Presumably this got fixed with #1081
It did not.
For some reason we seem to hit this bug more often now that the ircnet bridge is running 0.24.0, or at least it felt frequent enough to come here crying about it :) Still a quite random occurrence though.

Event sources of the wrongly doubled lines, in case they are of any help:
{
"content": {
"body": "ei noista saada kaukolämpöä, liian vähän lämpöeroa",
"msgtype": "m.text"
},
"origin_server_ts": 1614768303617,
"sender": "@_ircnet_<redacted>:irc.snt.utwente.nl",
"type": "m.room.message",
"unsigned": {
"age": 355
},
"event_id": "$dZOd65iODgWEw5TSvxNr76NfJJsaf_ancnPguKcj6ZM",
"room_id": "!OoQtBHsGJuXgqHIska:hacklab.fi"
}
{
"content": {
"body": "ei noista saada kaukolÀmpöÀ, liian vÀhÀn lÀmpöeroa",
"msgtype": "m.text"
},
"origin_server_ts": 1614768303794,
"sender": "@_ircnet_<redacted>:irc.snt.utwente.nl",
"type": "m.room.message",
"unsigned": {
"age": 3113
},
"event_id": "$AmfdLTNDQz2iTBVTuQhsp_P7mmM-hmdega4tai9EJj4",
"room_id": "!OoQtBHsGJuXgqHIska:hacklab.fi"
}
{
"content": {
"body": "ja se pelkkä laskennan sähkönkulutus on ihan posketon noissa kryptovaluutoissa",
"msgtype": "m.text"
},
"origin_server_ts": 1614768351990,
"sender": "@_ircnet_<redacted>:irc.snt.utwente.nl",
"type": "m.room.message",
"unsigned": {
"age": 1121
},
"event_id": "$Di9rrsY7iB-bipz-l47ouBqqqOCTjr6tndKvD6wigMc",
"room_id": "!OoQtBHsGJuXgqHIska:hacklab.fi"
}
{
"content": {
"body": "ja se pelkkÀ laskennan sÀhkönkulutus on ihan posketon noissa kryptovaluutoissa",
"msgtype": "m.text"
},
"origin_server_ts": 1614768352113,
"sender": "@_ircnet_<redacted>:irc.snt.utwente.nl",
"type": "m.room.message",
"unsigned": {
"age": 5305
},
"event_id": "$8dju4ajGmWFJQCrZv7wX4NLjmjlQIcWhNw5x2rp1qXA",
"room_id": "!OoQtBHsGJuXgqHIska:hacklab.fi"
}
Apparently UTF-8 encoding is occasionally borked in the M->I direction too (as it is I->M at times), and at least in this case it seems that something splits the message right in the middle of a multi-byte character. This happened in a Freenode room:
{
"type": "m.room.message",
"sender": "@<redacted>:hacklab.fi",
"content": {
"msgtype": "m.text",
"body": "Ja ehkä se, miten sitä suunnitellaan, vaikka olenkin varsinkin myöhempien C++-standardien (kaikki C++11->) fani. Niissä ei sinänsä ole tullut mitään, mistä voisin sanoa, että ei tullut tarpeeseen. Mutta ne on silti usein vähän sellaisia minimiratkaisuja ongelmiin. Mun mielestä tosi oireellista on, että kun keksittiin, että tässä kielessä on käytännössä vahingossa tehty mahdolliseksi tehdä templateilla mielivaltaista ajonaikaista laskentaa, sitä lähdettiin laajentamaan sen mahdollistamiseksi sen sijaan, että siihen olisi _suunniteltu_ ajonaikaista laskentaa tukeva kieli sen sijaan, että olisi lähdetty parantelemaan evoluution tuotosta.",
"external_url": "https://t.me/c/1459647690/40118",
"net.maunium.telegram.puppet": true
},
"origin_server_ts": 1615289228734,
"unsigned": {
"age": 116
},
"event_id": "$KsE3fjOR3v5geTn0mKnfIG7QzzKGDDAYwlevEIzLsKI",
"room_id": "!OwkWEsdHkzXRAQkApj:hacklab.fi"
}
So this apparently relates, or at least can relate, to the message splitter not taking UTF-8 into account when cutting a message.
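For reference, a rough sketch of what a UTF-8 aware splitter could look like. `splitUtf8` and its `maxBytes` parameter are hypothetical names for illustration only and are not taken from the bridge or node-irc. It measures each character in UTF-8 bytes and only cuts between whole characters.

```typescript
// Hypothetical helper: split text into chunks of at most maxBytes UTF-8 bytes
// without ever cutting inside a multi-byte character.
function splitUtf8(text: string, maxBytes: number): string[] {
  // A single UTF-8 character can take up to 4 bytes, so smaller limits cannot work.
  if (maxBytes < 4) throw new RangeError("maxBytes must be at least 4");
  const encoder = new TextEncoder();
  const chunks: string[] = [];
  let current = "";
  let currentBytes = 0;
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const ch of text) {
    const size = encoder.encode(ch).length;
    // Start a new chunk if this character would overflow the current one.
    if (currentBytes + size > maxBytes && current !== "") {
      chunks.push(current);
      current = "";
      currentBytes = 0;
    }
    current += ch;
    currentBytes += size;
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

// Each "ä" is 2 bytes in UTF-8, so a 5-byte limit fits two per chunk.
const parts = splitUtf8("äääää", 5);
console.log(parts); // [ 'ää', 'ää', 'ä' ]
```

A splitter that cuts at a fixed byte offset instead would leave half of a character at the end of one line and the other half at the start of the next, which neither side can decode as valid UTF-8.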
Still happening today. It seems less frequent but I can see it perhaps daily.