matrix-appservice-irc

Double message triggering encoding fallback randomly

Open vranki opened this issue 5 years ago • 7 comments

This happens at random, but I managed to see it twice in a short time. Encoding fallback is enabled on this network.

  • User sends a message in UTF-8 on IRC
  • Bridge sends it correctly once ä = ä
  • Bridge sends it again, this time with unnecessary latin1 -> UTF-8 conversion ä = À

What's happening here? Some kind of race condition? Could there be two threads in the encoding function at the same time? Does node-irc claim to be thread safe? Needs to be studied.
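
For reference, a minimal sketch of what an encoding fallback does and how the second, mangled copy can arise. This is not node-irc's actual code; `decodeLatin1` and `decodeWithFallback` are hypothetical names used only for illustration. The exact garbage characters depend on which single-byte charset is assumed; with ISO-8859-1 the two UTF-8 bytes of ä come out as Ã¤.

```typescript
// Hypothetical helpers for illustration; not the node-irc implementation.

/** Decode bytes as ISO-8859-1: every byte maps to the code point of the same value. */
function decodeLatin1(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

/** Try strict UTF-8 first; use Latin-1 only when the bytes are not valid UTF-8. */
function decodeWithFallback(bytes: Uint8Array): { text: string; usedFallback: boolean } {
  try {
    // `fatal: true` makes the decoder throw on invalid input instead of
    // silently inserting U+FFFD replacement characters.
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return { text, usedFallback: false };
  } catch {
    return { text: decodeLatin1(bytes), usedFallback: true };
  }
}

const utf8Bytes = new TextEncoder().encode("ä"); // two bytes: 0xC3 0xA4
const latin1Bytes = new Uint8Array([0xe4]);      // "ä" as a single Latin-1 byte

const fromUtf8 = decodeWithFallback(utf8Bytes);     // valid UTF-8, no fallback
const fromLatin1 = decodeWithFallback(latin1Bytes); // invalid UTF-8, fallback used

// What the second copy looks like if the fallback is applied to valid UTF-8:
const mangled = decodeLatin1(utf8Bytes);

console.log(fromUtf8, fromLatin1, mangled);
```

If the fallback only ever ran on input that fails strict UTF-8 validation, a valid UTF-8 line should never come out mangled, so the interesting question is what makes the same line look invalid on one of the connections.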

vranki avatar May 12 '20 12:05 vranki

At a guess, node-irc is occasionally decoding a message differently. The IRC bridge has many client connections at once, so it's normal for potentially hundreds of copies of the same message to arrive, but they should be deduplicated based on the contents of the message. If a copy is occasionally being decoded wrongly, its contents no longer match the others, and that would explain this bug.
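
A rough sketch of why content-based deduplication would let a wrongly decoded copy through. The class and its key format are assumptions made up for this example, not the bridge's real dedupe code, and a real implementation would also expire old entries.

```typescript
// Hypothetical content-based deduper; names and key format are illustrative only.
class ContentDeduper {
  private seen = new Set<string>();

  /** Returns true the first time a (channel, nick, text) triple is seen, false afterwards. */
  shouldForward(channel: string, nick: string, text: string): boolean {
    // JSON.stringify of an array gives an unambiguous key for the triple.
    const key = JSON.stringify([channel, nick, text]);
    if (this.seen.has(key)) {
      return false;
    }
    this.seen.add(key);
    return true;
  }
}

const deduper = new ContentDeduper();
const correct = "kaukolämpöä";
// The same line as it would look after its UTF-8 bytes were decoded as Latin-1.
const misdecoded = "kaukol\u00c3\u00a4mp\u00c3\u00b6\u00c3\u00a4";

const first = deduper.shouldForward("#channel", "nick", correct);     // forwarded
const repeat = deduper.shouldForward("#channel", "nick", correct);    // dropped as duplicate
const garbled = deduper.shouldForward("#channel", "nick", misdecoded); // forwarded again

console.log({ first, repeat, garbled });
```

Because the key includes the decoded text, a single connection decoding the line differently is enough to produce a second Matrix event.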

Half-Shot avatar May 12 '20 12:05 Half-Shot

Possibly related: https://github.com/matrix-org/synapse/issues/3365

vranki avatar May 14 '20 07:05 vranki

Presumably this got fixed with #1081

Half-Shot avatar Aug 25 '20 08:08 Half-Shot

It did not.

Half-Shot avatar Aug 26 '20 09:08 Half-Shot

For whatever reason, we seem to hit this bug more often now that the IRCnet bridge is running 0.24.0, or at least it felt frequent enough to come here crying about it :) It is still quite a random occurrence, though.


Event sources of the wrongly doubled lines, in case they are of any help:

{
  "content": {
    "body": "ei noista saada kaukolämpöä, liian vähän lämpöeroa",
    "msgtype": "m.text"
  },
  "origin_server_ts": 1614768303617,
  "sender": "@_ircnet_<redacted>:irc.snt.utwente.nl",
  "type": "m.room.message",
  "unsigned": {
    "age": 355
  },
  "event_id": "$dZOd65iODgWEw5TSvxNr76NfJJsaf_ancnPguKcj6ZM",
  "room_id": "!OoQtBHsGJuXgqHIska:hacklab.fi"
}
{
  "content": {
    "body": "ei noista saada kaukolÀmpöÀ, liian vÀhÀn lÀmpöeroa",
    "msgtype": "m.text"
  },
  "origin_server_ts": 1614768303794,
  "sender": "@_ircnet_<redacted>:irc.snt.utwente.nl",
  "type": "m.room.message",
  "unsigned": {
    "age": 3113
  },
  "event_id": "$AmfdLTNDQz2iTBVTuQhsp_P7mmM-hmdega4tai9EJj4",
  "room_id": "!OoQtBHsGJuXgqHIska:hacklab.fi"
}
{
  "content": {
    "body": "ja se pelkkä laskennan sähkönkulutus on ihan posketon noissa kryptovaluutoissa",
    "msgtype": "m.text"
  },
  "origin_server_ts": 1614768351990,
  "sender": "@_ircnet_<redacted>:irc.snt.utwente.nl",
  "type": "m.room.message",
  "unsigned": {
    "age": 1121
  },
  "event_id": "$Di9rrsY7iB-bipz-l47ouBqqqOCTjr6tndKvD6wigMc",
  "room_id": "!OoQtBHsGJuXgqHIska:hacklab.fi"
}
{
  "content": {
    "body": "ja se pelkkÀ laskennan sÀhkönkulutus on ihan posketon noissa kryptovaluutoissa",
    "msgtype": "m.text"
  },
  "origin_server_ts": 1614768352113,
  "sender": "@_ircnet_<redacted>:irc.snt.utwente.nl",
  "type": "m.room.message",
  "unsigned": {
    "age": 5305
  },
  "event_id": "$8dju4ajGmWFJQCrZv7wX4NLjmjlQIcWhNw5x2rp1qXA",
  "room_id": "!OoQtBHsGJuXgqHIska:hacklab.fi"
}

olmari avatar Mar 03 '21 11:03 olmari

Apparently UTF-8 encoding is occasionally broken in the Matrix->IRC (M->I) direction too, as it sometimes is in the IRC->Matrix (I->M) direction. At least in this case, something seems to split the message right in the middle of a multi-byte character. This happened in a Freenode room:

{
  "type": "m.room.message",
  "sender": "@<redacted>:hacklab.fi",
  "content": {
    "msgtype": "m.text",
    "body": "Ja ehkä se, miten sitä suunnitellaan, vaikka olenkin varsinkin myöhempien C++-standardien (kaikki C++11->) fani. Niissä ei sinänsä ole tullut mitään, mistä voisin sanoa, että ei tullut tarpeeseen. Mutta ne on silti usein vähän sellaisia minimiratkaisuja ongelmiin. Mun mielestä tosi oireellista on, että kun keksittiin, että tässä kielessä on käytännössä vahingossa tehty mahdolliseksi tehdä templateilla mielivaltaista ajonaikaista laskentaa, sitä lähdettiin laajentamaan sen mahdollistamiseksi sen sijaan, että siihen olisi _suunniteltu_ ajonaikaista laskentaa tukeva kieli sen sijaan, että olisi lähdetty parantelemaan evoluution tuotosta.",
    "external_url": "https://t.me/c/1459647690/40118",
    "net.maunium.telegram.puppet": true
  },
  "origin_server_ts": 1615289228734,
  "unsigned": {
    "age": 116
  },
  "event_id": "$KsE3fjOR3v5geTn0mKnfIG7QzzKGDDAYwlevEIzLsKI",
  "room_id": "!OwkWEsdHkzXRAQkApj:hacklab.fi"
}

So this apparently relates, or at least can relate, to the message splitter not being UTF-8 aware when it cuts a message.
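
As a sketch of what a UTF-8-aware split could look like: the function below cuts by encoded byte length but only ever between whole characters. `splitByUtf8Bytes` is a hypothetical name, not the bridge's splitter, and the byte budget in the example is arbitrary (a real one would have to fit IRC's 512-byte line limit minus the protocol overhead).

```typescript
// Hypothetical splitter for illustration; not the bridge's actual implementation.

/** Split text into chunks of at most maxBytes UTF-8 bytes without cutting inside a character. */
function splitByUtf8Bytes(text: string, maxBytes: number): string[] {
  // A single code point needs up to 4 bytes in UTF-8, so smaller budgets cannot work.
  if (maxBytes < 4) {
    throw new RangeError("maxBytes must be at least 4");
  }
  const encoder = new TextEncoder();
  const chunks: string[] = [];
  let current = "";
  let currentBytes = 0;

  // for...of walks the string character by character, never by raw byte.
  for (const ch of text) {
    const size = encoder.encode(ch).length;
    if (currentBytes + size > maxBytes) {
      // The next character would not fit: close the chunk before it, not inside it.
      chunks.push(current);
      current = "";
      currentBytes = 0;
    }
    current += ch;
    currentBytes += size;
  }
  if (current.length > 0) {
    chunks.push(current);
  }
  return chunks;
}

const parts = splitByUtf8Bytes("lämpöä", 4);
console.log(parts);
```

A splitter that slices the encoded bytes at a fixed offset instead can leave half of a two-byte character at the end of one line and the other half at the start of the next, and neither line is valid UTF-8 on its own.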

olmari avatar Mar 09 '21 17:03 olmari

Still happening today. It seems less frequent but I can see it perhaps daily.

vranki avatar Sep 13 '24 10:09 vranki