telegram-history-dump

Message ids are not sequential

Open lgrn opened this issue 7 years ago • 3 comments

When dumping a conversation, I get a WARN saying Message ids are not sequential.

Googling this warning turns up no hits except for the source code itself, so I thought I'd ask what it means. What the code does is fairly straightforward:

      if msg_id && prev_msg_id && msg_id >= prev_msg_id
        $log.warn('Message ids are not sequential (%s[%s] -> %s[%s])' % [
          prev_msg_id.raw_hex, prev_msg_id.sequence_hex,
          msg_id.raw_hex, msg_id.sequence_hex,
        ])
      end

So I guess my question is: how should this warning be interpreted? For example, is it normal and expected in chats where messages have been deleted? For a while now, Telegram has supported deleting messages on both ends within a short period. Presumably this could lead to an archive containing non-sequential messages.

If my assumption above is correct, it might help for the warning to explain that this is a possible cause and that it doesn't necessarily indicate an issue with the dump.

lgrn avatar Mar 29 '17 07:03 lgrn

I get that warning when someone posts to the group while I'm dumping it.

MelomanCool avatar Jul 11 '17 20:07 MelomanCool

The question remains whether there is any point in checking for it. If it's completely normal and very rarely indicates a problem, should the user be alerted at all? If my guess is correct that, since Telegram introduced the "delete on both ends" functionality, anyone can now create this "issue", then it seems to me like something that should be ignored.

lgrn avatar Jul 22 '17 08:07 lgrn

To give some background: this warning was added in e2ca740 as a sanity check for aff1af1, which reworked the freshness checking (whether a downloaded message is new or already dumped locally) because breaking changes in telegram-cli made that necessary.

The code you quote does indeed look straightforward, but the deceptively simple some_id > some_other_id hides something that is anything but simple. Because of some (in my opinion) extremely sloppy changes in how telegram-cli creates message IDs, they are not sortable anymore, and their structure depends on the system architecture (i.e. endianness and 32/64 bit) and the C compiler. The MsgId code (MsgId is the type of msg_id and prev_msg_id) implements a custom comparison that extracts and manipulates a specific portion of the ID to get something sortable, at the cost of having to make certain assumptions about the target system. That was something I really wanted to avoid, but I couldn't find a better alternative. The assumptions I had to make should hold true for pretty much anyone using this in the foreseeable future, but as you can probably understand, I felt the need to build in some assertions and sanity checks, and this warning was one of those.
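To illustrate the general idea of such a custom comparison, here is a minimal sketch. It is not the actual MsgId implementation: the class name SketchMsgId, the field offset and width, and the little-endian assumption are all made up for the example. Only the method names raw_hex and sequence_hex are borrowed from the warning code.

```ruby
# Hypothetical sketch: the offset, width and byte order below are assumptions
# for illustration, not the real telegram-cli ID layout.
class SketchMsgId
  include Comparable

  SEQUENCE_OFFSET = 16 # assumed position of the sortable part, in hex digits
  SEQUENCE_LENGTH = 8  # assumed width of the sortable part, in hex digits

  attr_reader :raw_hex

  def initialize(raw_hex)
    @raw_hex = raw_hex
  end

  # Extract the part of the raw ID that is assumed to grow with message order.
  # The field is assumed to be little-endian, so the byte order is reversed.
  def sequence_hex
    raw_hex[SEQUENCE_OFFSET, SEQUENCE_LENGTH].scan(/../).reverse.join
  end

  # Compare on the extracted sequence only; Comparable derives <, >=, etc.
  def <=>(other)
    sequence_hex.to_i(16) <=> other.sequence_hex.to_i(16)
  end
end

older = SketchMsgId.new('ffff' * 4 + '2a000000' + '0000')
newer = SketchMsgId.new('0000' * 4 + '2b000000' + '0000')

puts older < newer                 # compares the extracted sequence parts
puts older.raw_hex < newer.raw_hex # comparing the raw strings disagrees
```

The point of the sketch is that comparing the raw IDs gives a different order than comparing the extracted portion, and that the extraction only works if the assumptions about layout and byte order hold on the target system.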

When I wrote this it was under the assumption that message IDs are always expected to be sequential when the IDs are decoded correctly, but apparently there are cases where this does not hold true. I don't really get how deleting messages could cause this though, as deleting messages should only cause gaps in message IDs and not reorder them. Or am I missing something here?

I can explain the observation of @MelomanCool though. I used to have a note about this in the readme, but apparently it got lost in the transition from telegram-json-backup to telegram-history-dump. It reads:

Because the message backlogs are received in chunks from newest to oldest, the arrival of new messages while the backup is running may break index consistency and therefore cause duplicate or missing messages in the resulting dump. I recommend running this at a time when it's unlikely that anyone will send a message to your backup target(s). You could even schedule the backup in the middle of the night with at or crontab.

(Note that this is not entirely accurate: it has more to do with the fact that message downloading is offset/limit based than with the fact that it goes from new to old. And when I think about it, it could cause duplicates but not missing messages.)
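A small simulation shows the effect. This is only a sketch: the history array with plain integer IDs and the fetch lambda are hypothetical stand-ins for the server-side history and an offset/limit request, not actual telegram-cli calls.

```ruby
# Simulated server-side history, newest first. IDs are plain integers here.
history = [5, 4, 3, 2, 1]

# Hypothetical stand-in for an offset/limit based history request.
fetch = ->(offset, limit) { history[offset, limit] || [] }

dumped = []
dumped.concat(fetch.call(0, 2)) # first chunk

history.unshift(6)              # a new message arrives while the dump runs

dumped.concat(fetch.call(2, 2)) # second chunk, now shifted by one position
dumped.concat(fetch.call(4, 2)) # third chunk

duplicates = dumped.select { |id| dumped.count(id) > 1 }.uniq
missing = (1..5).to_a - dumped

# Same condition as the warning: an ID that is not smaller than the previous one.
not_sequential = dumped.each_cons(2).any? { |prev, cur| cur >= prev }

puts "dumped: #{dumped.inspect}"
puts "duplicates: #{duplicates.inspect}, missing: #{missing.inspect}"
puts "not sequential: #{not_sequential}"
```

Because the new message shifts every older message one position further from the start, the next chunk overlaps with the previous one. In this simulation nothing is skipped, but one message is fetched twice, which is also what trips the sequence check.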

So in this case the warning probably means that there are one or more duplicate messages somewhere as a result of new messages being posted during the dump. This note dates from a time when I didn't even have incremental dumps and freshness checking, and I had pretty much forgotten about it since. I can probably introduce a buffer of received message IDs and check every new message against it to prevent these kinds of duplicates.
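A rough sketch of what such a buffer could look like, assuming IDs can be used as set members (plain integers here). The class name SeenIdBuffer and the method fresh? are hypothetical and not part of the current code.

```ruby
require 'set'

# Hypothetical helper: remembers the IDs received so far during one dump and
# reports whether a given ID is seen for the first time.
class SeenIdBuffer
  def initialize
    @seen = Set.new
  end

  # Set#add? returns nil when the element was already present.
  def fresh?(id)
    !@seen.add?(id).nil?
  end
end

buffer = SeenIdBuffer.new
received = [5, 4, 4, 3, 2, 1] # one message delivered twice
deduplicated = received.select { |id| buffer.fresh?(id) }

puts deduplicated.inspect
```

The trade-off is memory: the buffer grows with the number of messages in a dialog, so a real implementation might bound it to the most recent chunks.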

@lgrn Could this explain your case as well, or can you rule out that it only happens when new messages are posted during the dump?

tvdstaaij avatar Jul 22 '17 09:07 tvdstaaij