telegram-history-dump
Message ids are not sequential
When dumping a conversation, I get a WARN saying `Message ids are not sequential`.
Googling this warning, there aren't any hits except for the source code itself, so I thought I'd ask what this means. What the code does is fairly straightforward:
```ruby
if msg_id && prev_msg_id && msg_id >= prev_msg_id
  $log.warn('Message ids are not sequential (%s[%s] -> %s[%s])' % [
    prev_msg_id.raw_hex, prev_msg_id.sequence_hex,
    msg_id.raw_hex, msg_id.sequence_hex,
  ])
end
```
So I guess my question is, how should this warning be interpreted? For example, is it normal and expected in chats where messages have been deleted? For a while now, Telegram has supported deleting messages on both ends within a short period; presumably this could lead to an archive containing non-sequential message ids.
If my above assumption is correct, maybe it would help to explain in the warning that this could be a cause, and doesn't necessarily indicate an issue with the dump.
I'm getting that warning when someone is posting to the group when I'm dumping it.
The question remains whether there is a point in checking for it. If it's completely normal and very rarely indicates a problem, should the user be alerted at all? If my guess is correct that the introduction of the "delete on both ends" functionality in Telegram means anyone can now create this "issue", it seems to me like something that should be ignored.
To give some background on this, this warning was added with e2ca740 as a sanity check for aff1af1, where the freshness checking (whether a downloaded message is new or already dumped locally) was reworked because breaking telegram-cli changes necessitated this.
The code you quote does indeed look straightforward, but the deceptively simple `some_id > some_other_id` hides something that is all but simple. Because of some (in my opinion) extremely sloppy changes in how telegram-cli creates message IDs, they are no longer sortable, and their structure depends on the system architecture (i.e. endianness and 32/64 bit) and C compiler. The MsgId class (the type of `msg_id` and `prev_msg_id`) implements a custom comparison which extracts and manipulates a specific portion of the ID to get something sortable, at the cost of having to make certain assumptions about the target system. This was something that I really wanted to avoid, but I couldn't find a better alternative. The assumptions I had to make should hold true for pretty much anyone using this in the foreseeable future, but as you can probably understand I felt the need to build in some assertions and sanity checks, and this warning was one of those.
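To illustrate the idea only (this is not the actual MsgId implementation; the real class makes its own architecture-dependent assumptions), here is a minimal sketch where a hypothetical `FakeMsgId` compares just an extracted "sequence" slice of the raw hex ID. The slice offsets here are invented for the example:

```ruby
# Hypothetical, simplified illustration: the raw ID from telegram-cli is not
# sortable as-is, so a comparable "sequence" slice is extracted at a fixed
# position. The offset and width below are made up for illustration.
class FakeMsgId
  include Comparable
  SEQUENCE_RANGE = (4...12) # assumed location of the sortable portion

  attr_reader :raw_hex

  def initialize(raw_hex)
    @raw_hex = raw_hex
  end

  def sequence_hex
    @raw_hex[SEQUENCE_RANGE]
  end

  def <=>(other)
    # Compare only the extracted sequence portion, not the full raw ID
    sequence_hex.to_i(16) <=> other.sequence_hex.to_i(16)
  end
end

a = FakeMsgId.new('ffff00000001ffff')
b = FakeMsgId.new('000000000002aaaa')
puts a < b # true: raw hex order would say otherwise, the sequence slice sorts correctly
```

The point of the sketch is that `a.raw_hex` sorts *after* `b.raw_hex` lexicographically, yet the custom `<=>` still orders `a` before `b`, which is the kind of extraction-based comparison described above.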
When I wrote this it was under the assumption that message IDs are always expected to be sequential when the IDs are decoded correctly, but apparently there are cases where this does not hold true. I don't really get how deleting messages could cause this though, as deleting messages should only cause gaps in message IDs and not reorder them. Or am I missing something here?
I can explain the observation of @MelomanCool though. I used to have a note about this in the readme but apparently that got lost in the transition of telegram-json-backup to telegram-history-dump. It reads:
> Because the message backlogs are received in chunks from newest to oldest, the arrival of new messages while the backup is running may break index consistency and therefore cause duplicate or missing messages in the resulting dump. I recommend running this at a time when it's unlikely that anyone will send a message to your backup target(s). You could even schedule the backup in the middle of the night with at or crontab.
(Note that this is not entirely accurate, it has more to do with the fact that the message downloading is offset/limit based rather than the fact that it's from new to old, and when I think about it, it could cause duplicates but not missing messages.)
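To make the duplication mechanism concrete, here is a toy simulation (all names and numbers are invented) of offset/limit chunking while a new message arrives mid-dump:

```ruby
# Toy simulation of the offset/limit pagination issue: history is fetched
# newest-first in chunks; if a new message arrives between two chunk
# requests, every remaining offset shifts by one and the last message of
# the previous chunk is fetched again.
history = (1..6).to_a.reverse # message ids, newest first: [6, 5, 4, 3, 2, 1]
limit = 3

chunk1 = history[0, limit]     # => [6, 5, 4]
history.unshift(7)             # someone posts during the dump
chunk2 = history[limit, limit] # offset 3 now lands on id 4 again => [4, 3, 2]

dumped = chunk1 + chunk2
puts dumped.inspect # [6, 5, 4, 4, 3, 2] -- message 4 is duplicated
```

As the simulation shows, the shifted offset re-fetches a message that was already dumped, which matches the "duplicates but not missing messages" reading above.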
So in this case the warning probably means that there are one or more duplicate messages somewhere, as a result of new messages being posted during the dump. This note dates from a time when I didn't even have incremental dumps and freshness checking, and I had pretty much forgotten about it since. I can probably introduce a buffer of received message IDs and check every new message against that to prevent these kinds of duplicates.
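A minimal sketch of what such a buffer could look like, assuming a simple `Set` of already-seen IDs (the `dump_message` mentioned in the comment is a stand-in, not the real dumper API):

```ruby
require 'set'

# Sketch of the proposed fix: keep a buffer of message ids already seen
# during this dump and drop any repeats before writing them out.
seen_ids = Set.new
duplicates_skipped = 0

incoming = [6, 5, 4, 4, 3, 2] # chunked download containing one duplicate

incoming.each do |msg_id|
  # Set#add? returns nil when the element was already present
  unless seen_ids.add?(msg_id)
    duplicates_skipped += 1
    next
  end
  # dump_message(msg_id) would go here in the real dumper
end

puts duplicates_skipped # 1
```

`Set#add?` keeps the membership test and the insertion in one atomic call, which is slightly tidier than a separate `include?` check followed by `<<`.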
@lgrn Could this explain your case as well or can you rule out that this happens only when new messages are being posted during the dump?