How can you show messages with the same Message-Id using notmuch virtual mailboxes?
Hello everybody,
Only by accident (followed by two hours of confused investigation) did I recently become aware that I have received some emails that differ in content but have the same Message-Id header.
I use notmuch virtual mailboxes and according to notmuch(1) section DUPLICATE MESSAGE FILES, notmuch deduplicates such occurrences. Thus, some emails are hidden from view in neomutt, unless the correct search terms are used.
Is it possible to sensibly deal with such duplicate emails, maybe by configuring some of neomutt's notmuch-related configuration parameters?
I found out that notmuch-search can be invoked with the --output=files argument to include duplicates, but I don't know how that can or should be put to use.
Hi @mbwgh, I'm currently on a journey of removing some 200k+ duplicate emails. I feel your pain. I'm not a neomutt user, but I can perhaps answer the notmuch question.
notmuch tracks the message ids, so you can run the search
    notmuch search --output=files id:<msgid>
(note: do not include the angle brackets < >) and it should show you a list of the files that it thinks contain that message. The theory is that those files should be identical, but that is not always true in practice. In my case, I have found a single header (X-TUID) to be the only difference between them. I'm currently running some scripts to triple check that mine are truly all the same, but at this point, based on md5 checksums, it sure looks like I'll be able to free up around 220k of my 310k files.
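A rough sketch of that kind of checksum comparison, not the actual scripts mentioned above. It assumes a POSIX shell and GNU md5sum; the function names normalized_md5 and checksums_for_id are made up for illustration, and dropping X-TUID lines is a simplification that only suits mail where that header is the sole difference.

```shell
#!/bin/sh
# Hash a message file while ignoring X-TUID header lines, so that two copies
# differing only in that header produce the same checksum.
# Simplification: this drops ANY line starting with "X-TUID:" (also in the
# body) and does not handle folded (multi-line) headers.
normalized_md5() {
    grep -v '^X-TUID:' "$1" | md5sum | cut -d ' ' -f 1
}

# Print "checksum  path" for every file notmuch associates with a message id.
# Pass the id without the angle brackets.
checksums_for_id() {
    notmuch search --output=files "id:$1" | while IFS= read -r file; do
        printf '%s  %s\n' "$(normalized_md5 "$file")" "$file"
    done
}
```

For a given id, piping the output through cut -d ' ' -f 1 | sort -u | wc -l should print 1 when all copies are identical apart from X-TUID; anything larger means the files really do differ and deserve a manual look before deleting anything.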
notmuch seems to track each message id together with the files that go with it, in what feels like an array. Every message id has at least one file associated with it, and any duplicates are also in that array.
If you want to get a quick sense of how many dupes you might have, try running:
    notmuch search --output=files --duplicate=2 date:1970.. | wc -l
That will give you a count of how many emails have at least 1 potential duplicate. Change the --duplicate= value to 3, 4, etc. and you'll get the 3rd, 4th, etc. file for each message. And --duplicate=1 is a "theoretical" original. In my case, I found some emails had 11 duplicates.
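To see the whole distribution in one go, the same command can be wrapped in a loop. This is only a sketch: the function name duplicate_histogram is invented here, and it assumes that once one value of --duplicate returns nothing, all larger values will too.

```shell
#!/bin/sh
# For N = 2, 3, ... print "N<TAB>count", where count is the number of
# messages that have at least N files. Stops at the first N for which
# notmuch prints nothing.
# The query defaults to the one used above; pass another as the first argument.
duplicate_histogram() {
    query=${1:-date:1970..}
    n=2
    while :; do
        count=$(notmuch search --output=files --duplicate="$n" "$query" | wc -l | tr -d ' ')
        [ "$count" -eq 0 ] && break
        printf '%s\t%s\n' "$n" "$count"
        n=$((n + 1))
    done
}
```

Calling duplicate_histogram with no arguments covers the whole mail store; the last line printed shows the highest number of files stored for any single message id.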
I think in my case it will be worth the effort.
And what is baffling is how blazingly fast notmuch still is.