
Fails on large message count

Open vortek opened this issue 7 years ago • 8 comments

It worked for all the folders. Then I did a dry run on the INBOX folder and it found 113000 duplicates. When I remove the -n option it fails. If I try the dry run again now, it also fails.

$ ./imapdedup.py -s mail.server.com -u [email protected] -x l
Password:
Spam
Drafts
Deleted Items
Sent
INBOX
$ ./imapdedup.py -s mail.server.com -u [email protected] -x INBOX
Password: 
There are 170714 messages in INBOX.
No message(s) currently marked as deleted in INBOX
170714 others in INBOX
Traceback (most recent call last):
  File "./imapdedup.py", line 324, in <module>
    main(sys.argv[1:])
  File "./imapdedup.py", line 321, in main
    process(options, mboxes)
  File "./imapdedup.py", line 248, in process
    ms = check_response(server.fetch(message_ids, '(RFC822.HEADER)'))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/imaplib.py", line 456, in fetch
    typ, dat = self._simple_command(name, message_set, message_parts)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/imaplib.py", line 1088, in _simple_command
    return self._command_complete(name, self._command(name, *args))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/imaplib.py", line 912, in _command_complete
    raise self.abort('command: %s => %s' % (name, val))
imaplib.abort: command: FETCH => socket error: EOF

vortek avatar Jan 29 '18 21:01 vortek

Hi João,

It looks as if you may have hit some limit on your server, or maybe it's timing out. I'd need to look more carefully at this and I'm afraid I'm not likely to manage that in the near future.

If you need a temporary fix, can I suggest splitting your inbox into folders, e.g. by year, running the program against each folder, and then (if you really want an inbox that large!) recombining them again?

Best, Quentin

quentinsf avatar Jan 29 '18 22:01 quentinsf

Hello Quentin, How do you suggest that I split the inbox? Thanks!

vortek avatar Jan 29 '18 22:01 vortek

Well, there are ways you could script it, but I would just use an email program to create a new folder, select all the messages in one year, and move them over. Then do the next year...

Depending on your email client, you may be able to do something clever with smart mailboxes to make the selection process easier...
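
For anyone who would rather script the split, here is a rough sketch of the by-year idea using Python 3's imaplib. It is not part of IMAPdedup: the function names, the `INBOX-<year>` folder naming and the batch size of 500 are all made up for illustration, and it assumes `conn` is an already logged-in imaplib connection. Try it on a test mailbox first.

```python
import imaplib  # only needed by the caller, to create the connection


def year_criteria(year):
    """IMAP SEARCH criteria for messages whose server-side date falls in `year`."""
    return "(SINCE 01-Jan-%d BEFORE 01-Jan-%d)" % (year, year + 1)


def move_year(conn, year, source="INBOX", batch_size=500):
    """Copy one year's messages into '<source>-<year>' and delete the originals.

    `conn` is assumed to be a logged-in imaplib.IMAP4 / IMAP4_SSL object.
    Returns the number of messages moved.
    """
    dest = "%s-%d" % (source, year)      # hypothetical naming scheme
    conn.create(dest)                    # answers NO (no exception) if it exists
    conn.select(source)
    typ, data = conn.uid("SEARCH", year_criteria(year))
    if typ != "OK":
        raise RuntimeError("SEARCH failed: %r" % (data,))
    uids = data[0].split() if data and data[0] else []
    # Work in small batches so no single command line gets huge.
    for start in range(0, len(uids), batch_size):
        batch = b",".join(uids[start:start + batch_size]).decode("ascii")
        typ, data = conn.uid("COPY", batch, dest)
        if typ != "OK":
            raise RuntimeError("COPY failed: %r" % (data,))
        conn.uid("STORE", batch, "+FLAGS", r"(\Deleted)")
    # Note: this also removes anything else already flagged \Deleted in `source`.
    conn.expunge()
    return len(uids)
```

Usage would be something like `conn = imaplib.IMAP4_SSL("mail.server.com")`, `conn.login(user, password)`, then `move_year(conn, 2017)` for each year in turn.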

quentinsf avatar Jan 29 '18 23:01 quentinsf

Thanks for the tips!

vortek avatar Jan 29 '18 23:01 vortek

I ran into this as well, doing an inbox with 300K+ messages. (Don't ask..) The first run was great: it deleted 100K dupes & I was excited, but there were still dupes showing up in Roundcube, so I figured I'd run it again. This time I'd get that EOF error on the same fetch-headers line. I changed (RFC822.HEADER) to (BODY.PEEK[HEADER]) and it worked again for 1 run. Then the dreaded EOF error on every run after. So I edited (BODY.PEEK[HEADER]) back to (RFC822.HEADER) and it worked.. for one run.. then not until I let it sit awhile, when it worked again.. for 1 run, then EOF.

By that time it was clear something funky was up, so I decided to dig deeper to try & narrow it down. I did many things, including adjusting MAXLINE and wrapping the IMAP commands in try/except hoping it'd continue (it doesn't), but it wasn't until I enabled debugging with imaplib.Debug & m.debug = True that I finally got a big clue as to what was going on:

35:55.56 BYE response: Server shutting down.

So yeah, umm, it seems the remote server is shutting down mid-session? That'd explain why it works after editing (the time that passed allowed the server to come back online). And note it happened on a folder with only 39 messages.. I had changed to another folder with fewer messages to try & narrow down the issue. I thought it was a fluke, but I was able to reproduce this shutting-down bit multiple times.
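
For anyone wanting to reproduce that debug output, a minimal Python 3 sketch follows. The helper name `enable_imap_debug` is made up; `imaplib.Debug` (the module-wide default picked up by new connections) and the per-connection `debug` attribute are real. As far as I can tell from the imaplib source, level 1 is enough to see NO/BAD/BYE responses (which is why `m.debug = True` worked above), while level 4 and up also logs each command sent.

```python
import imaplib


def enable_imap_debug(conn=None, level=4):
    """Turn on imaplib's built-in protocol tracing (printed to stderr).

    Sets the module-wide default for connections created afterwards and,
    if an existing connection `conn` is passed in, that connection too.
    """
    imaplib.Debug = level
    if conn is not None:
        conn.debug = level
    return level
```

Call `enable_imap_debug()` before connecting, or `enable_imap_debug(m)` on a connection `m` that is already open.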

There are many guesses as to what is up, from corrupt messages on the server, to overloading the server, to a bug in Python's imaplib, to who knows what. Clearly there's an issue, but I can't say it's in IMAPdedup (in fact it's not, in that I get similar issues with other programs/scripts), beyond that maybe it'd be helpful if it handled this better & recovered.

Btw not sure about OP but in my case this is all on InMotion shared business hosting which is Dovecot:

  * OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE NAMESPACE STARTTLS AUTH=PLAIN AUTH=LOGIN] Dovecot ready.

EDIT: Ok seems maybe that's syslog rate limiting in that post so maybe unrelated & weird coincidence.. Little searching & maybe it's rate limiting: "server dovecot: imap([email protected]): Server shutting down. in=7140 out=70598" https://www.howtoforge.com/community/threads/server-dovecot-imap-account-tld-com-server-shutting-down-in-7140-out-70598.74887/

If that's the case maybe need option to limit max # of messages it does at a time and/or add sleeps in the loop to help?
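
A rough sketch of what "limit the batch size and sleep in the loop" could look like in Python 3. This is not IMAPdedup's code: `chunks`, `fetch_headers_throttled`, `chunk_size` and `pause` are hypothetical names, `conn` is assumed to be a logged-in imaplib connection with a mailbox selected, and `message_ids` a list of message-number strings. The debug output further down the thread suggests the script already fetches 100 messages per FETCH, so the new part here is mainly the pause.

```python
import time


def chunks(items, size):
    """Yield successive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_headers_throttled(conn, message_ids, chunk_size=100, pause=1.0):
    """Fetch headers a chunk at a time, sleeping between chunks.

    Returns the concatenated response data from every FETCH.
    """
    results = []
    for chunk in chunks(message_ids, chunk_size):
        typ, data = conn.fetch(",".join(chunk), "(RFC822.HEADER)")
        if typ != "OK":
            raise RuntimeError("FETCH failed: %r" % (data,))
        results.extend(data)
        time.sleep(pause)   # give a rate-limited server some breathing room
    return results
```

Whether a pause actually helps depends on what the server is limiting (connections, commands, bandwidth), which is only a guess at this point.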

Bill48105 avatar Mar 04 '18 23:03 Bill48105

1150749 others in INBOX
30:37.28 > LCJD5 FETCH 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100 (RFC822.HEADER)
30:38.69 last 0 IMAP4 interactions:
30:38.69 > LCJD6 LOGOUT
30:38.69 last 0 IMAP4 interactions:
Traceback (most recent call last):
  File "imapdedup.py", line 324, in <module>
    main(sys.argv[1:])
  File "imapdedup.py", line 321, in main
    process(options, mboxes)
  File "imapdedup.py", line 248, in process
    ms = check_response(server.fetch(message_ids, '(RFC822.HEADER)'))
  File "/root/daily_build/64_23/4.3.4/SysUtil/Python-2.7.5-cross/install_path_full/lib/python2.7/imaplib.py", line 443, in fetch
  File "/root/daily_build/64_23/4.3.4/SysUtil/Python-2.7.5-cross/install_path_full/lib/python2.7/imaplib.py", line 1070, in _simple_command
  File "/root/daily_build/64_23/4.3.4/SysUtil/Python-2.7.5-cross/install_path_full/lib/python2.7/imaplib.py", line 899, in _command_complete
imaplib.abort: command: FETCH => socket error: EOF

I turned imaplib debugging on. I get that INBOX has a huge number of mails, but the fetch results in a socket error: EOF. Does anyone have any insights?

shubhammatta avatar Jun 30 '18 19:06 shubhammatta

Mmm. Do you have access to the server logs?

The imaplib source says that '"abort" exceptions imply the connection should be reset, and the command re-tried.'

So perhaps that's what we should do (if anyone who can test this would like to submit a pull request!)
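
Untested against a real server, but a minimal sketch of that reset-and-retry idea in Python 3 might look like the following. The function name, the `connect` callable (assumed to return a fresh, logged-in imaplib connection), and the `attempts` and `delay` parameters are all hypothetical; only `imaplib.IMAP4.abort` and the `select`/`fetch` calls are imaplib's own.

```python
import imaplib
import time


def fetch_with_retry(conn, connect, mailbox, message_set,
                     parts="(RFC822.HEADER)", attempts=3, delay=5.0):
    """Run FETCH, reconnecting and retrying if the connection is aborted.

    Returns (conn, response); `conn` may be a new connection, so the caller
    should keep using the returned one.
    """
    for attempt in range(1, attempts + 1):
        try:
            return conn, conn.fetch(message_set, parts)
        except imaplib.IMAP4.abort:
            if attempt == attempts:
                raise                  # out of retries: let the caller see it
            time.sleep(delay)          # give the server time to come back
            conn = connect()           # hypothetical: returns a logged-in connection
            conn.select(mailbox)       # reselect, since the new session starts unselected
```

One design caveat: if messages were expunged before the drop, message sequence numbers on the new connection may no longer line up with the old ones, so a real fix would probably want to work with UIDs.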

I guess your mail server may be very heavily loaded and timing out trying to do this even for 100 messages. However, you may be asking for problems with any IMAP server if you keep more than a million messages in a single mailbox! Not to mention using a lot of RAM on your local machine if you do manage to download even their headers...

quentinsf avatar Jun 30 '18 19:06 quentinsf

Thanks for the info. I reduced the chunk size to 1 and the script ran. If it aborts again, I will try adding the reconnect part to the script, and will comment if that works. I hope it does not abort, though; I have been at it for quite some time now.

shubhammatta avatar Jun 30 '18 20:06 shubhammatta