offlineimap confused after (suspend and) resume

Open quite opened this issue 12 years ago • 35 comments

I have offlineimap running in tmux at (almost) all times. After resuming my computer, offlineimap has trouble reconnecting properly. Often it just hangs for a long time (or indefinitely) while nothing seems to happen.

I came up with a solution to the problem by having a resume-script that sends SIGUSR2 to my running offlineimap, whereafter it is automatically restarted.
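A resume hook along these lines could be sketched as follows. This is only an illustration, not the script used above: the helper names are made up, and it assumes that `pgrep` is installed and that the running process is actually named `offlineimap`.

```python
import os
import signal
import subprocess


def find_pids(name):
    """Return the PIDs of processes whose name matches exactly (via pgrep)."""
    # Assumption: pgrep is available; it prints one PID per line.
    result = subprocess.run(["pgrep", "-x", name], capture_output=True, text=True)
    return [int(pid) for pid in result.stdout.split()]


def signal_all(pids, sig=signal.SIGUSR2):
    """Send sig to each PID and return the PIDs that could be signalled."""
    delivered = []
    for pid in pids:
        try:
            os.kill(pid, sig)
            delivered.append(pid)
        except ProcessLookupError:
            pass  # the process exited between lookup and signal
    return delivered
```

A resume hook would then call `signal_all(find_pids("offlineimap"))`. How the hook gets triggered after resume depends on the system and is left out here.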

The problem, though, is that offlineimap sometimes takes a long time to exit on SIGUSR2. This may again be related to lost TCP connections, timeouts and such, and is essentially the same problem as above.

Any suggestions on how I should solve this? I have been reluctant to just send SIGKILL to the process, because I was afraid that this might cause inconsistencies in the repo or similar. But maybe this is what I need to do?

quite avatar Sep 30 '13 07:09 quite

Which version of OfflineIMAP are you using? Which OS?

konvpalto avatar Sep 30 '13 08:09 konvpalto

offlineimap 6.5.4 on an updated Arch Linux, which means basically latest vanilla versions of everything

quite avatar Sep 30 '13 11:09 quite

What typically happens after resume and upon sending offlineimap the SIGUSR2:

Terminating after this sync...
[seems to hang forever, I press ^C]
Terminating NOW (this may take a few seconds)...
[hung again....]

quite avatar Oct 03 '13 11:10 quite

Just wanted to add that I see the same with 6.5.5 on an updated Arch Linux. Are there any thoughts as to why this is happening?

aignas avatar Feb 06 '14 14:02 aignas

Same here. Offlineimap 6.5.3 on OpenSUSE. I work around it with timeout and kill offlineimap if timeout returns non-zero. Never had issues with a broken repo because of that (happily).
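The same idea can be sketched in Python for anyone who prefers a wrapper script over timeout(1). This is a rough equivalent, not the setup described above; `run_with_timeout` is a hypothetical helper name.

```python
import subprocess


def run_with_timeout(cmd, seconds):
    """Run cmd and kill it if it has not finished within `seconds`.

    Returns the exit code, or None if the process had to be killed.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored by the child
        proc.wait()  # reap the child so no zombie is left behind
        return None
```

A cron job could call something like `run_with_timeout(["offlineimap"], 600)`, where the 600 second limit is an arbitrary example value. Like the timeout(1) approach, this kills the process outright, so it relies on the same observation that a hard kill has not broken the repo in practice.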

Gonzih avatar Feb 10 '14 20:02 Gonzih

How do you use timeout(1) for that? Are you running offlineimap -o (run-once mode)?

quite avatar Feb 11 '14 11:02 quite

No, just calling offlineimap by cron.

Best regards, Max

Gonzih avatar Feb 11 '14 12:02 Gonzih

I'm seeing this issue too (6.5.5 on Arch Linux).

doy avatar Mar 12 '14 14:03 doy

Same here, 6.5.5 on Arch Linux. Not using cron, just running offlineimap -o.

rcorre avatar Mar 16 '14 19:03 rcorre

I am seeing a similar problem on a Macbook Air running Mac OS X 10.9.2 (Mavericks). No cron, just using offlineimap.

treese avatar Apr 01 '14 20:04 treese

Same here. Offlineimap 6.5.4, Python: 2.7.5, Debian Wheezy (Ubuntu)

christopherraa avatar Apr 22 '14 19:04 christopherraa

You should try setting socktimeout in the [general] section of your offlineimaprc. It sets a timeout on the select call, so the process will terminate when no data is received within the timeout. That solved the problem for me.
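To check whether a given configuration actually sets this option, a small script like the one below may help. It is a sketch that assumes the file parses with Python's standard `configparser`; the account name and the value 60 in the sample text are placeholders, not recommendations.

```python
import configparser


def get_socktimeout(config_text):
    """Return socktimeout from [general] in seconds, or None if it is unset."""
    # Interpolation is disabled so that '%' characters elsewhere in the
    # file do not cause parse errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_option("general", "socktimeout"):
        return None
    return parser.getfloat("general", "socktimeout")


# Sample configuration text; both values are illustrative.
example = """
[general]
accounts = Example
socktimeout = 60
"""
print(get_socktimeout(example))
```

To check a real file, read its contents and pass them in, for example `get_socktimeout(open(path).read())` with `path` pointing at your offlineimaprc.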

mlen avatar Jun 08 '14 12:06 mlen

Thanks @mlen -- setting socktimeout in the [general] section seems to work.

rcorre avatar Jun 08 '14 14:06 rcorre

This workaround helps, but I'm still getting occasional hangs even with the socktimeout option set.

doy avatar Jun 10 '14 01:06 doy

@mlen's suggestion worked for me, thanks.

jbmartin avatar Jul 21 '14 04:07 jbmartin

what value of socktimeout did you use?

choucavalier avatar Sep 09 '14 09:09 choucavalier

I use socktimeout = 10

rcorre avatar Sep 09 '14 10:09 rcorre

@murphyslaw480 thanks :+1:

choucavalier avatar Sep 09 '14 10:09 choucavalier

This needs to be documented in the known issues.

nicolas33 avatar Jan 12 '15 13:01 nicolas33

Done in cd962d4.

nicolas33 avatar Feb 13 '15 16:02 nicolas33

As I mentioned, setting socktimeout doesn't actually fix the problem - it makes it less frequent, but it still happens to me all the time even with this set. I don't think this is a sufficient fix.

doy avatar Feb 13 '15 16:02 doy

Ok. I know current behaviour sucks. Sadly, it's hard to handle this properly so don't expect this to be fixed soon.

nicolas33 avatar Feb 13 '15 16:02 nicolas33

There are two things I would suggest here:

  1. If the first C-c attempts a graceful exit, the second C-c should hard exit.
  2. I'm pretty sure the remaining hanging is from the timeout not being applied to all blocking calls. This is something an audit should be able to catch.
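The first suggestion could look roughly like this in Python. It is a sketch of the general technique, not OfflineIMAP's code; the class name and the injectable `hard_exit` parameter are assumptions made so the behaviour can be exercised without killing the interpreter.

```python
import os
import signal
import threading


class TwoStageInterrupt:
    """First SIGINT requests a graceful stop; the second exits immediately."""

    def __init__(self, hard_exit=os._exit):
        self.stop_requested = threading.Event()
        self._hard_exit = hard_exit

    def handle(self, signum, frame):
        if self.stop_requested.is_set():
            # Second interrupt: skip cleanup and leave at once.
            self._hard_exit(1)
        else:
            # First interrupt: only record the request; workers poll it.
            self.stop_requested.set()

    def install(self):
        signal.signal(signal.SIGINT, self.handle)
```

Worker loops would check `stop_requested.is_set()` between units of work. `os._exit` is used for the second stage because it terminates the process without waiting for other threads, which matters when those threads are the ones that are stuck.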

ezyang avatar Mar 31 '15 18:03 ezyang

Hi Edward,

  1. Yes, I've already suggested handling the second Ctrl-c as a hard exit.
  2. On resume, the timeout might have to wait until the local time is adjusted, or until the timeout is hit. BTW, the broken socket should be handled better. I'm planning a deep refactoring that should make such issues easier to fix. You might be interested in following the coming changes.

nicolas33 avatar Mar 31 '15 21:03 nicolas33

Still happening on 6.6.1

dolohow avatar Feb 16 '16 22:02 dolohow

A fix to force OfflineIMAP to stop on consecutive Ctrl+C was merged some days ago. It will be in the next release (6.7.0-rc2). AFAIK, nobody has worked on proper resume at wakeup.

nicolas33 avatar Feb 17 '16 01:02 nicolas33

Thank you for the update. If someone could tell me what should be done, maybe I would try to implement that feature.

dolohow avatar Feb 17 '16 07:02 dolohow

I'm also interested in someone fixing this. I have the same issue. :sweat_smile:

(I have offlineimap running as a systemd user service under Arch Linux)

choucavalier avatar Feb 17 '16 07:02 choucavalier

  • A naive but still effective approach would be to introduce print statements to find a blocker.
  • A more advanced way is to try strace, although it can be tricky to map the output to lines of code.
  • Python includes debugging tools that could be useful. The most appealing one for the purpose might be worth a try.
  • Teamwork can greatly help. Do share your analysis, successes and failures.

Bear in mind there might be more than one blocker.
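As one example of the built-in debugging route, the snippet below dumps the current stack of every thread, which shows where each one is blocked. It is a generic sketch rather than anything from OfflineIMAP; `format_thread_stacks` is a hypothetical helper, and `sys._current_frames()` is a CPython-specific function.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump of the current stack of every running thread."""
    # Map thread identifiers to names so the dump is easier to read.
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (names.get(ident, "?"), ident))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)
```

The result can be written to stderr or a log file from wherever is convenient, for instance a signal handler or a watchdog thread. Note that a Python-level signal handler only runs once the main thread gets back to executing Python code, so a watchdog thread may be the more reliable trigger when the main thread itself is stuck.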

nicolas33 avatar Feb 17 '16 13:02 nicolas33

Assuming this will be difficult to debug, can we at least implement a SIG{INT,TERM} handler which deletes the lockfile? I currently have to delete .offlineimap/*.lock files every time I resume from suspend because when offlineimap freezes it leaves the lockfiles hanging around.
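A handler of that kind might be sketched as below. This is an illustration under stated assumptions, not a patch: the function names are invented, the directory is passed in explicitly, and every `*.lock` file in it is removed without checking which process created it.

```python
import glob
import os
import signal
import sys


def remove_lockfiles(directory):
    """Delete every *.lock file in directory and return the paths removed."""
    removed = []
    for path in glob.glob(os.path.join(directory, "*.lock")):
        try:
            os.remove(path)
            removed.append(path)
        except FileNotFoundError:
            pass  # already gone
    return removed


def install_cleanup(directory):
    """On SIGINT or SIGTERM, remove the lock files and then exit."""
    def handler(signum, frame):
        remove_lockfiles(directory)
        sys.exit(128 + signum)  # conventional "killed by signal" status

    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handler)
```

Two caveats apply. SIGKILL cannot be caught, so a hard kill would still leave the lock files behind. A real implementation should also only remove locks that the exiting process itself holds, otherwise it could delete the lock of another running instance.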

pwnage101 avatar Jul 12 '16 01:07 pwnage101