taskserver icon indicating copy to clipboard operation
taskserver copied to clipboard

[TD-71] Slow Clients make Others Wait

Open taskwarrior opened this issue 6 years ago • 11 comments

Jack on 2014-08-31T22:47:33Z says:

Here are two syncs from the FreeCinc log. Both accounts actually belong to me. The first, which I will call "A" has a 41kB tx.data file (normal usage) and syncs every 7 minutes from Mirakel. "A" took a whopping 41 seconds to service. At the time of this log, my cell phone was in a spotty coverage area. Could that explain the 41 seconds?

The second account, which I will call "B", is a test account in the amazon cloud. Every minute it adds a task with a short, randomized description and syncs. It usually hits the taskd log at 12 seconds past the minute due to differences in each server's respective clock. But in this case it actually timed out (I give it a five second window to sync, then pull the plug).

Here's what I suspect is happening. (Fill me in if I'm wrong about how taskd handles multitasking/threading)

  • "A" calls up taskd and begins to sync. Being in spotty cell reception at the time, it takes a while
  • Meanwhile, "B" calls up taskd and gets in the queue to sync
  • Once "A" is done, "B" gets a chance to sync.
  • "B" has fast internet and syncs quickly as soon as "A" is done.

A couple concerns:

  1. If slow internet connections actually make other clients wait (I don't know if this is how it works--please fill me in), this could limit how many clients a single server can handle.
  2. I'm not sure if this is by design or oversight, but "A" has all matching timestamps. In a way this is good because it shows everything together. But on the other hand it doesn't tell me whether 16:52:17 was the start of the sync or the end. (I'm going to assume the end of the sync, since "B" also at 16:52:17)

{code} 2014-08-31 16:52:17 DEBUG client 'Mirakel 2.7.1' allowed by 'all' 2014-08-31 16:52:17 [1694] 'sync' from 'FreeCinc/freecinc_A******' using 'Mirakel 2.7.1' at :0 2014-08-31 16:52:17 [1694] Client key '[filtered]' 0 2014-08-31 16:52:17 [1694] Loaded 449 records 2014-08-31 16:52:17 [1694] Branch point: [filtered] --> 448 2014-08-31 16:52:17 [1694] Subset 0 tasks 2014-08-31 16:52:17 [1694] Stored 0 tasks, merged 0 tasks 2014-08-31 16:52:17 [1694] Sync key '[filtered]' still valid 2014-08-31 16:52:17 [1694] No change 2014-08-31 16:52:17 [1694] Serviced in 41.464794s 2014-08-31 16:52:17 DEBUG client 'task 2.3.0' allowed by 'all' 2014-08-31 16:52:17 [1695] 'sync' from 'FreeCinc/freecinc_B******' using 'task 2.3.0' at :0 2014-08-31 16:52:17 [1695] Client key '[filtered]' 1 2014-08-31 16:52:17 [1695] Loaded 26444 records 2014-08-31 16:52:17 [1695] Branch point: [filtered] --> 26443 2014-08-31 16:52:17 [1695] Subset 0 tasks 2014-08-31 16:52:17 [1695] Stored 1 tasks, merged 0 tasks 2014-08-31 16:52:17 [1695] New sync key '[filtered]' 2014-08-31 16:52:17 [1695] Wrote 2 2014-08-31 16:52:17 [1695] Serviced in 0.072355s {code}

taskwarrior avatar Feb 11 '18 23:02 taskwarrior

Migrated metadata:

Created: 2014-08-31T22:47:33Z
Modified: 2015-02-07T19:59:14Z

taskwarrior avatar Feb 11 '18 23:02 taskwarrior

Paul Beckingham on 2014-08-31T23:19:14Z says:

Great data, well presented, thank you Jack.

Essentially your analysis is spot-on. Taskd is a single-threaded server, and one client will block another. We have plans to address this, but have never seen it have such an enormous impact as this. The plans are:

  1. Implement user-level locking on the data. This would allow launching two or more daemons to manage one $TASKDDATA instance. Users would have daemon-instance affinity (they are configured per port), which means the problem can still occur, but not for everyone.

  2. Upgrade the server to be a threaded server, with a configurable thread-pool, ensuring smoother data flow. This then gives you more tuning options. [2] requires [1].

Once these are implemented, the problem you describe is mitigated, in that the servers and other users are not queued up waiting for a slow client, thread-pool size notwithstanding.

Your 41s sync I assume is an extreme case, but with mobile devices, is it really exceptional? There is a lot of TLS handshaking going on, but once past that, taskd operates with a single upload transaction, then a single response transaction. I am surprised by the 2014-08-31 16:52:17 timestamp for every entry, and I expect that 41s to be reflected in the spread of timestamps reported. That's either a bug, or behavior I'm unaware of.

taskwarrior avatar Feb 11 '18 23:02 taskwarrior

Jack on 2014-09-01T03:32:37Z says:

I have requested a feature from Mirakel to randomize the sync time: https://mirakel.atlassian.net/browse/MIR-514

taskwarrior avatar Feb 11 '18 23:02 taskwarrior

Jack on 2014-09-01T03:39:44Z says:

If taskd could be made to run over http (not saying this is a trivial architectural change) then taskd could be mounted behind nginx. Nginx and Unicorn have a similar dance, where unicorn is a great server but can't handle slow clients. But put nginx in front of it, and nginx handles all the latency that the slow client introduces.

Just throwing that out there.

taskwarrior avatar Feb 11 '18 23:02 taskwarrior

Jack on 2014-09-01T05:02:24Z says:

A little data pulled from the FreeCinc logs:

Of 18,000 syncs to date, mean sync time is 0.4 seconds longest sync time is 160 seconds shortest sync time is 0.005 seconds

In terms of long sync times, there were: 10 syncs longer than 60 seconds 50 syncs longer than 15 seconds 100 syncs longer than 7 seconds 200 syncs longer than 5 seconds 500 syncs longer than 1 second 1000 syncs longer than 0.5 seconds

taskwarrior avatar Feb 11 '18 23:02 taskwarrior

Paul Beckingham on 2014-09-01T13:00:34Z says:

Does this exclude the client with the 36MB data?

taskwarrior avatar Feb 11 '18 23:02 taskwarrior

Jack on 2014-09-01T16:28:01Z says:

The client with the 36MB data typically synced in 0.35 seconds if no new tasks were added. The times when this client added a task and thereby crashed the server were not included in the above totals.

However, it is interesting to look at the logs of what happened when the crash occurred. This log snippet is representative. Note the three minute window between the sync that never completes and the restart of taskd. (I have a script running that restarts taskd 10 seconds after it crashes, so I assume the never-completing sync held on for nearly three minutes before crashing.)

Another interesting point is that the client sometimes syncs with Mirakel 2.6.2 and sometimes with 2.7.beta1.

{code} 2014-08-30 00:56:39 [286] 'sync' from 'FreeCinc/freecinc_******' using 'Mirakel 2.6.2' at :0 2014-08-30 00:56:39 [286] Client key '[filtered]' 9 2014-08-30 00:56:39 [286] Loaded 2332 records 2014-08-30 00:56:39 [286] Branch point: [filtered] --> 1555 2014-08-30 00:59:39 ==== taskd 1.0.0 ==== 2014-08-30 00:59:39 Serving from /home/dev/taskddata 2014-08-30 00:59:39 Debug mode 2014-08-30 00:59:39 Using port 53589 2014-08-30 00:59:39 Queue size 10 requests 2014-08-30 00:59:39 Request size limit 1048576 bytes 2014-08-30 00:59:39 IP logging off 2014-08-30 00:59:39 client.allow /all/ 2014-08-30 00:59:39 CA /home/dev/taskddata/ca.cert.pem 2014-08-30 00:59:39 Certificate /home/dev/taskddata/server.cert.pem 2014-08-30 00:59:39 Private Key /home/dev/taskddata/server.key.pem 2014-08-30 00:59:39 CRL /home/dev/taskddata/server.crl.pem 2014-08-30 00:59:39 Server starting {code}

taskwarrior avatar Feb 11 '18 23:02 taskwarrior

Jack on 2014-10-09T18:24:06Z says:

Paul, what do you think of including a timeout as part of .taskdrc such that the server is configurable what length of syncs it is willing to tolerate, and if any clients take longer than that it just drops them? This might be a quick win in terms of keeping slow clients from clogging a server.

taskwarrior avatar Feb 11 '18 23:02 taskwarrior

Paul Beckingham on 2014-10-09T21:36:50Z says:

Probably a good idea, but wouldn't stop me from threading the server first.

taskwarrior avatar Feb 11 '18 23:02 taskwarrior

Renato Alves on 2014-10-13T15:18:58Z says:

Threading sounds good to me, once locks are in place to prevent concurrent syncs by same user.

Breaking a sync into shorter sync operations is an interesting idea though it may require some changes in the "backlog.data" format. Some changes need to be synced together for consistency. Recurring tasks pop to mind: (parent and child).

taskwarrior avatar Feb 11 '18 23:02 taskwarrior

Instead of implementing yet-another-threading-server, couldn't the protocol be piggybacked on HTTP somehow, such that nginx can be used for the actual networking?

madduck avatar Apr 03 '18 00:04 madduck