[TD-94] Taskd takes up all RAM
Tom Sydney Kerckhove on 2015-03-24T23:49:13Z says:
I can sync on one system, but when I try to sync with the other system, taskd starts consuming all available memory. Both systems use task 2.4.2.
Migrated metadata:
Created: 2015-03-24T23:49:13Z
Modified: 2015-05-02T12:42:33Z
Paul Beckingham on 2015-03-25T00:42:09Z says:
Thanks for the report. Can you tell us how long the taskd instance had been running before this happened? How often are you syncing? Are you using recurring tasks? Just looking for clues here.
Is there anything in the taskd.log file?
Tom Sydney Kerckhove on 2015-03-25T08:15:47Z says:
The taskd instance was running constantly for several days. I'm syncing every minute (crontab), but I guess it doesn't do anything when there's nothing to sync. I am using recurring tasks, though only two. I'm not sure what to look for in the (35 MB!) log file.
I had to increase the request size limit once to sync my tasks; I think the problem started then.
Is there any way I can work around this for now? (Like starting a new user on the server? It's mine anyway.)
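For what it's worth, the crontab entry is just a per-minute sync along these lines (the task path will vary; output is discarded so cron doesn't mail every run):
* * * * * /usr/bin/task sync >/dev/null 2>&1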
Paul Beckingham on 2015-03-25T11:28:22Z says:
Can you show your taskd installation details (XXXX out the hostname for privacy):
$ taskd diag
This will show me the various version numbers. I am looking for patterns, and very old GnuTLS libs.
Can you show me the size of the data:
$ wc ~/.task/*data
This will show me whether you have lots of tasks, or a few.
Then the number of transactions:
$ grep -c ' from ' taskd.log
Then the bounce count, which is how many times you restarted. Combining transactions and bounce count will give me a sense of how long your server stays up:
$ grep -c Daemonizing taskd.log
Then the size of the transactions:
$ grep Stored taskd.log | grep -v 'Stored 0 '
This will tell me if you have unusually large transactions (which I doubt, given the sync frequency). If you are not comfortable publishing this data here, please send it to support(at)taskwarrior.org.
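If it helps, here is a throwaway script bundling those checks (an untested sketch; it assumes taskd.log is in the current directory and the client data is in ~/.task):
#!/bin/sh
# Bundle the diagnostics above into one report.
LOG=taskd.log
taskd diag
wc ~/.task/*data
echo "transactions: $(grep -c ' from ' "$LOG")"
echo "bounces: $(grep -c Daemonizing "$LOG")"
echo "recent non-empty stores:"
grep Stored "$LOG" | grep -v 'Stored 0 ' | tail -20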
We don't have a workaround for this yet (this is the second reported case of this problem), but here are some things to try:
- Bounce the server (no need to create a new user).
- Delete the tx.data file on the server, then from one client run 'task sync init' to give the server a fresh copy, and continue as before (see the sketch after this list).
- Syncing every minute? That's a lot. This appears to be a cumulative memory leak, so any reduction in sync frequency will help while we find and fix it. So far we have not seen this behavior ourselves.
- Failing all that, bounce the server every night.
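Roughly, the tx.data workaround looks like this (an untested sketch; the pid.file location, org name, and user UUID are placeholders for your setup):
# stop the daemon (assumes pid.file is configured; otherwise kill the taskd process by hand)
$ kill $(cat $TASKDDATA/pid.file)
# remove the accumulated transaction data for the affected user
$ rm $TASKDDATA/orgs/<org>/users/<user-uuid>/tx.data
# restart the server
$ taskd server --data $TASKDDATA --daemon
# then, from ONE client only:
$ task sync init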
Meanwhile, we'll keep improving the server, and try to find this problem.
Tom Sydney Kerckhove on 2015-03-25T16:36:18Z says:
$ taskd diag

taskd 1.0.0
    Platform: Linux
    Hostname: XXXX

Compiler
     Version: 4.8.2 20140120 (Red Hat 4.8.2-16)
        Caps: +stdc +stdc_hosted +200809 +LP64 +c1 +i4 +l8 +vp8

Build Features
       Built: Jan 28 2015 19:15:58
      Commit: 2f40c1b
       CMake: 2.8.12
        Caps: +pthreads +tls
     libuuid: libuuid + uuid_unparse_lower
   libgnutls: 2.8.5
$ wc ~/.task/*data (on the system that makes taskd consume all RAM)
141 807 34350 .task/backlog.data
1472 34172 449841 .task/completed.data
398 5000 105789 .task/pending.data
24474 272705 3462316 .task/undo.data
26485 312684 4052296 total
$ wc ~/.task/*data (on the other system)
1 1 37 /home/syd/.task/backlog.data
1539 36627 484465 /home/syd/.task/completed.data
674 7970 160428 /home/syd/.task/pending.data
29043 307801 4043979 /home/syd/.task/undo.data
31257 352399 4688909 total
$ grep -c ' from ' taskd.log
59061
$ grep -c Daemonizing taskd.log
8
$ grep Stored taskd.log | grep -v 'Stored 0 ' (output is huge; only the last few lines are shown)
2015-03-19 16:36:01 [57767] Stored 2 tasks, merged 0 tasks
2015-03-19 17:35:56 [57886] Stored 1 tasks, merged 0 tasks
2015-03-20 10:28:57 [59005] Stored 1143 tasks, merged 0 tasks
2015-03-20 11:02:53 [59041] Stored 651 tasks, merged 0 tasks
2015-03-20 15:30:03 [1] Stored 6127 tasks, merged 0 tasks
2015-03-23 16:28:57 [1] Stored 184 tasks, merged 0 tasks
2015-03-24 17:38:05 [1] Stored 1 tasks, merged 0 tasks
I deleted the tx.data file and resynced, but now one system gets the following error:
Sync failed. The Taskserver returned error: 500 Client sync key not found.
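My guess (unconfirmed) is that this client still holds a sync key the server forgot when tx.data was deleted, so reinitializing from that client should clear it. After backing up ~/.task:
$ task sync init
That re-uploads the client's tasks and establishes a fresh sync key.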
Renato Alves on 2015-04-17T20:07:36Z says:
Given that the request size limit had to be increased, I suspect it's the same problem described in TD-77.
How much is "All available RAM" on your system?
Tom Sydney Kerckhove on 2015-04-17T20:23:26Z says:
Total RAM is about 1 GiB.
Paul Beckingham on 2015-04-25T22:13:33Z says:
Observations:
- Your taskserver handled 59,061 transactions. If there is a memory leak, that would certainly be enough transactions to make it show itself in this way.
- libgnutls 2.8.5 is from 2009; for a security product, that is horribly out of date. Is updating your GnuTLS, rebuilding, and rerunning an option?
Tom Sydney Kerckhove on 2015-04-28T21:53:37Z says:
A more recent version of GnuTLS doesn't seem to be available for Amazon Linux in the package repositories.
The problem hasn't presented itself recently. It originally happened after I had modified a lot of tasks with a command that had a very open filter. I even had to raise the server transaction limit to get it to start syncing again.
I will compile GnuTLS manually some time soon. Is there anything I should keep you updated on?
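For anyone else stuck on Amazon Linux, the rough plan (untested; the version and prefix are just examples, and GnuTLS build dependencies such as nettle are not shown):
# build a modern GnuTLS from source
$ wget https://www.gnupg.org/ftp/gcrypt/gnutls/v3.3/gnutls-3.3.14.tar.xz
$ tar xf gnutls-3.3.14.tar.xz && cd gnutls-3.3.14
$ ./configure --prefix=/usr/local && make && sudo make install
# then rebuild taskd against it
$ cd /path/to/taskd-1.0.0
$ cmake -DCMAKE_BUILD_TYPE=Release . && make && sudo make install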
Paul Beckingham on 2015-04-28T23:59:51Z says:
Thanks Tom, I appreciate the feedback. I added you as a contributor. I think the problem will go away with newer GnuTLS versions, and I'll just assume that is the case, which means I'll be very interested if it happens again, and if it does, I'm interested in the data we gathered above.
It makes sense that you had to increase the transaction size limit for that large update; it's really just a crude cutoff that prevents someone from attempting a 50 GB sync and taking out your server. Raising it (and keeping it raised) above the default 1 MB is probably a good idea. Perhaps I should raise the default.
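Concretely, the knob is the request.limit setting, measured in bytes (the value below is only an example; if I recall correctly, zero disables the check entirely):
$ taskd config --data $TASKDDATA request.limit 10485760
Editing the config file under $TASKDDATA directly works too; restart taskd afterwards.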
We have gathered some data on which GnuTLS versions show bad leaks:
  GOOD  3.3.14
  BAD   2.12.20
  BAD   2.8.5
Looking at the commits to the GnuTLS project, I see several leak fixes in the 3.2 branch, which corroborates this data. If you see data that extends the information in this list, I'd be interested.
I'll keep the issue open for now - perhaps someone will see it and add to the data, or learn from it.