fedmsg-hub pulls a huge amount of backlog from datagrepper when simply restarted
When I restart fedmsg-hub on the openQA server boxes, it seems to pull a huge amount of data from datagrepper:
Apr 15 23:50:19 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:19][ fedmsg INFO] Retrieving datagrepper page 350 of 700
Apr 15 23:50:19 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:19][ fedmsg INFO] Retrieving datagrepper page 355 of 583
Apr 15 23:50:19 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:19][ fedmsg INFO] Retrieving datagrepper page 351 of 700
Apr 15 23:50:20 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:20][ fedmsg INFO] Retrieving datagrepper page 356 of 583
Apr 15 23:50:20 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:20][ fedmsg INFO] Retrieving datagrepper page 352 of 700
Apr 15 23:50:20 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:20][ fedmsg INFO] Retrieving datagrepper page 357 of 583
Apr 15 23:50:20 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:20][ fedmsg INFO] Retrieving datagrepper page 353 of 700
that's just an extract, it really does apparently do two retrievals of 1283 pages of data (or whatever, the numbers differ each time). This is just after doing a simple 'systemctl restart fedmsg-hub'. I assume this is for the backlog processing stuff, but it seems to be going wrong.
Note at the end it says:
Apr 15 23:51:33 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:51:33][ fedmsg INFO] Retrieved 0 messages from datagrepper.
Can you tell us anything about the modified time, permissions, and contents of files in /var/run/fedmsg/status/fedmsg-hub/? That's where it leaves itself a note for after restart.
ls --l /var/[adamwill@openqa01 ~][PROD]$ ls -l /var/run/fedmsg/status/fedmsg-hub/
total 12
-rw-r--r--. 1 fedmsg fedmsg 2850 Apr 18 11:25 CheckComposeProductionConsumer
-rw-r--r--. 1 fedmsg fedmsg 2683 Apr 18 13:20 OpenQAConsumer
-rw-r--r--. 1 fedmsg fedmsg 2850 Apr 18 11:25 OpenQAProductionWikiConsumer
[adamwill@openqa01 ~][PROD]$ date -u
Mon Apr 18 14:17:48 UTC 2016
so those have all been touched quite recently (i.e. they're not just not getting updated, I guess). Each one contains a single fedmsg message that corresponds to the timestamp of the file.
Hmm. All three consumers trigger on fairly infrequent messages: OpenQAConsumer triggers on compose completion, the other two trigger on openQA job completion (which of course is closely tied to compose completion; after each tested compose there's a flood of openQA job completion messages for an hour or so, then nothing till the next compose). Is it just that when the service is restarted some time since the previous compose, it replays every message since the last openQA job complete message? Perhaps this function should log the last received message somehow, not just the last consumed message, if that's the case?