fedmsg icon indicating copy to clipboard operation
fedmsg copied to clipboard

fedmsg-hub pulls a huge amount of backlog from datagrepper when simply restarted

Open AdamWill opened this issue 9 years ago • 3 comments

When I restart fedmsg-hub on the openQA server boxes, it seems to pull a huge amount of data from datagrepper:

Apr 15 23:50:19 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:19][    fedmsg    INFO] Retrieving datagrepper page 350 of 700
Apr 15 23:50:19 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:19][    fedmsg    INFO] Retrieving datagrepper page 355 of 583
Apr 15 23:50:19 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:19][    fedmsg    INFO] Retrieving datagrepper page 351 of 700
Apr 15 23:50:20 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:20][    fedmsg    INFO] Retrieving datagrepper page 356 of 583
Apr 15 23:50:20 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:20][    fedmsg    INFO] Retrieving datagrepper page 352 of 700
Apr 15 23:50:20 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:20][    fedmsg    INFO] Retrieving datagrepper page 357 of 583
Apr 15 23:50:20 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:50:20][    fedmsg    INFO] Retrieving datagrepper page 353 of 700

that's just an extract, it really does apparently do two retrievals of 1283 pages of data (or whatever, the numbers differ each time). This is just after doing a simple 'systemctl restart fedmsg-hub'. I assume this is for the backlog processing stuff, but it seems to be going wrong.

AdamWill avatar Apr 15 '16 23:04 AdamWill

Note at the end it says:

Apr 15 23:51:33 openqa-stg01.qa.fedoraproject.org fedmsg-hub[23975]: [2016-04-15 23:51:33][    fedmsg    INFO] Retrieved 0 messages from datagrepper.

AdamWill avatar Apr 15 '16 23:04 AdamWill

Can you tell us anything about the modified time, permissions, and contents of files in /var/run/fedmsg/status/fedmsg-hub/? That's where it leaves itself a note for after restart.

ralphbean avatar Apr 18 '16 13:04 ralphbean

ls --l /var/[adamwill@openqa01 ~][PROD]$ ls -l /var/run/fedmsg/status/fedmsg-hub/
total 12
-rw-r--r--. 1 fedmsg fedmsg 2850 Apr 18 11:25 CheckComposeProductionConsumer
-rw-r--r--. 1 fedmsg fedmsg 2683 Apr 18 13:20 OpenQAConsumer
-rw-r--r--. 1 fedmsg fedmsg 2850 Apr 18 11:25 OpenQAProductionWikiConsumer
[adamwill@openqa01 ~][PROD]$ date -u
Mon Apr 18 14:17:48 UTC 2016

so those have all been touched quite recently (i.e. they're not just not getting updated, I guess). Each one contains a single fedmsg message that corresponds to the timestamp of the file.

Hmm. All three consumers trigger on fairly infrequent messages: OpenQAConsumer triggers on compose completion, the other two trigger on openQA job completion (which of course is closely tied to compose completion; after each tested compose there's a flood of openQA job completion messages for an hour or so, then nothing till the next compose). Is it just that when the service is restarted some time since the previous compose, it replays every message since the last openQA job complete message? Perhaps this function should log the last received message somehow, not just the last consumed message, if that's the case?

AdamWill avatar Apr 18 '16 14:04 AdamWill