
Datagrepper occasionally misses (rather small) time spans

Open alxpl opened this issue 8 years ago • 15 comments

While updating a few related packages, I lost track of which changes hadn't been pushed to git yet, so I turned to datagrepper. At some point, there was no activity recorded for at least 12 minutes, even though I had pushed to 3 repos in the meantime. It returned to normal after a while, but it had completely missed this commit: http://pkgs.fedoraproject.org/cgit/rpms/gdouros-asea-fonts.git/commit/?id=08bb207643ba73a5a36d7d9c5ad379cf02769633

I remembered that I had encountered a similar issue in the past: https://github.com/fedora-infra/tahrir/issues/318
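For anyone who wants to look for such silent periods in their own results, here is a minimal sketch. It assumes you already have the message timestamps as `datetime` objects; the `find_gaps` helper and the sample timestamps are made up for illustration and are not part of datagrepper.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold=timedelta(minutes=12)):
    """Return (before, after) pairs of consecutive messages at least `threshold` apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= threshold]


# Made-up timestamps, for illustration only (not real datagrepper data).
sample = [
    datetime(2016, 9, 11, 14, 0),
    datetime(2016, 9, 11, 14, 3),
    datetime(2016, 9, 11, 14, 20),
    datetime(2016, 9, 11, 14, 22),
]
gaps = find_gaps(sample)
```

A gap in the results does not prove that messages were lost, only that none were recorded; it still has to be compared against activity you know took place.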

alxpl avatar Sep 11 '16 15:09 alxpl

I found another instance. In July, I had a build fail in koji (because I was absent-minded): https://koji.fedoraproject.org/koji/taskinfo?taskID=14808302

Datagrepper has no record of this: https://apps.fedoraproject.org/datagrepper/raw?user=alexpl&delta=86400&end=1467936000
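In case it helps with reproducing the query, this is a small sketch of how the URL above can be assembled. The base URL and the `user`, `delta` and `end` parameters are taken from the link itself; the `raw_query_url` helper is only an illustrative name.

```python
from urllib.parse import urlencode

# Base URL as it appears in the link above.
BASE = "https://apps.fedoraproject.org/datagrepper/raw"


def raw_query_url(**params):
    """Build a datagrepper /raw query URL from keyword parameters."""
    return f"{BASE}?{urlencode(params)}"


# Same parameters as in the link: one day (86400 s) ending at epoch 1467936000.
url = raw_query_url(user="alexpl", delta=86400, end=1467936000)
```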

alxpl avatar Feb 01 '17 09:02 alxpl

One thing that could have happened is that we had an outage in that period, such as a maintenance reboot.

pypingou avatar Feb 01 '17 09:02 pypingou

I usually check the planned outages announcements before doing anything, so it couldn't have been that, unless we're talking about an unscheduled hiccup. In that particular instance, the failed build started at 14:58:44 and it completed at 15:00:47. Datagrepper lists several other koji tasks running at about the same time: https://apps.fedoraproject.org/datagrepper/raw?delta=400&end=1467903800 Was it just my luck?
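As a sanity check on the link above, here is a rough sketch of the arithmetic. It assumes that `end` is a Unix timestamp, that `delta` is the number of seconds to look back from `end`, and that the build times quoted above are in UTC on 2016-07-07 (the date is inferred from the `end` value, not stated in the comment).

```python
from datetime import datetime, timedelta, timezone


def query_window(end, delta):
    """Return (start, end) of the window, assuming `delta` seconds counted back from `end`."""
    end_dt = datetime.fromtimestamp(end, tz=timezone.utc)
    return end_dt - timedelta(seconds=delta), end_dt


win_start, win_end = query_window(end=1467903800, delta=400)

# Build times from the comment above; UTC and the date are assumptions.
build_started = datetime(2016, 7, 7, 14, 58, 44, tzinfo=timezone.utc)
build_completed = datetime(2016, 7, 7, 15, 0, 47, tzinfo=timezone.utc)

covers_build = win_start <= build_started and build_completed <= win_end
```

Under those assumptions the window runs from 14:56:40 to 15:03:20 UTC, so the whole failed build falls inside it.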

alxpl avatar Feb 01 '17 10:02 alxpl

Yeah, I don't really know then, it was just an idea I had suddenly

pypingou avatar Feb 01 '17 10:02 pypingou

Are there any logs from that far back? Should I ping Kevin Fenzi?

alxpl avatar Feb 01 '17 19:02 alxpl

The reason it missed that build is that koji only reports the actual build. If the src.rpm creation fails, no messages are emitted because the build never really 'started' as far as fedmsg is concerned.

See for example this random build: https://koji.fedoraproject.org/koji/buildinfo?buildID=837811

fedmsg says: "[13:33:04] buildsys.build.state.change -- besser82's swig-3.0.12-3.fc26 started building http://koji.fedoraproject.org/koji/buildinfo?buildID=837811"

But if you look at the buildsrpm parent: https://koji.fedoraproject.org/koji/taskinfo?taskID=17545095

Created: Wed, 01 Feb 2017 20:31:43 UTC
Started: Wed, 01 Feb 2017 20:31:43 UTC
Completed: Wed, 01 Feb 2017 20:32:43 UTC

So, the fedmsg 'starting build' only happens after the src.rpm is built.

Perhaps we should add 'buildSRPMfromSCM started/finished' messages, but currently they do not exist.

nirik avatar Feb 01 '17 20:02 nirik

Thanks for the explanation Kevin, are there any clues for the other instances of missing activity?

alxpl avatar Feb 01 '17 20:02 alxpl

The gravitational pull of my bad luck has caused the activity of others to disappear as well. While I was on a friend's computer, I tried to use datagrepper to locate a devel mailing list thread that I had posted to yesterday and earlier today, but the last mailing list activity reported was 11 days ago: https://apps.fedoraproject.org/datagrepper/raw?user=alexpl&topic=org.fedoraproject.prod.mailman.receive

I went to hyperkitty to check for the thread and I did find it: https://lists.fedoraproject.org/archives/list/[email protected]/thread/PLWXN66B2KELTLGVTVURJO34YJIPVUW2/ however it's missing 3 posts, two by me and one by Richard W.M. Jones, who replied to my first post.

I then compared what is on hyperkitty for the devel list and the e-mails in my gmail from the same list and it seems that quite a few are gone. Which component is at fault here?

alxpl avatar Feb 04 '17 01:02 alxpl

@abompard can you heck hyperkitty? is this perhaps just because it's behind due to the recent outage/upgrade?

nirik avatar Feb 04 '17 02:02 nirik

check I meant of course. ;(

nirik avatar Feb 04 '17 02:02 nirik

Yeah there was a performance regression after the HyperKitty upgrade last week, I fixed it on Monday. It should be back to normal.

abompard avatar Feb 08 '17 16:02 abompard

It's happened again: I built a package in copr for F25, F26 & F27, both i386 & x86_64, and I never got an email notification for 26-x86_64, even though it was the second or third build to finish. It's not listed in datagrepper either: https://apps.fedoraproject.org/datagrepper/raw?topic=org.fedoraproject.prod.copr.build.end&user=alexpl

Should I just stop minding these things (and stop bugging people about them as well)?

alxpl avatar Jun 14 '17 00:06 alxpl

Hey @alxpl,

I responded to your fedmsg issue, but the short version here is that it's expected that messages will occasionally get lost. That's not to say messages can't get lost due to a bug, but they can and do get lost due to the very nature of the protocol we're using.
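To illustrate the general idea, here is a toy sketch. It does not use fedmsg or any real transport; it only assumes a fire-and-forget publish/subscribe model with no acknowledgements and no replay, and the class and message strings are invented for the example.

```python
class FireAndForgetPublisher:
    """Toy publisher: delivers only to currently connected subscribers, keeps no backlog."""

    def __init__(self):
        self.subscribers = []

    def publish(self, message):
        # No acknowledgement and no retry: a message nobody hears is simply gone.
        for inbox in self.subscribers:
            inbox.append(message)


publisher = FireAndForgetPublisher()
inbox = []

publisher.subscribers.append(inbox)
publisher.publish("build 1 finished")

publisher.subscribers.remove(inbox)      # subscriber is briefly disconnected
publisher.publish("build 2 finished")    # nobody is listening, so this one is lost

publisher.subscribers.append(inbox)      # subscriber reconnects, nothing is replayed
publisher.publish("build 3 finished")
```

In this model the subscriber ends up with the first and third messages only, and neither side ever learns that the second one went missing.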

jeremycline avatar Jun 14 '17 12:06 jeremycline

OK, thanks. It felt like my "civic duty" to report this, no damage done.

alxpl avatar Jun 14 '17 12:06 alxpl

Yeah, I think I can speak for everyone when I say we really appreciate bug reports. Thanks!

jeremycline avatar Jun 14 '17 19:06 jeremycline