copr icon indicating copy to clipboard operation
copr copied to clipboard

Not re-notifying FE and msg buses for the new host

Open praiskup opened this issue 2 years ago • 2 comments

There's a bug in the backend <-> fronend protocol -> if copr-rpmbuild is not found on the builder, the task is never switched to running on FE:

[2023-10-16 07:42:33,269][  INFO][PID:2650505] Marking build as starting
[2023-10-16 07:42:33,460][  INFO][PID:2650505] Trying to allocate VM
[2023-10-16 07:42:36,564][  INFO][PID:2650505] Allocated host ResallocHost, ticket_id=11456, hostname=2620:52:3:1:dead:beef:cafe:c141
[2023-10-16 07:42:36,565][  INFO][PID:2650505] Allocating ssh connection to builder
[2023-10-16 07:42:36,565][  INFO][PID:2650505] Checking that builder machine is OK
[2023-10-16 07:42:36,568][  INFO][PID:2650505] Popen command started: ssh -F /home/copr/.ssh/config mockbuilder@2620:52:3:1:dead:beef:cafe:c141 'rpm -q copr-rpmbuild --qf "%{VERSION}
"'
[2023-10-16 07:42:36,584][ ERROR][PID:2650505] Re-try request for task on 'ResallocHost, ticket_id=11456, hostname=2620:52:3:1:dead:beef:cafe:c141': The copr-rpmbuild package was not found: 
[2023-10-16 07:42:36,585][  INFO][PID:2650505] Releasing VM back to pool
[2023-10-16 07:42:36,615][  INFO][PID:2650505] Retry #1 (on other host)
[2023-10-16 07:42:36,617][  INFO][PID:2650505] Trying to allocate VM
[2023-10-16 07:42:42,849][  INFO][PID:2650505] Allocated host ResallocHost, ticket_id=11457, hostname=2620:52:3:1:dead:beef:cafe:c141
[2023-10-16 07:42:42,850][  INFO][PID:2650505] Allocating ssh connection to builder
[2023-10-16 07:42:42,851][  INFO][PID:2650505] Checking that builder machine is OK
[2023-10-16 07:42:42,853][  INFO][PID:2650505] Popen command started: ssh -F /home/copr/.ssh/config mockbuilder@2620:52:3:1:dead:beef:cafe:c141 'rpm -q copr-rpmbuild --qf "%{VERSION}
"'
[2023-10-16 07:42:43,137][  INFO][PID:2650505] Popen command started: ssh -F /home/copr/.ssh/config mockbuilder@2620:52:3:1:dead:beef:cafe:c141 '/usr/bin/test -f /etc/mock/fedora-rawhide-x86_64.cfg'
[2023-10-16 07:42:43,204][  INFO][PID:2650505] Filling build.info file with builder info
[2023-10-16 07:42:43,205][  INFO][PID:2650505] Not re-notifying FE and msg buses for the new host.
[2023-10-16 07:42:43,206][  INFO][PID:2650505] Starting remote build: copr-rpmbuild --verbose --drop-resultdir --task-url https://copr.stg.fedoraproject.org/backend/get-build-task/2915642-fedora-rawhide-x86_64 --chroot fedora-rawhide-x86_64 --detached
[2023-10-16 07:42:43,207][  INFO][PID:2650505] Popen command started: ssh -F /home/copr/.ssh/config mockbuilder@2620:52:3:1:dead:beef:cafe:c141 'copr-rpmbuild --verbose --drop-resultdir --task-url https://copr.stg.fedoraproject.org/backend/get-build-task/2915642-fedora-rawhide-x86_64 --chroot fedora-rawhide-x86_64 --detached'
[2023-10-16 07:42:43,509][  INFO][PID:2650505] Downloading the builder-live.log file, attempt 1
[2023-10-16 07:42:43,511][  INFO][PID:2650505] Popen command started: ssh -F /home/copr/.ssh/config mockbuilder@2620:52:3:1:dead:beef:cafe:c141 copr-rpmbuild-log

praiskup avatar Oct 16 '23 07:10 praiskup

  1. we take first machine
  2. the machine fails too early (the build doesn't start), so the ticket stays in starting
  3. no message to BUS is sent either (automation breakage)
  4. another machine is taken
  5. we don't send BUS message now (this is the bug!) so no automation can react

praiskup avatar Oct 19 '23 12:10 praiskup

This seems to rarely happen, therefore we give this "2years" category. Please report if this happens frequently.

praiskup avatar Oct 19 '23 12:10 praiskup