beaker
[BUG] A kernel oops that happens during provisioning is marked as a panic in the last task
Describe the bug
This was initially reported on https://bugzilla.redhat.com/show_bug.cgi?id=1623729
If a kernel oops occurs during provisioning, it is marked as a kernel panic in the last Beaker task. This makes it hard to understand why that task is shown as panic.
Version-Release number
28.2
To Reproduce
Steps to reproduce the behavior:
- Provision an OS that triggers a kernel oops during provisioning, where the provisioning still completes successfully
- Run several tasks
- The last task will show as panic because of the oops that happened during provisioning.
Actual behavior
The last task shows as panic
Expected behavior
The last task shouldn't show as panic
Panic detection can be configured by PANIC_REGEX. It is set on a per-LC basis (https://github.com/beaker-project/beaker/blob/3a1155308899a301dbf042c55694680f03051346/LabController/src/bkr/labcontroller/default.conf#L23) and by default it includes Oops.
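As a rough illustration of how regex-based panic detection over console output could work, here is a minimal sketch. The pattern below is a simplified assumption and not the actual default from default.conf, and find_panic is a hypothetical helper, not a Beaker function.

```python
import re

# Assumed, simplified pattern for illustration only; the real default
# lives in the lab controller's default.conf and may differ.
PANIC_REGEX = r"Kernel panic|Oops"


def find_panic(console_text, pattern=PANIC_REGEX):
    """Return the first console line matching the pattern, or None."""
    regex = re.compile(pattern)
    for line in console_text.splitlines():
        if regex.search(line):
            return line
    return None


# Made-up console output: an oops during installation that does not
# stop the installation from completing.
console_log = "\n".join([
    "[   12.345678] Installing packages",
    "[   13.000000] Oops: 0000 [#1] SMP",
    "[   14.000000] Installation complete",
])
print(find_panic(console_log))
```

A detector like this has no notion of which task was running when the line was written, which is why the result has to be attached to some task afterwards.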
You can disable panic detection altogether by including <watchdog panic="None"/> in your recipe.
Note, I don't want to ignore Oops in general. My only problem is with an Oops that happened during provisioning and causes the last task in the recipe to be marked as Panic. This task shouldn't be marked as panic, as there was no Oops when the task executed.
@bgoncalv is right. I'm aware of this behavior. The problem is that we can't really mark a Panic during installation at the moment, so most of the time it lands on the first task. But it may happen that we fail to detect it at first, in which case it lands on another task (the one that was running when we found the Panic).
Yeah. This needs to remain open and we will need to find a better way to manage this.
Thanks @StykMartin, do you have any suggestion for how we can work around this? For example, we could create a dummy task to trigger this panic detection and run it as the first task. Is there a way for a task to force the detection to run?
Hello @bgoncalv. My answer never ended up on GitHub. I'm sorry about that. The situation is quite a bit more complicated, but I feel like we can make some compromise if you still need it.
Panic detection runs in the background in each region. The result is then proxied to the main server for processing. The main server checks the tasks assigned to the given recipe and iterates over them. If the iteration is exhausted, we mark the last item, because that is what ends up in the loop variable.
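The iteration described above can be sketched roughly as follows. This is only an illustration of the described behavior, not Beaker's actual code: pick_task_for_panic and the 'name' and 'status' keys are hypothetical.

```python
def pick_task_for_panic(tasks):
    """Return the task a detected panic would be attached to.

    Each task is a dict with hypothetical 'name' and 'status' keys.
    """
    task = None
    for task in tasks:
        if task["status"] == "Running":
            break  # a running task was found; attach the panic to it
    # If no task was running, the loop ran to the end and `task`
    # still refers to the last item in the list.
    return task


# During provisioning no task has started yet, so the last one is picked.
provisioning = [
    {"name": "task-1", "status": "New"},
    {"name": "task-2", "status": "New"},
    {"name": "task-3", "status": "New"},
]
print(pick_task_for_panic(provisioning)["name"])
```

With one task marked as running, the same function returns that task instead, which matches the case where the panic lands on whatever was running when it was found.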
If you have any idea how we can help you @bgoncalv to make it saner feel free to shoot.
@StykMartin thanks for the reply. Indeed, that feature would be helpful for us. If the panic detection always assigns a panic detected during provisioning to the last task of the recipe, that would help us.
This is what happens at the moment. All panics are assigned to the last task if there is no running task. So basically, if we didn't register any task start (for example, during provisioning), we will always mark the last task. However, there is a catch. If restraint reports task n-2 as finished and task n-1 has not yet been reported as started, then a panic detected in this window may be reported on the last task as well. But the chances that restraint will crash in moments like this are quite small.
I would suggest you put dummy tasks at the end and then collect panics from there.
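One way to apply this suggestion is to append a placeholder task to each recipe in the job XML before submitting it. The sketch below uses Python's standard ElementTree; the task name /example/panic-collector and the helper append_collector_task are hypothetical, and the sample job XML is trimmed to the elements needed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical task name; substitute a real no-op task from your task library.
COLLECTOR_TASK = "/example/panic-collector"


def append_collector_task(job_xml, task_name=COLLECTOR_TASK):
    """Return the job XML with a trailing dummy task added to every recipe."""
    root = ET.fromstring(job_xml)
    for recipe in root.iter("recipe"):
        # SubElement appends the new task as the last child of the recipe.
        ET.SubElement(recipe, "task", {"name": task_name, "role": "STANDALONE"})
    return ET.tostring(root, encoding="unicode")


job_xml = """
<job>
  <recipeSet>
    <recipe>
      <task name="/distribution/check-install" role="STANDALONE"/>
      <task name="/example/real-test" role="STANDALONE"/>
    </recipe>
  </recipeSet>
</job>
"""
print(append_collector_task(job_xml))
```

A panic result on the trailing task can then be read as "detected outside any running task" rather than as a failure of a real test.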
I will try to redesign this feature so that we can report panics during provisioning to a dedicated space instead of marking tasks.