Pixie agent causes OS (Anolis OS 8.10) to reboot
Describe the bug
We are trying to deploy Pixie in our test environment, but the OS reboots as soon as we deploy the Pixie PEM and Vizier. FYI, if only Pixie Cloud is deployed, there is no reboot issue.
PIXIE: 0.19
k8s cluster: Client Version: v1.23.0, Server Version: v1.23.0
CPU: Architecture: x86_64, CPU op-mode(s): 32-bit, 64-bit, Byte Order: Little Endian, CPU(s): 16
OS: NAME="Anolis OS" VERSION="8.10"
Additional context
We have verified the same deployment on Anolis OS 8.8, and it worked fine. Is there a compatibility issue between the Pixie agent and Anolis OS 8.10?
PIXIE: 0.19
k8s cluster: Client Version: v1.23.0, Server Version: v1.23.0
CPU: Architecture: x86_64, CPU op-mode(s): 32-bit, 64-bit, Byte Order: Little Endian, CPU(s): 16
OS: NAME="Anolis OS" VERSION="8.8"
We aren't aware of any issues with Anolis OS, but this is the first I'm hearing of it. There also aren't any known issues that cause other OSes to reboot.
Can you provide the dmesg logs from a rebooted instance? In addition to that, if you can capture the logs of the pods in the pl namespace that would be helpful. Normally I'd suggest using px collect-logs, but that will probably be difficult given the instance is shutting down.
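For anyone following along, here is a rough sketch of the commands that could gather this after the node comes back up. It assumes the journal is persistent (otherwise `journalctl -b -1` has no previous boot to show and `/var/log/messages` may be the only source), and the pod name used below is a hypothetical placeholder, not a real pod from this cluster. `kubectl logs --previous` only returns output if the previous container's log files survived the reboot.

```python
import shlex

def log_collection_commands(namespace="pl", pods=()):
    """Build (not run) commands for collecting logs after an unexpected reboot."""
    cmds = [
        # Kernel messages from the boot *before* the current one.
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        # Pod status and node placement in the Pixie namespace.
        ["kubectl", "get", "pods", "-n", namespace, "-o", "wide"],
    ]
    for pod in pods:
        # Logs of the previous container instance of each pod.
        cmds.append(["kubectl", "logs", "-n", namespace, pod,
                     "--all-containers", "--previous"])
    return cmds

# "vizier-pem-example" is a made-up pod name for illustration only.
for cmd in log_collection_commands(pods=["vizier-pem-example"]):
    print(shlex.join(cmd))
```

Redirecting each command's output to a file makes it easier to attach to the issue.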
In the system log, I can see the errors below after each reboot:
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:18.7: bridge window [io 0x1000-0x0fff] to [bus 22] add_size 1000
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:15.0: BAR 15: assigned [mem 0xc0000000-0xc01fffff 64bit pref]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:16.0: BAR 15: assigned [mem 0xc0200000-0xc03fffff 64bit pref]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:0f.0: BAR 6: assigned [mem 0xc0400000-0xc0407fff pref]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:15.3: BAR 13: no space for [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:15.3: BAR 13: failed to assign [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:15.4: BAR 13: no space for [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:15.4: BAR 13: failed to assign [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:15.5: BAR 13: no space for [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:15.5: BAR 13: failed to assign [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:15.6: BAR 13: no space for [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:15.6: BAR 13: failed to assign [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:15.7: BAR 13: no space for [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:15.7: BAR 13: failed to assign [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:16.3: BAR 13: no space for [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:16.3: BAR 13: failed to assign [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:16.4: BAR 13: no space for [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:16.4: BAR 13: failed to assign [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:16.5: BAR 13: no space for [io size 0x1000]
Sep 14 06:35:07 localhost.localdomain kernel: pci 0000:00:16.5: BAR 13: failed to assign [io size 0x1000]
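For longer excerpts of this kind, a small script can split the log text back into individual entries and count which PCI devices reported a failed BAR assignment. This is only a sketch: the function names are made up for illustration, and the regular expressions assume the syslog-style timestamp and the `kernel: pci <device>: <message>` layout seen in the excerpt above.

```python
import re
from collections import Counter

# Zero-width split point before a syslog timestamp such as "Sep 14 06:35:07 ".
ENTRY_START = re.compile(r"(?=[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} )")
# Kernel PCI messages: "kernel: pci 0000:00:15.3: BAR 13: failed to assign ...".
PCI_MSG = re.compile(r"kernel: pci (?P<dev>[0-9a-f:.]+): (?P<msg>.*)")

def split_entries(text):
    """Split log text into entries, whether pasted one per line or run together."""
    joined = " ".join(text.split())  # normalize line breaks and repeated spaces
    return [e.strip() for e in ENTRY_START.split(joined) if e.strip()]

def failed_bar_devices(text):
    """Count 'failed to assign' messages per PCI device address."""
    counts = Counter()
    for entry in split_entries(text):
        m = PCI_MSG.search(entry)
        if m and "failed to assign" in m.group("msg"):
            counts[m.group("dev")] += 1
    return counts
```

Calling `failed_bar_devices(open("messages.txt").read())` on a saved excerpt (the file name is just an example) returns a `Counter` keyed by device address. Comparing the result across boots shows whether these messages change when the PEM is deployed or appear on every boot.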
I set the environment variable PL_STIRLING_SOURCES=kSocket, which should enable only the connector below:
case SourceConnectorGroup::kTracers:
  return { SocketTraceConnector::kName };
The OS still crashed during the night, so I suspect that socket_trace_connector.cc may be the root cause.
This is the vizier-pem log from before the OS crash.