pixie
vizier-pem error
I0112 05:55:31.567047 2147280 stirling.cc:394] Adding info class: [process_stats/process_stats]
I0112 05:55:31.567065 2147280 source_connector.cc:36] Initializing source connector: network_stats
I0112 05:55:31.567077 2147280 stirling.cc:394] Adding info class: [network_stats/network_stats]
I0112 05:55:31.567098 2147280 source_connector.cc:36] Initializing source connector: jvm_stats
I0112 05:55:31.567111 2147280 stirling.cc:394] Adding info class: [jvm_stats/jvm_stats]
I0112 05:55:31.567191 2147280 source_connector.cc:36] Initializing source connector: perf_profiler
I0112 05:55:31.567270 2147280 linux_headers.cc:209] Found Linux kernel version using .note section.
I0112 05:55:31.567283 2147280 linux_headers.cc:90] Obtained Linux version string from `uname`: 4.19.91-24.1.al7.x86_64
I0112 05:55:31.567294 2147280 linux_headers.cc:582] Detected kernel release (uname -r): 4.19.91-24.1.al7.x86_64
I0112 05:55:31.567355 2147280 bcc_wrapper.cc:133] Using linux headers found at /lib/modules/4.19.91-24.1.al7.x86_64/build for BCC runtime.
I0112 05:55:33.590973 2147280 perf_profile_connector.cc:71] PerfProfiler: Stack trace profiling sampling probe successfully deployed.
I0112 05:55:33.591023 2147280 stirling.cc:394] Adding info class: [perf_profiler/stack_traces.beta]
I0112 05:55:33.591053 2147280 stirling.cc:360] Stirling successfully initialized.
I0112 05:55:33.612102 2147280 manager.cc:137] Hostname: iZbp1icepfw4uqqzbvkiseZ
F0112 05:56:31.110960 2147280 registration.cc:48] Timeout waiting for registration ack
*** Check failure stack trace: ***
@ 0x7ba0d3d google::LogMessage::Fail()
@ 0x7ba0177 google::LogMessage::SendToLog()
@ 0x7ba0a1e google::LogMessage::Flush()
@ 0x7ba3b7c google::LogMessageFatal::~LogMessageFatal()
@ 0x5d42aed px::vizier::agent::RegistrationHandler::RegistrationHandler()::$_0::operator()()
@ 0x5d42a0d std::__invoke_impl<>()
@ 0x5d429ad _ZSt10__invoke_rIvRZN2px6vizier5agent19RegistrationHandlerC1EPNS0_5event10DispatcherEPNS2_4InfoEPNS4_13NATSConnectorINS1_8messages13VizierMessageEEESt8functionIFNS0_6StatusEjEESH_E3$0JEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EESL_E4typeEOSM_DpOSN
@ 0x5d428ad std::_Function_handler<>::_M_invoke()
@ 0x224257e std::function<>::operator()()
@ 0x6b7e38d px::event::LibuvTimer::EnableTimer()::$_0::operator()()
@ 0x6b7e355 px::event::LibuvTimer::EnableTimer()::$_0::__invoke()
@ 0x6b85556 uv__run_timers
@ 0x6b8cda2 uv_run
@ 0x6b7cb29 px::event::LibuvScheduler::Run()
@ 0x6b7d969 px::event::LibuvDispatcher::Run()
@ 0x5d0c19e px::vizier::agent::Manager::Run()
@ 0x22101e4 main
@ 0x7f996d0d2e0b __libc_start_main
@ 0x220fcee _start
E0112 05:56:31.255736 2147280 signal_action.cc:63] Caught Aborted, suspect faulting address 0x20c3d0. Trace:
PC: @ 0x7baadc2 (unknown) google::DumpStackTraceAndExit()
@ 0x7ba0d3d (unknown) google::LogMessage::Fail()
@ 0x7ba0177 (unknown) google::LogMessage::SendToLog()
@ 0x7ba0a1e (unknown) google::LogMessage::Flush()
@ 0x7ba3b7c (unknown) google::LogMessageFatal::~LogMessageFatal()
@ 0x5d42aed (unknown) px::vizier::agent::RegistrationHandler::RegistrationHandler()::$_0::operator()()
@ 0x5d42a0d (unknown) std::__invoke_impl<>()
@ 0x5d429ad (unknown) std::__invoke_r<>()
@ 0x5d428ad (unknown) std::_Function_handler<>::_M_invoke()
@ 0x224257e (unknown) std::function<>::operator()()
@ 0x6b7e38d (unknown) px::event::LibuvTimer::EnableTimer()::$_0::operator()()
@ 0x6b7e355 (unknown) px::event::LibuvTimer::EnableTimer()::$_0::__invoke()
@ 0x6b85556 (unknown) uv__run_timers
@ 0x6b8cda2 (unknown) uv_run
@ 0x6b7cb29 (unknown) px::event::LibuvScheduler::Run()
@ 0x6b7d969 (unknown) px::event::LibuvDispatcher::Run()
@ 0x5d0c19e (unknown) px::vizier::agent::Manager::Run()
@ 0x22101e4 (unknown) main
@ 0x7f996d0d2e0b (unknown) __libc_start_main
@ 0x220fcee (unknown) _start
Did you ever find a fix for this? I'm running into the same issue.
Also seeing this error in a Bottlerocket environment.
The dump happens after this line:
F20220713 23:18:07.840595 1829577 registration.cc:48] Timeout waiting for registration ack
Complete logs from an affected vizier-pem pod:
I20220713 23:17:16.655879 1829577 env.cc:47] Started: /app/src/vizier/services/agent/pem/pem
Started external stacktrace collection signal processor thread
I20220713 23:17:16.656319 1829577 pem_main.cc:93] Pixie PEM. Version: v0.11.7+Distribution.062b426.202207132133.1.RELEASE.jenkins, id: d3670e52-e3e2-4ef3-8172-06d60eed3237
I20220713 23:17:16.656483 1829577 stirling.cc:927] Creating Stirling, registered sources: [process_stats, network_stats, jvm_stats, socket_tracer, perf_profiler, proc_exit_tracer, stirling_error]
I20220713 23:17:16.656527 1829577 system_info.cc:41] Location of proc: /host/proc
I20220713 23:17:16.656556 1829577 system_info.cc:42] Location of sysfs: /sys/fs
I20220713 23:17:16.656565 1829577 system_info.cc:43] Number of CPUs: 2
I20220713 23:17:16.656662 1829577 system_info.cc:34] /host/proc/version:
Linux version 5.10.118 (builder@buildkitsandbox) (x86_64-bottlerocket-linux-gnu-gcc (Buildroot 2021.02.3) 10.3.0, GNU ld (GNU Binutils) 2.35.2) #1 SMP Thu Jun 9 01:24:07 UTC 2022
I20220713 23:17:16.656774 1829577 system_info.cc:34] /host/etc/os-release:
NAME=Bottlerocket
ID=bottlerocket
VERSION="1.8.0 (aws-k8s-1.22)"
PRETTY_NAME="Bottlerocket OS 1.8.0 (aws-k8s-1.22)"
VARIANT_ID=aws-k8s-1.22
VERSION_ID=1.8.0
BUILD_ID=a6233c22
HOME_URL="https://github.com/bottlerocket-os/bottlerocket"
SUPPORT_URL="https://github.com/bottlerocket-os/bottlerocket/discussions"
BUG_REPORT_URL="https://github.com/bottlerocket-os/bottlerocket/issues"
I20220713 23:17:16.656805 1829577 probe_cleaner.cc:102] Cleaning probes from /sys/kernel/debug/tracing/kprobe_events with the following marker: __pixie__
I20220713 23:17:16.657984 1829577 probe_cleaner.cc:117] All Stirling probes removed (count=0)
I20220713 23:17:16.658003 1829577 probe_cleaner.cc:102] Cleaning probes from /sys/kernel/debug/tracing/uprobe_events with the following marker: __pixie__
I20220713 23:17:16.658033 1829577 probe_cleaner.cc:117] All Stirling probes removed (count=0)
I20220713 23:17:16.658043 1829577 source_connector.cc:36] Initializing source connector: process_stats
I20220713 23:17:16.658056 1829577 stirling.cc:447] Adding info class: [process_stats/process_stats]
I20220713 23:17:16.658069 1829577 source_connector.cc:36] Initializing source connector: network_stats
I20220713 23:17:16.658077 1829577 stirling.cc:447] Adding info class: [network_stats/network_stats]
I20220713 23:17:16.658088 1829577 source_connector.cc:36] Initializing source connector: jvm_stats
I20220713 23:17:16.658097 1829577 stirling.cc:447] Adding info class: [jvm_stats/jvm_stats]
I20220713 23:17:16.658335 1829577 source_connector.cc:36] Initializing source connector: socket_tracer
I20220713 23:17:16.658366 1829577 linux_headers.cc:209] Found Linux kernel version using .note section.
I20220713 23:17:16.658380 1829577 linux_headers.cc:90] Obtained Linux version string from `uname`: 5.10.118
I20220713 23:17:16.658389 1829577 linux_headers.cc:599] Detected kernel release (uname -r): 5.10.118
I20220713 23:17:16.658619 1829577 linux_headers.cc:438] Looking for host mounted headers at /host/lib/modules/5.10.118
I20220713 23:17:16.658747 1829577 linux_headers.cc:475] Linked linux headers found at /host/usr/src/kernels/5.10.118 to symlink at /lib/modules/5.10.118/source
I20220713 23:17:16.658794 1829577 linux_headers.cc:483] Linked linux headers found at /host/usr/src/kernels/5.10.118 to symlink at /lib/modules/5.10.118/build
I20220713 23:17:16.658818 1829577 bcc_wrapper.cc:120] Using linux headers found at /lib/modules/5.10.118/source for BCC runtime.
I20220713 23:17:29.111289 1829577 socket_trace_connector.cc:390] Number of kprobes deployed = 40
I20220713 23:17:29.111335 1829577 socket_trace_connector.cc:391] Probes successfully deployed.
I20220713 23:17:29.111382 1829577 socket_trace_connector.cc:333] Initializing perf buffers with ncpus=2 and scaling_factor=0.9
I20220713 23:17:29.111410 1829577 socket_trace_connector.cc:322] Total perf buffer usage for kData buffers across all cpus: 75497472
I20220713 23:17:29.111426 1829577 socket_trace_connector.cc:322] Total perf buffer usage for kControl buffers across all cpus: 3963614
I20220713 23:17:29.111438 1829577 bcc_wrapper.cc:345] Opening perf buffer: socket_data_events [requested_size=18874368 num_pages=8192 size=33554432] (per cpu)
I20220713 23:17:29.127267 1829577 bcc_wrapper.cc:345] Opening perf buffer: socket_control_events [requested_size=943718 num_pages=256 size=1048576] (per cpu)
I20220713 23:17:29.127959 1829577 bcc_wrapper.cc:345] Opening perf buffer: conn_stats_events [requested_size=943718 num_pages=256 size=1048576] (per cpu)
I20220713 23:17:29.128603 1829577 bcc_wrapper.cc:345] Opening perf buffer: mmap_events [requested_size=94371 num_pages=32 size=131072] (per cpu)
I20220713 23:17:29.148730 1829577 bcc_wrapper.cc:345] Opening perf buffer: go_grpc_events [requested_size=18874368 num_pages=8192 size=33554432] (per cpu)
I20220713 23:17:29.165652 1829577 socket_trace_connector.cc:395] Number of perf buffers opened = 5
I20220713 23:17:29.216533 1829577 stirling.cc:447] Adding info class: [socket_tracer/conn_stats]
I20220713 23:17:29.216575 1829577 stirling.cc:447] Adding info class: [socket_tracer/http_events]
I20220713 23:17:29.216585 1829577 stirling.cc:447] Adding info class: [socket_tracer/mysql_events]
I20220713 23:17:29.216594 1829577 stirling.cc:447] Adding info class: [socket_tracer/cql_events]
I20220713 23:17:29.216603 1829577 stirling.cc:447] Adding info class: [socket_tracer/pgsql_events]
I20220713 23:17:29.216612 1829577 stirling.cc:447] Adding info class: [socket_tracer/dns_events]
I20220713 23:17:29.216621 1829577 stirling.cc:447] Adding info class: [socket_tracer/redis_events]
I20220713 23:17:29.216631 1829577 stirling.cc:447] Adding info class: [socket_tracer/nats_events.beta]
I20220713 23:17:29.216697 1829577 stirling.cc:447] Adding info class: [socket_tracer/kafka_events.beta]
I20220713 23:17:29.216714 1829577 stirling.cc:447] Adding info class: [socket_tracer/mux_events]
I20220713 23:17:29.216769 1829577 source_connector.cc:36] Initializing source connector: perf_profiler
I20220713 23:17:29.216830 1829577 linux_headers.cc:90] Obtained Linux version string from `uname`: 5.10.118
I20220713 23:17:29.216842 1829577 linux_headers.cc:599] Detected kernel release (uname -r): 5.10.118
I20220713 23:17:29.216933 1829577 bcc_wrapper.cc:120] Using linux headers found at /lib/modules/5.10.118/source for BCC runtime.
I20220713 23:17:30.723657 1829577 bcc_wrapper.cc:345] Opening perf buffer: histogram_a [requested_size=117828 num_pages=32 size=131072] (per cpu)
I20220713 23:17:30.723989 1829577 bcc_wrapper.cc:345] Opening perf buffer: histogram_b [requested_size=117828 num_pages=32 size=131072] (per cpu)
I20220713 23:17:30.724237 1829577 perf_profile_connector.cc:145] PerfProfiler: Stack trace profiling sampling probe successfully deployed.
I20220713 23:17:30.724285 1829577 perf_profile_connector.cc:161] PerfProfiler: Java symbolization enabled.
I20220713 23:17:30.724365 1829577 java_symbolizer.cc:214] JavaSymbolizer found agent lib /pl/lib-px-java-agent-musl.so.
I20220713 23:17:30.724457 1829577 java_symbolizer.cc:214] JavaSymbolizer found agent lib /pl/lib-px-java-agent-glibc.so.
I20220713 23:17:30.724480 1829577 stirling.cc:447] Adding info class: [perf_profiler/stack_traces.beta]
I20220713 23:17:30.724548 1829577 source_connector.cc:36] Initializing source connector: proc_exit_tracer
I20220713 23:17:30.724565 1829577 linux_headers.cc:90] Obtained Linux version string from `uname`: 5.10.118
I20220713 23:17:30.724576 1829577 linux_headers.cc:599] Detected kernel release (uname -r): 5.10.118
I20220713 23:17:30.724617 1829577 bcc_wrapper.cc:120] Using linux headers found at /lib/modules/5.10.118/source for BCC runtime.
I20220713 23:17:31.426700 1829577 bcc_wrapper.cc:345] Opening perf buffer: proc_exit_events [requested_size=5242880 num_pages=2048 size=8388608] (per cpu)
I20220713 23:17:31.431483 1829577 stirling.cc:447] Adding info class: [proc_exit_tracer/proc_exit_events]
I20220713 23:17:31.431712 1829577 source_connector.cc:36] Initializing source connector: stirling_error
I20220713 23:17:31.431871 1829577 stirling.cc:447] Adding info class: [stirling_error/stirling_error]
I20220713 23:17:31.431943 1829577 stirling.cc:447] Adding info class: [stirling_error/probe_status]
I20220713 23:17:31.432014 1829577 stirling.cc:413] Stirling successfully initialized.
I20220713 23:17:31.435824 1829577 manager.cc:156] Hostname: ip-10-15-52-141.ec2.internal
F20220713 23:18:07.840595 1829577 registration.cc:48] Timeout waiting for registration ack
*** Check failure stack trace: ***
E20220713 23:18:07.840950 1829577 signal_action.cc:63] Caught Aborted, suspect faulting address 0x1beac9. Trace:
**************************
PC: @ 0x7f97c9b677f3 (unknown) abort
@ 0x604e163 (unknown) google::LogMessage::SendToLog()
@ 0x604e47a (unknown) google::LogMessage::Flush()
@ 0x604feef (unknown) google::LogMessageFatal::~LogMessageFatal()
@ 0x5145ae7 (unknown) std::_Function_handler<>::_M_invoke()
@ 0x57c3df6 (unknown) uv__run_timers
@ 0x57c4b35 (unknown) uv_run
@ 0x5134e5a (unknown) px::vizier::agent::Manager::Run()
@ 0x1b0aeea (unknown) main
@ 0x7f97c9b68d90 (unknown) (unknown)
@ 0x7f97c9b68e40 (unknown) __libc_start_main
@ 0x1b0a8a5 (unknown) _start
**************************
Threads: 1829606
Stack trace:
PC: @ 0x610e2c4 (unknown) execute_native_thread_routine
@ 0x7f97c9bd3b43 (unknown) (unknown)
@ 0x7f97c9c65a00 (unknown) (unknown)
Threads: 1830396
Stack trace:
PC: @ 0x5c03c99 (unknown) natsCondition_Wait
@ 0x5bf367c (unknown) _flusher
@ 0x5c04354 (unknown) _threadStart
@ 0x7f97c9bd3b43 (unknown) (unknown)
@ 0x7f97c9c65a00 (unknown) (unknown)
Threads: 1830394
Stack trace:
PC: @ 0x5c03c99 (unknown) natsCondition_Wait
@ 0x5bfac12 (unknown) _asyncCbsThread
@ 0x5c04354 (unknown) _threadStart
@ 0x7f97c9bd3b43 (unknown) (unknown)
@ 0x7f97c9c65a00 (unknown) (unknown)
Threads: 1830395
Stack trace:
PC: @ 0x5c03c99 (unknown) natsCondition_Wait
@ 0x5bfadd8 (unknown) _garbageCollector
@ 0x5c04354 (unknown) _threadStart
@ 0x7f97c9bd3b43 (unknown) (unknown)
@ 0x7f97c9c65a00 (unknown) (unknown)
Threads: 1830397
Stack trace:
PC: @ 0x5c03c99 (unknown) natsCondition_Wait
@ 0x5c003bf (unknown) natsSub_deliverMsgs
@ 0x5c04354 (unknown) _threadStart
@ 0x7f97c9bd3b43 (unknown) (unknown)
@ 0x7f97c9c65a00 (unknown) (unknown)
Threads: 1830393
Stack trace:
PC: @ 0x5c03db0 (unknown) natsCondition_AbsoluteTimedWait
@ 0x5bfa936 (unknown) _timerThread
@ 0x5c04354 (unknown) _threadStart
@ 0x7f97c9bd3b43 (unknown) (unknown)
@ 0x7f97c9c65a00 (unknown) (unknown)
Threads: 1830392
Stack trace:
PC: @ 0x60725f0 (unknown) AbslInternalPerThreadSemWait_lts_20211102
@ 0x6072034 (unknown) absl::lts_20211102::CondVar::WaitCommon()
@ 0x5f62443 (unknown) gpr_cv_wait
@ 0x5f20ef5 (unknown) timer_thread()
@ 0x5f664ac (unknown) grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix()::{lambda()#1}::__invoke()
@ 0x7f97c9bd3b43 (unknown) (unknown)
@ 0x7f97c9c65a00 (unknown) (unknown)
Threads: 1830390, 1830391
Stack trace:
PC: @ 0x60725f0 (unknown) AbslInternalPerThreadSemWait_lts_20211102
@ 0x6072034 (unknown) absl::lts_20211102::CondVar::WaitCommon()
@ 0x5f62453 (unknown) gpr_cv_wait
@ 0x5f502a1 (unknown) grpc_core::Executor::ThreadMain()
@ 0x5f664ac (unknown) grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix()::{lambda()#1}::__invoke()
@ 0x7f97c9bd3b43 (unknown) (unknown)
@ 0x7f97c9c65a00 (unknown) (unknown)
Threads: 1829577
Stack trace:
PC: @ 0x5f6f8b3 (unknown) px::SignalAction::SigHandler()
@ 0x7f97c9b81520 (unknown) (unknown)
@ 0x7f97c9bd5a7c (unknown) pthread_kill
@ 0x7f97c9b81476 (unknown) gsignal
@ 0x7f97c9b677f3 (unknown) abort
@ 0x604e163 (unknown) google::LogMessage::SendToLog()
@ 0x604e47a (unknown) google::LogMessage::Flush()
@ 0x604feef (unknown) google::LogMessageFatal::~LogMessageFatal()
@ 0x5145ae7 (unknown) std::_Function_handler<>::_M_invoke()
@ 0x57c3df6 (unknown) uv__run_timers
@ 0x57c4b35 (unknown) uv_run
@ 0x5134e5a (unknown) px::vizier::agent::Manager::Run()
@ 0x1b0aeea (unknown) main
@ 0x7f97c9b68d90 (unknown) (unknown)
@ 0x7f97c9b68e40 (unknown) __libc_start_main
@ 0x1b0a8a5 (unknown) _start
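For anyone comparing their own PEM logs against the ones above, here is a rough Python sketch that finds the first fatal (`F`) line and measures how long the agent waited after printing its `Hostname:` line. The function names `parse_glog` and `seconds_until_fatal` are made up for this example, the regex only covers the two glog prefix styles seen in this thread (`I0112 ...` and `I20220713 ...`), and using the `Hostname:` line as the start of the registration wait is an assumption, not something the Pixie source confirms.

```python
import re
from datetime import datetime

# Matches both prefix styles seen in this thread:
#   I0112 05:55:31.567047 2147280 stirling.cc:394] message
#   I20220713 23:17:16.655879 1829577 env.cc:47] message
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}|\d{8}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<tid>\d+) "
    r"(?P<src>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_glog(line):
    """Return a dict of fields for a glog-style line, or None if it does not match."""
    m = GLOG_LINE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Only the time of day is parsed; a log that crosses midnight is not handled.
    rec["clock"] = datetime.strptime(rec["time"], "%H:%M:%S.%f")
    return rec


def seconds_until_fatal(lines, marker="Hostname:"):
    """Return (first fatal record, seconds since the first line containing marker)."""
    start = None
    for line in lines:
        rec = parse_glog(line)
        if rec is None:
            continue  # stack-trace frames and os-release lines are skipped
        if start is None and marker in rec["msg"]:
            start = rec["clock"]
        if rec["sev"] == "F":
            if start is None:
                return rec, None
            return rec, (rec["clock"] - start).total_seconds()
    return None, None


if __name__ == "__main__":
    sample = [
        "I20220713 23:17:31.435824 1829577 manager.cc:156] Hostname: ip-10-15-52-141.ec2.internal",
        "F20220713 23:18:07.840595 1829577 registration.cc:48] Timeout waiting for registration ack",
    ]
    fatal, gap = seconds_until_fatal(sample)
    print(fatal["src"], fatal["msg"], gap)
```

On the Bottlerocket log the gap between the `Hostname:` line and the fatal line is about 36 seconds, and on the first log in this issue it is about 57 seconds. In both cases Stirling initialized successfully first, so the crash is in the registration step rather than in probe deployment.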
Same issue here... did you find a fix?
Same issue here...
@foolishhumans @yantingqiu could you please provide a pixie-diag (https://github.com/wreckedred/pixie-diag) of your cluster? It would be helpful to confirm if your deployment matches against some of the previous bug reports.
I have solved this issue. The cause was that the YAML file from the "air gapped pixie" section of the Install guide was missing permissions for the lease. The following is the log for my cloud-connector. By the way, there are many errors in the YAML file provided by air gapped pixie.
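If you want to check your own manifests for the same gap, the sketch below takes the `rules` list of a Role or ClusterRole (already loaded from YAML into Python dicts) and reports whether it grants access to Lease objects. Leases live in the Kubernetes `coordination.k8s.io` API group. The function name `grants_lease_access` is hypothetical, and the default verbs are an assumption based on what leader election typically needs; the comment above does not say which role or which verbs were missing.

```python
def grants_lease_access(rules, required_verbs=("get", "create", "update")):
    """Return True if the RBAC rules grant the required verbs on leases.

    `rules` mirrors the `rules` field of a Role/ClusterRole manifest:
    a list of dicts with `apiGroups`, `resources`, and `verbs` lists.
    `required_verbs` is an assumed default, not taken from Pixie's docs.
    """
    granted = set()
    for rule in rules:
        groups = set(rule.get("apiGroups", []))
        resources = set(rule.get("resources", []))
        # A rule applies if it names the lease API group and resource,
        # either explicitly or through the "*" wildcard.
        if not groups & {"coordination.k8s.io", "*"}:
            continue
        if not resources & {"leases", "*"}:
            continue
        granted.update(rule.get("verbs", []))
    return "*" in granted or set(required_verbs) <= granted


if __name__ == "__main__":
    # Illustrative rules only; not copied from any Pixie manifest.
    rules = [
        {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get", "list"]},
        {"apiGroups": ["coordination.k8s.io"], "resources": ["leases"],
         "verbs": ["get", "create", "update"]},
    ]
    print(grants_lease_access(rules))
```

Verbs are accumulated across all matching rules, so a role that splits lease permissions over several rules still passes the check.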