[BUG] Failed to unwind stack trace; profiling data lost for one thread
Describe the bug
Running a simple C++ demo, the agent fails to parse symbols and fails to get profiling info from one of the threads.
I built the agent and the Parca server from the latest main branch.
To Reproduce
```cpp
#include <iostream>
#include <thread>
#include <chrono>
#include <cerrno>
#include <cstring>
#include <unistd.h>
#include <sys/epoll.h>

void epollf() {
    int epollFd = epoll_create1(0);
    if (epollFd == -1) {
        std::cerr << "Failed to create epoll instance: " << std::strerror(errno) << std::endl;
        return;
    }
    struct epoll_event event;
    event.events = EPOLLIN;
    while (true) {
        struct epoll_event events[10];
        // maxevents must not exceed the size of the events array
        int numEvents = epoll_wait(epollFd, events, 10, 0);
        if (numEvents == -1) {
            std::cerr << "epoll_wait error: " << std::strerror(errno) << std::endl;
            break;
        }
    }
    close(epollFd);
}

int main() {
    std::thread epollThread(epollf);
    while (true) {
    }
    epollThread.join();
    return 0;
}
```
```shell
clang++ -g -o jjjttt main.cpp -lpthread
./jjjttt
```
Expected behavior
Profiles with all symbols resolved.
Screenshots or Profiles (using https://pprof.me)
Only one thread is profiled; the other thread has no data.
Logs
`sudo cat /sys/kernel/debug/tracing/trace_pipe`:
jjjttt-2177205 [005] d.h.. 5107453.799553: bpf_trace_printk: ~about to check shards found=1
jjjttt-2177205 [005] d.h.. 5107453.799553: bpf_trace_printk: ~checking shards now
jjjttt-2177205 [005] d.h.. 5107453.799554: bpf_trace_printk: [info] found chunk
jjjttt-2177205 [005] d.h.. 5107453.799554: bpf_trace_printk: pid 2177205 tgid 2177205
jjjttt-2177205 [005] d.h.. 5107453.799555: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2177205 [005] d.h.. 5107453.799555: bpf_trace_printk: traversing stack using .eh_frame information!!
jjjttt-2177205 [005] d.h.. 5107453.799556: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2177205 [005] d.h.. 5107453.799556: bpf_trace_printk: [debug] Within unwinding machinery loop
jjjttt-2177205 [005] d.h.. 5107453.799557: bpf_trace_printk: ## frame: 0
jjjttt-2177205 [005] d.h.. 5107453.799557: bpf_trace_printk: current pc: 559071098394
jjjttt-2177205 [005] d.h.. 5107453.799557: bpf_trace_printk: current sp: 7fff9cabaa10
jjjttt-2177205 [005] d.h.. 5107453.799558: bpf_trace_printk: current bp: 7fff9cabaa20
jjjttt-2177205 [005] d.h.. 5107453.799558: bpf_trace_printk: ~about to check shards found=1
jjjttt-2177205 [005] d.h.. 5107453.799558: bpf_trace_printk: ~checking shards now
jjjttt-2177205 [005] d.h.. 5107453.799559: bpf_trace_printk: [info] found chunk
jjjttt-2177205 [005] d.h.. 5107453.799559: bpf_trace_printk: le offset: 559071097000
jjjttt-2177205 [005] d.h.. 5107453.799560: bpf_trace_printk: ========== left 74625 right 74883
jjjttt-2177205 [005] d.h.. 5107453.799560: bpf_trace_printk: .done
jjjttt-2177205 [005] d.h.. 5107453.799560: bpf_trace_printk: => table_index: 74651
jjjttt-2177205 [005] d.h.. 5107453.799560: bpf_trace_printk: => adjusted pc: 1394
jjjttt-2177205 [005] d.h.. 5107453.799561: bpf_trace_printk: cfa type: 1, offset: 16 (row pc: 1374)
jjjttt-2177205 [005] d.h.. 5107453.799563: bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7fff9cabaa28.
The agent log shows no obvious errors.
Parca server log (some symbols are missing as well). It contains `storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"` and `ErrUnknownPC` errors:
level=debug name=parca ts=2023-09-25T08:57:31.519863966Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=69389d485a9793dbe873f0ea2c93e02efaa9aa3d pc_range_start=0x28700 pc_range_end=0x1bc4c1
level=debug name=parca ts=2023-09-25T08:57:31.519933345Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=3146e5a5e66e1fd6ab59a39e486a9d246bc675c0 pc_range_start=0x2090 pc_range_end=0x2b375
level=debug name=parca ts=2023-09-25T08:57:31.519963794Z caller=symbolizer.go:299 component=symbolizer buildid=unknown msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
level=debug name=parca ts=2023-09-25T08:57:31.520031807Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=e22ba7829a55a0dec2201a0b6dac7ba236118561 pc_range_start=0xe030 pc_range_end=0x1e19b
level=debug name=parca ts=2023-09-25T08:57:31.520112609Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=f90a926c28da963c0888cca1180ef28bb88f059b pc_range_start=0x2030 pc_range_end=0x2488
level=debug name=parca ts=2023-09-25T08:57:31.520184515Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=d7435f9680bf6c38f11856638bbbc8a2b755e054 pc_range_start=0x2030 pc_range_end=0x554d0
level=debug name=parca ts=2023-09-25T08:57:31.520219354Z caller=symbolizer.go:299 component=symbolizer buildid=unknown msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
level=debug name=parca ts=2023-09-25T08:57:31.520289037Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=927b5ce7c32980646eaf7527a8b3d68c3da5ffea pc_range_start=0x10a0 pc_range_end=0x12d8
level=debug name=parca ts=2023-09-25T08:57:31.520358214Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=5ea13eaa57911db0 pc_range_start=0xafb0 pc_range_end=0x4c023
level=debug name=parca ts=2023-09-25T08:57:31.520386623Z caller=symbolizer.go:299 component=symbolizer buildid=unknown msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
level=debug name=parca ts=2023-09-25T08:57:31.520453982Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=4b60568bc25641b07e93f84f98c77093f3b279f7 pc_range_start=0x6030 pc_range_end=0x1833e
level=debug name=parca ts=2023-09-25T08:57:31.520527563Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=7ab150049702ea494c7c9b681d86dc2d59c89ee1 pc_range_start=0xb030 pc_range_end=0x4ebcd
level=debug name=parca ts=2023-09-25T08:57:31.520600582Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=0665411b8b4fc787710290aacecac4b069bf89fe pc_range_start=0x4030 pc_range_end=0x4718
level=debug name=parca ts=2023-09-25T08:57:31.520681385Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=55447a37514c4a317439786251326b5f762d31392f6f6b7835704d635279724779346e4d6e78414a6a2f6d647a5247354a536e6e38616e6c31636c5954612f38496435624e72466c744770475332794967704c pc_range_start=0x46aaa0 pc_range_end=0xd53aa8
level=debug name=parca ts=2023-09-25T08:57:31.520760407Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=7bf66e3d4dd23edad286982c17e7df08d1ab6fb2 pc_range_start=0x10a0 pc_range_end=0x12e2
level=debug name=parca ts=2023-09-25T08:57:31.520839323Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=7d78d330de33842118d405b4afe4d58878d71b41 pc_range_start=0xb030 pc_range_end=0x73617
level=debug name=parca ts=2023-09-25T08:57:31.520871855Z caller=symbolizer.go:299 component=symbolizer buildid=unknown msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
level=debug name=parca ts=2023-09-25T08:57:31.520939133Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=cc278076d1c0ecf82b5248dfffe7e2d43feafbb7 pc_range_start=0x7030 pc_range_end=0x7ee8
level=debug name=parca ts=2023-09-25T08:57:31.521048469Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=cf6dd909194b0a9dd65eb30f7ec27c4002d0329d pc_range_start=0x3f030 pc_range_end=0x1fdb34
level=debug name=parca ts=2023-09-25T08:57:31.521126984Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=3192c14e4fe2624ff4675ae5a5edfa8a599aa9db pc_range_start=0x2030 pc_range_end=0x2578
level=debug name=parca ts=2023-09-25T08:57:31.521195289Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=d11886324c0b0501c46b595089d667e6894943af pc_range_start=0x3030 pc_range_end=0xa8e7
level=debug name=parca ts=2023-09-25T08:57:31.521273608Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=6673306d4d734c383466637842373044627642442f5a4b76785f546b716f435f2d506957696b447a492f4270345f73557834666d616445414b796b64555f2f4e524d525f67667a4c365a554361577041657232 pc_range_start=0x401e40 pc_range_end=0x49b0a85
level=debug name=parca ts=2023-09-25T08:57:31.521462171Z caller=symbolizer.go:411 component=symbolizer msg="no locations to symbolize" build_id=229b7dc509053fe4df5e29e8629911f0c3bc66dd pc_range_start=0x28700 pc_range_end=0x1bc501
level=debug name=parca ts=2023-09-25T08:57:31.521507517Z caller=symbolizer.go:299 component=symbolizer buildid=036dde3b7b7d0159 msg="storage symbolization request failed" err="fetching metadata: debuginfo metadata not found"
level=debug name=parca ts=2023-09-25T08:57:31.521520533Z caller=symbolizer.go:289 component=symbolizer msg="mapping of location is empty, skipping"
level=debug name=parca ts=2023-09-25T08:57:31.521590103Z caller=symbolizer.go:516 component=symbolizer msg="symbolizing locations" build_id=e37fe1a879783838de78cbc8c80621fa685d58a2 count=2
level=debug name=parca ts=2023-09-25T08:57:31.522622451Z caller=symbolizer.go:590 component=symbolizer msg="symbolized location" build_id=e37fe1a879783838de78cbc8c80621fa685d58a2 address=651472 lines_count=0 err="seek to PC: ErrUnknownPC" liner_type=*addr2line.DwarfLiner
level=debug name=parca ts=2023-09-25T08:57:31.523523449Z caller=symbolizer.go:590 component=symbolizer msg="symbolized location" build_id=e37fe1a879783838de78cbc8c80621fa685d58a2 address=654128 lines_count=0 err="seek to PC: ErrUnknownPC" liner_type=*addr2line.DwarfLiner
level=debug name=parca ts=2023-09-25T08:57:31.52356687Z caller=symbolizer.go:191 component=symbolizer msg="symbolization loop completed"
level=debug name=parca ts=2023-09-25T08:57:37.245414226Z caller=badger.go:452 msg=GetOrCreateStacktraces stacktrace_keys_len=49
level=debug name=parca ts=2023-09-25T08:57:37.246795774Z caller=badger.go:58 msg="writeRequests called. Writing to value log"
level=debug name=parca ts=2023-09-25T08:57:37.246872034Z caller=badger.go:58 msg="Writing to memtable"
level=debug name=parca ts=2023-09-25T08:57:37.246937884Z caller=badger.go:58 msg="Sending updates to subscribers"
level=debug name=parca ts=2023-09-25T08:57:37.246969165Z caller=badger.go:58 msg="2 entries written"
level=debug name=parca ts=2023-09-25T08:57:37.266391517Z caller=server.go:314 msg="finished call" protocol=grpc grpc.component=server grpc.service=parca.profilestore.v1alpha1.ProfileStoreService grpc.method=WriteRaw grpc.method_type=unary peer.address=127.0.0.1:59764 grpc.start_time=2023-09-25T08:57:37Z grpc.request.deadline=2023-09-25T08:59:37Z grpc.code=OK grpc.time_ms=24.529
Software (please complete the following information):
- Parca Agent Version: [`parca-agent --version`]
- Parca Server Version (if applicable): [`parca --version`]
Workload (please complete the following information):
- Runtime (if applicable):
- Compiler (if applicable):
Environment (please complete the following information):
- Linux Distribution: [`cat /etc/*-release`] Ubuntu 22.04.3 LTS (Jammy Jellyfish)
- Linux Version: [`uname -r`] 5.15.0-78-generic
- Arch: [`uname -m`] x86_64
- Kubernetes Version (if applicable): [`kubectl --version`] no
- Container Runtime (if applicable): no
Additional context
Hi @kakkoyun @javierhonduco, could you take a look at this? Thanks!
There might be two issues:
- failure to symbolize the stack
- missing profiling data from one thread
We can simplify the reproduction to a single thread:
```cpp
#include <iostream>
#include <thread>
#include <chrono>
#include <cerrno>
#include <cstring>
#include <unistd.h>
#include <sys/epoll.h>

void epollf() {
    int epollFd = epoll_create1(0);
    if (epollFd == -1) {
        std::cerr << "Failed to create epoll instance: " << std::strerror(errno) << std::endl;
        return;
    }
    struct epoll_event event;
    event.events = EPOLLIN;
    while (true) {
        struct epoll_event events[10];
        // maxevents must not exceed the size of the events array
        int numEvents = epoll_wait(epollFd, events, 10, 0);
        if (numEvents == -1) {
            std::cerr << "epoll_wait error: " << std::strerror(errno) << std::endl;
            break;
        }
    }
    close(epollFd);
}

int main() {
    // do not start the epoll thread
    while (true) {
    }
    return 0;
}
```
and we get this in `sudo cat /sys/kernel/debug/tracing/trace_pipe`:
jjjttt-2186346 [032] d.h.. 5108702.762783: bpf_trace_printk: ## frame: 2
jjjttt-2186346 [032] d.h.. 5108702.762783: bpf_trace_printk: current pc: 7fed17ba7e40
jjjttt-2186346 [032] d.h.. 5108702.762783: bpf_trace_printk: current sp: 7fff0f88d630
jjjttt-2186346 [032] d.h.. 5108702.762783: bpf_trace_printk: current bp: 1
jjjttt-2186346 [032] d.h.. 5108702.762784: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108702.762784: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108702.762784: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108702.762785: bpf_trace_printk: le offset: 7fed17b7e000
jjjttt-2186346 [032] d.h.. 5108702.762785: bpf_trace_printk: ========== left 29 right 29075
jjjttt-2186346 [032] d.h.. 5108702.762785: bpf_trace_printk: .done
jjjttt-2186346 [032] d.h.. 5108702.762786: bpf_trace_printk: => table_index: 266
jjjttt-2186346 [032] d.h.. 5108702.762786: bpf_trace_printk: => adjusted pc: 29e40
jjjttt-2186346 [032] d.h.. 5108702.762786: bpf_trace_printk: cfa type: 2, offset: 80 (row pc: 29ddd)
jjjttt-2186346 [032] d.h.. 5108702.762787: bpf_trace_printk: (bp_offset: -48, bp value stored at 7fff0f88d650)
jjjttt-2186346 [032] d.h.. 5108702.762788: bpf_trace_printk: previous ip: 55a475d9a135 (@ 7fff0f88d678)
jjjttt-2186346 [032] d.h.. 5108702.762788: bpf_trace_printk: previous sp: 7fff0f88d680
jjjttt-2186346 [032] d.h.. 5108702.762788: bpf_trace_printk: previous bp: 0
jjjttt-2186346 [032] d.h.. 5108702.762788: bpf_trace_printk: [debug] Within unwinding machinery loop
jjjttt-2186346 [032] d.h.. 5108702.762789: bpf_trace_printk: ## frame: 3
jjjttt-2186346 [032] d.h.. 5108702.762789: bpf_trace_printk: current pc: 55a475d9a135
jjjttt-2186346 [032] d.h.. 5108702.762789: bpf_trace_printk: current sp: 7fff0f88d680
jjjttt-2186346 [032] d.h.. 5108702.762790: bpf_trace_printk: current bp: 0
jjjttt-2186346 [032] d.h.. 5108702.762790: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108702.762790: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108702.762790: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108702.762791: bpf_trace_printk: le offset: 55a475d99000
jjjttt-2186346 [032] d.h.. 5108702.762791: bpf_trace_printk: ========== left 74596 right 74624
jjjttt-2186346 [032] d.h.. 5108702.762791: bpf_trace_printk: .done
jjjttt-2186346 [032] d.h.. 5108702.762792: bpf_trace_printk: => table_index: 74613
jjjttt-2186346 [032] d.h.. 5108702.762792: bpf_trace_printk: => adjusted pc: 1135
jjjttt-2186346 [032] d.h.. 5108702.762793: bpf_trace_printk: cfa type: 2, offset: 8 (row pc: 1114)
jjjttt-2186346 [032] d.h.. 5108702.762793: bpf_trace_printk: [info] null return address, end of stack
jjjttt-2186346 [032] d.h.. 5108702.762793: bpf_trace_printk: ======= reached main! =======
jjjttt-2186346 [032] d.h.. 5108702.762794: bpf_trace_printk: native stack hash 2006046535
jjjttt-2186346 [032] d.h.. 5108702.762795: bpf_trace_printk: [debug] not an interpreter
jjjttt-2186346 [032] d.h.. 5108702.762795: bpf_trace_printk: [MMR] aggregate stack count for pid: 2186346
jjjttt-2186346 [032] d.h.. 5108702.815397: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108702.815398: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108702.815399: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108702.815399: bpf_trace_printk: pid 2186346 tgid 2186346
jjjttt-2186346 [032] d.h.. 5108702.815399: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2186346 [032] d.h.. 5108702.815400: bpf_trace_printk: traversing stack using .eh_frame information!!
jjjttt-2186346 [032] d.h.. 5108702.815400: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2186346 [032] d.h.. 5108702.815401: bpf_trace_printk: [debug] Within unwinding machinery loop
jjjttt-2186346 [032] d.h.. 5108702.815401: bpf_trace_printk: ## frame: 0
jjjttt-2186346 [032] d.h.. 5108702.815401: bpf_trace_printk: current pc: 55a475d9a30b
jjjttt-2186346 [032] d.h.. 5108702.815402: bpf_trace_printk: current sp: 7fff0f88d580
jjjttt-2186346 [032] d.h.. 5108702.815402: bpf_trace_printk: current bp: 7fff0f88d580
jjjttt-2186346 [032] d.h.. 5108702.815402: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108702.815403: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108702.815403: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108702.815403: bpf_trace_printk: le offset: 55a475d99000
jjjttt-2186346 [032] d.h.. 5108702.815404: bpf_trace_printk: ========== left 74596 right 74624
jjjttt-2186346 [032] d.h.. 5108702.815404: bpf_trace_printk: .done
jjjttt-2186346 [032] d.h.. 5108702.815404: bpf_trace_printk: => table_index: 74622
jjjttt-2186346 [032] d.h.. 5108702.815405: bpf_trace_printk: => adjusted pc: 130b
jjjttt-2186346 [032] d.h.. 5108702.815405: bpf_trace_printk: cfa type: 1, offset: 16 (row pc: 1304)
jjjttt-2186346 [032] d.h.. 5108702.815406: bpf_trace_printk: (bp_offset: -16, bp value stored at 7fff0f88d580)
jjjttt-2186346 [032] d.h.. 5108702.815407: bpf_trace_printk: previous ip: 7fed17ba7d90 (@ 7fff0f88d588)
jjjttt-2186346 [032] d.h.. 5108702.815407: bpf_trace_printk: previous sp: 7fff0f88d590
jjjttt-2186346 [032] d.h.. 5108702.815407: bpf_trace_printk: previous bp: 1
jjjttt-2186346 [032] d.h.. 5108702.815408: bpf_trace_printk: [debug] Within unwinding machinery loop
jjjttt-2186346 [032] d.h.. 5108702.815408: bpf_trace_printk: ## frame: 1
jjjttt-2186346 [032] d.h.. 5108702.815408: bpf_trace_printk: current pc: 7fed17ba7d90
jjjttt-2186346 [032] d.h.. 5108702.815409: bpf_trace_printk: current sp: 7fff0f88d590
jjjttt-2186346 [032] d.h.. 5108702.815409: bpf_trace_printk: current bp: 1
jjjttt-2186346 [032] d.h.. 5108702.815409: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108702.815410: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108702.815410: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108702.815410: bpf_trace_printk: le offset: 7fed17b7e000
jjjttt-2186346 [032] d.h.. 5108702.815411: bpf_trace_printk: ========== left 29 right 29075
jjjttt-2186346 [032] d.h.. 5108702.815411: bpf_trace_printk: .done
jjjttt-2186346 [032] d.h.. 5108702.815411: bpf_trace_printk: => table_index: 257
jjjttt-2186346 [032] d.h.. 5108702.815412: bpf_trace_printk: => adjusted pc: 29d90
jjjttt-2186346 [032] d.h.. 5108702.815412: bpf_trace_printk: cfa type: 2, offset: 160 (row pc: 29d19)
jjjttt-2186346 [032] d.h.. 5108702.815413: bpf_trace_printk: previous ip: 7fed17ba7e40 (@ 7fff0f88d628)
jjjttt-2186346 [032] d.h.. 5108702.815413: bpf_trace_printk: previous sp: 7fff0f88d630
jjjttt-2186346 [032] d.h.. 5108702.815414: bpf_trace_printk: previous bp: 1
jjjttt-2186346 [032] d.h.. 5108702.815414: bpf_trace_printk: [debug] Within unwinding machinery loop
jjjttt-2186346 [032] d.h.. 5108702.815414: bpf_trace_printk: ## frame: 2
jjjttt-2186346 [032] d.h.. 5108702.815414: bpf_trace_printk: current pc: 7fed17ba7e40
jjjttt-2186346 [032] d.h.. 5108702.815415: bpf_trace_printk: current sp: 7fff0f88d630
jjjttt-2186346 [032] d.h.. 5108702.815415: bpf_trace_printk: current bp: 1
jjjttt-2186346 [032] d.h.. 5108702.815415: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108702.815416: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108702.815416: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108702.815416: bpf_trace_printk: le offset: 7fed17b7e000
jjjttt-2186346 [032] d.h.. 5108702.815417: bpf_trace_printk: ========== left 29 right 29075
jjjttt-2186346 [032] d.h.. 5108702.815417: bpf_trace_printk: .done
jjjttt-2186346 [032] d.h.. 5108702.815417: bpf_trace_printk: => table_index: 266
jjjttt-2186346 [032] d.h.. 5108702.815417: bpf_trace_printk: => adjusted pc: 29e40
jjjttt-2186346 [032] d.h.. 5108702.815418: bpf_trace_printk: cfa type: 2, offset: 80 (row pc: 29ddd)
jjjttt-2186346 [032] d.h.. 5108702.815419: bpf_trace_printk: (bp_offset: -48, bp value stored at 7fff0f88d650)
jjjttt-2186346 [032] d.h.. 5108702.815419: bpf_trace_printk: previous ip: 55a475d9a135 (@ 7fff0f88d678)
jjjttt-2186346 [032] d.h.. 5108702.815419: bpf_trace_printk: previous sp: 7fff0f88d680
jjjttt-2186346 [032] d.h.. 5108702.815420: bpf_trace_printk: previous bp: 0
jjjttt-2186346 [032] d.h.. 5108702.815420: bpf_trace_printk: [debug] Within unwinding machinery loop
jjjttt-2186346 [032] d.h.. 5108702.815420: bpf_trace_printk: ## frame: 3
jjjttt-2186346 [032] d.h.. 5108702.815421: bpf_trace_printk: current pc: 55a475d9a135
jjjttt-2186346 [032] d.h.. 5108702.815421: bpf_trace_printk: current sp: 7fff0f88d680
jjjttt-2186346 [032] d.h.. 5108702.815421: bpf_trace_printk: current bp: 0
jjjttt-2186346 [032] d.h.. 5108702.815421: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108702.815422: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108702.815422: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108702.815422: bpf_trace_printk: le offset: 55a475d99000
jjjttt-2186346 [032] d.h.. 5108702.815423: bpf_trace_printk: ========== left 74596 right 74624
jjjttt-2186346 [032] d.h.. 5108702.815423: bpf_trace_printk: .done
jjjttt-2186346 [032] d.h.. 5108702.815423: bpf_trace_printk: => table_index: 74613
jjjttt-2186346 [032] d.h.. 5108702.815423: bpf_trace_printk: => adjusted pc: 1135
jjjttt-2186346 [032] d.h.. 5108702.815424: bpf_trace_printk: cfa type: 2, offset: 8 (row pc: 1114)
jjjttt-2186346 [032] d.h.. 5108702.815424: bpf_trace_printk: [info] null return address, end of stack
jjjttt-2186346 [032] d.h.. 5108702.815425: bpf_trace_printk: ======= reached main! =======
jjjttt-2186346 [032] d.h.. 5108702.815425: bpf_trace_printk: native stack hash 2006046535
jjjttt-2186346 [032] d.h.. 5108702.815426: bpf_trace_printk: [debug] not an interpreter
jjjttt-2186346 [032] d.h.. 5108702.815427: bpf_trace_printk: [MMR] aggregate stack count for pid: 2186346
jjjttt-2186346 [032] d.h.. 5108702.868029: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108702.868031: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108702.868031: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108702.868031: bpf_trace_printk: pid 2186346 tgid 2186346
jjjttt-2186346 [032] d.h.. 5108702.868032: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2186346 [032] d.h.. 5108702.868032: bpf_trace_printk: traversing stack using .eh_frame information!!
jjjttt-2186346 [032] d.h.. 5108702.868033: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2186346 [032] d.h.. 5108702.868033: bpf_trace_printk: [debug] Within unwinding machinery loop
jjjttt-2186346 [032] d.h.. 5108702.868034: bpf_trace_printk: ## frame: 0
jjjttt-2186346 [032] d.h.. 5108702.868034: bpf_trace_printk: current pc: 55a475d9a30b
jjjttt-2186346 [032] d.h.. 5108702.868034: bpf_trace_printk: current sp: 7fff0f88d580
jjjttt-2186346 [032] d.h.. 5108702.868035: bpf_trace_printk: current bp: 7fff0f88d580
jjjttt-2186346 [032] d.h.. 5108702.868035: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108702.868035: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108702.868036: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108702.868036: bpf_trace_printk: le offset: 55a475d99000
jjjttt-2186346 [032] d.h.. 5108702.868037: bpf_trace_printk: ========== left 74596 right 74624
jjjttt-2186346 [032] d.h.. 5108702.868037: bpf_trace_printk: .done
jjjttt-2186346 [032] d.h.. 5108702.868037: bpf_trace_printk: => table_index: 74622
jjjttt-2186346 [032] d.h.. 5108702.868038: bpf_trace_printk: => adjusted pc: 130b
jjjttt-2186346 [032] d.h.. 5108702.868038: bpf_trace_printk: cfa type: 1, offset: 16 (row pc: 1304)
jjjttt-2186346 [032] d.h.. 5108702.868043: bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7fff0f88d588.
jjjttt-2186346 [032] d.h.. 5108702.920660: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108702.920661: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108702.920661: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108702.920661: bpf_trace_printk: pid 2186346 tgid 2186346
jjjttt-2186346 [032] d.h.. 5108702.920662: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2186346 [032] d.h.. 5108702.920662: bpf_trace_printk: traversing stack using .eh_frame information!!
jjjttt-2186346 [032] d.h.. 5108702.920663: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2186346 [032] d.h.. 5108702.920663: bpf_trace_printk: [debug] Within unwinding machinery loop
jjjttt-2186346 [032] d.h.. 5108702.920663: bpf_trace_printk: ## frame: 0
jjjttt-2186346 [032] d.h.. 5108702.920664: bpf_trace_printk: current pc: 55a475d9a30b
jjjttt-2186346 [032] d.h.. 5108702.920664: bpf_trace_printk: current sp: 7fff0f88d580
jjjttt-2186346 [032] d.h.. 5108702.920665: bpf_trace_printk: current bp: 7fff0f88d580
jjjttt-2186346 [032] d.h.. 5108702.920665: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108702.920665: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108702.920666: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108702.920666: bpf_trace_printk: le offset: 55a475d99000
jjjttt-2186346 [032] d.h.. 5108702.920667: bpf_trace_printk: ========== left 74596 right 74624
jjjttt-2186346 [032] d.h.. 5108702.920667: bpf_trace_printk: .done
jjjttt-2186346 [032] d.h.. 5108702.920667: bpf_trace_printk: => table_index: 74622
jjjttt-2186346 [032] d.h.. 5108702.920668: bpf_trace_printk: => adjusted pc: 130b
jjjttt-2186346 [032] d.h.. 5108702.920668: bpf_trace_printk: cfa type: 1, offset: 16 (row pc: 1304)
jjjttt-2186346 [032] d.h.. 5108702.920671: bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7fff0f88d588.
jjjttt-2186346 [032] d.h.. 5108702.973291: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108702.973291: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108702.973292: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108702.973292: bpf_trace_printk: pid 2186346 tgid 2186346
jjjttt-2186346 [032] d.h.. 5108702.973293: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2186346 [032] d.h.. 5108702.973293: bpf_trace_printk: traversing stack using .eh_frame information!!
jjjttt-2186346 [032] d.h.. 5108702.973294: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2186346 [032] d.h.. 5108702.973294: bpf_trace_printk: [debug] Within unwinding machinery loop
jjjttt-2186346 [032] d.h.. 5108702.973294: bpf_trace_printk: ## frame: 0
jjjttt-2186346 [032] d.h.. 5108702.973295: bpf_trace_printk: current pc: 55a475d9a30b
jjjttt-2186346 [032] d.h.. 5108702.973295: bpf_trace_printk: current sp: 7fff0f88d580
jjjttt-2186346 [032] d.h.. 5108702.973295: bpf_trace_printk: current bp: 7fff0f88d580
jjjttt-2186346 [032] d.h.. 5108702.973296: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108702.973296: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108702.973296: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108702.973297: bpf_trace_printk: le offset: 55a475d99000
jjjttt-2186346 [032] d.h.. 5108702.973297: bpf_trace_printk: ========== left 74596 right 74624
jjjttt-2186346 [032] d.h.. 5108702.973297: bpf_trace_printk: .done
jjjttt-2186346 [032] d.h.. 5108702.973298: bpf_trace_printk: => table_index: 74622
jjjttt-2186346 [032] d.h.. 5108702.973298: bpf_trace_printk: => adjusted pc: 130b
jjjttt-2186346 [032] d.h.. 5108702.973298: bpf_trace_printk: cfa type: 1, offset: 16 (row pc: 1304)
jjjttt-2186346 [032] d.h.. 5108702.973301: bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7fff0f88d588.
jjjttt-2186346 [032] d.h.. 5108703.025922: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108703.025923: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108703.025923: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108703.025923: bpf_trace_printk: pid 2186346 tgid 2186346
jjjttt-2186346 [032] d.h.. 5108703.025924: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2186346 [032] d.h.. 5108703.025924: bpf_trace_printk: traversing stack using .eh_frame information!!
jjjttt-2186346 [032] d.h.. 5108703.025924: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2186346 [032] d.h.. 5108703.025925: bpf_trace_printk: [debug] Within unwinding machinery loop
jjjttt-2186346 [032] d.h.. 5108703.025925: bpf_trace_printk: ## frame: 0
jjjttt-2186346 [032] d.h.. 5108703.025925: bpf_trace_printk: current pc: 55a475d9a30b
jjjttt-2186346 [032] d.h.. 5108703.025926: bpf_trace_printk: current sp: 7fff0f88d580
jjjttt-2186346 [032] d.h.. 5108703.025926: bpf_trace_printk: current bp: 7fff0f88d580
jjjttt-2186346 [032] d.h.. 5108703.025927: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108703.025927: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108703.025927: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108703.025927: bpf_trace_printk: le offset: 55a475d99000
jjjttt-2186346 [032] d.h.. 5108703.025928: bpf_trace_printk: ========== left 74596 right 74624
jjjttt-2186346 [032] d.h.. 5108703.025928: bpf_trace_printk: .done
jjjttt-2186346 [032] d.h.. 5108703.025929: bpf_trace_printk: => table_index: 74622
jjjttt-2186346 [032] d.h.. 5108703.025929: bpf_trace_printk: => adjusted pc: 130b
jjjttt-2186346 [032] d.h.. 5108703.025929: bpf_trace_printk: cfa type: 1, offset: 16 (row pc: 1304)
jjjttt-2186346 [032] d.h.. 5108703.025932: bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7fff0f88d588.
jjjttt-2186346 [032] d.h.. 5108703.078553: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108703.078554: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108703.078554: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108703.078554: bpf_trace_printk: pid 2186346 tgid 2186346
jjjttt-2186346 [032] d.h.. 5108703.078555: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2186346 [032] d.h.. 5108703.078555: bpf_trace_printk: traversing stack using .eh_frame information!!
jjjttt-2186346 [032] d.h.. 5108703.078556: bpf_trace_printk: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
jjjttt-2186346 [032] d.h.. 5108703.078556: bpf_trace_printk: [debug] Within unwinding machinery loop
jjjttt-2186346 [032] d.h.. 5108703.078556: bpf_trace_printk: ## frame: 0
jjjttt-2186346 [032] d.h.. 5108703.078557: bpf_trace_printk: current pc: 55a475d9a30b
jjjttt-2186346 [032] d.h.. 5108703.078557: bpf_trace_printk: current sp: 7fff0f88d580
jjjttt-2186346 [032] d.h.. 5108703.078557: bpf_trace_printk: current bp: 7fff0f88d580
jjjttt-2186346 [032] d.h.. 5108703.078558: bpf_trace_printk: ~about to check shards found=1
jjjttt-2186346 [032] d.h.. 5108703.078558: bpf_trace_printk: ~checking shards now
jjjttt-2186346 [032] d.h.. 5108703.078558: bpf_trace_printk: [info] found chunk
jjjttt-2186346 [032] d.h.. 5108703.078558: bpf_trace_printk: le offset: 55a475d99000
jjjttt-2186346 [032] d.h.. 5108703.078559: bpf_trace_printk: ========== left 74596 right 74624
jjjttt-2186346 [032] d.h.. 5108703.078559: bpf_trace_printk: .done
jjjttt-2186346 [032] d.h.. 5108703.078559: bpf_trace_printk: => table_index: 74622
jjjttt-2186346 [032] d.h.. 5108703.078560: bpf_trace_printk: => adjusted pc: 130b
jjjttt-2186346 [032] d.h.. 5108703.078560: bpf_trace_printk: cfa type: 1, offset: 16 (row pc: 1304)
jjjttt-2186346 [032] d.h.. 5108703.078562: bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7fff0f88d588.
normal unwind vs error unwind
Thanks for opening this issue. Will take a look over the next few days.
Something that stands out is this error
errno -14 is EFAULT, which means the data isn't paged in. That can happen for a variety of reasons, for example that area was paged out to swap. Some BPF programs, such as the tracing ones we use here, can't bring pages back into memory.
Could you try running this program with swap disabled to see if this is what could be happening?
I'm hitting a similar problem on Ubuntu 22.04 as well. I tried to disable swap for a single process with cgroup v2 like this:
cgcreate -g memory:parca
cd /sys/fs/cgroup/parca/ && echo 0 > memory.swap.max
cgexec -g memory:parca <comm>
More symbols are resolved after disabling swap, but I can still see errors for the above process:
bpf_trace_printk: [error] previous_rip should not be zero. This can mean that the read failed, ret=-14 while reading @ 7ffde661c8f8.
more symbols are resolved after disabling swap
More symbols are resolved or more stacks are unwound and sent to the server?
Thanks for trying disabling SWAP! Could you try running https://github.com/javierhonduco/bookmark on your machine while your application is running and send the statistics belonging to your process' main executable and dynamic libraries? Sorry that I don't have good instructions on how to install or run it yet, but cargo run should do.
It's weird that the symbol resolution becomes as poor as it used to be after another few runs...
/path/to/my/program PageStats { swapped: 0, present: 4, unmapped: 5, total: 5 }
/usr/lib/x86_64-linux-gnu/libc.so.6 PageStats { swapped: 0, present: 231, unmapped: 539, total: 539 }
[vvar] PageStats { swapped: 0, present: 0, unmapped: 4, total: 4 }
[vdso] PageStats { swapped: 0, present: 1, unmapped: 2, total: 2 }
/usr/lib/x86_64-linux-gnu/libm.so.6 PageStats { swapped: 0, present: 80, unmapped: 231, total: 231 }
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30 PageStats { swapped: 0, present: 349, unmapped: 553, total: 553 }
anon PageStats { swapped: 0, present: 16, unmapped: 2072, total: 2072 }
[vsyscall] PageStats { swapped: 0, present: 0, unmapped: 1, total: 1 }
/usr/lib/x86_64-linux-gnu/libgcc_s.so.1 PageStats { swapped: 0, present: 25, unmapped: 32, total: 32 }
/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 PageStats { swapped: 0, present: 59, unmapped: 59, total: 59 }
[stack] PageStats { swapped: 0, present: 3, unmapped: 33, total: 33 }
[heap] PageStats { swapped: 0, present: 2, unmapped: 33, total: 33 }
Two thoughts: your program is totally reasonable, and on my system it has perfectly valid DWARF information. I run Fedora 38 with clang++ version 16.0.6 (Fedora 16.0.6-2.fc38) on x86_64.
$ clang++ -fomit-frame-pointer first_repro.c -o repro
https://pprof.me/1e5151a
I also got no errors, and the profile looks reasonable to me (minus symbolization). Of course, this varies from system to system depending on which pages happen to already be present in memory.
@dreamerlzl will take a closer look at these metrics next week, thanks for gathering them :)
Update: in another environment, the symbol resolution looks fine and there is no "rip should not be zero" error for the simple example.
bookmark's output looks like this (without disabling swap):
[vdso] PageStats { swapped: 0, present: 1, unmapped: 2, total: 2 }
[vsyscall] PageStats { swapped: 0, present: 0, unmapped: 1, total: 1 }
[vvar] PageStats { swapped: 0, present: 0, unmapped: 4, total: 4 }
[stack] PageStats { swapped: 1, present: 2, unmapped: 33, total: 33 }
/path/to/the/example PageStats { swapped: 1, present: 2, unmapped: 5, total: 5 }
/usr/lib/libgcc_s.so.1 PageStats { swapped: 2, present: 24, unmapped: 37, total: 37 }
/usr/lib/libm.so.6 PageStats { swapped: 2, present: 72, unmapped: 237, total: 237 }
[heap] PageStats { swapped: 2, present: 0, unmapped: 33, total: 33 }
/usr/lib/ld-linux-x86-64.so.2 PageStats { swapped: 4, present: 49, unmapped: 53, total: 53 }
/usr/lib/libc.so.6 PageStats { swapped: 7, present: 259, unmapped: 576, total: 576 }
anon PageStats { swapped: 12, present: 514, unmapped: 2072, total: 2072 }
/usr/lib/libstdc++.so.6.0.32 PageStats { swapped: 14, present: 348, unmapped: 631, total: 631 }
environment:
Linux ***** 6.4.12-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 24 Aug 2023 00:38:14 +0000 x86_64 GNU/Linux
parca: 0.19
parca-agent: 0.25.1
Could you also provide the info per physical page just for the stack memory mapping for the process that fails with rip should not be zero?
It's strange that the problem can always be reproduced on an AWS bare-metal machine with Ubuntu 22.04.3 but not in a VM (on the same bare metal). Could it be that hardware security features make a difference?
@javierhonduco The latest update is that this issue seems to be related to some security enhancements we patched into Ubuntu. We are still trying to identify the specific change.
Oh interesting, let us know as you learn more! (FYI Javier is on PTO this and next week)
@joeyleeeeeee97 Let us know what you find out. One more thing that could potentially be useful here would be the output of sudo bpftool prog dump xlated id $ID | grep "call bpf_probe" where $ID is the program id that can be found with sudo bpftool prog
@javierhonduco On my side it's:
266: (85) call bpf_probe_read_user#-69216
287: (85) call bpf_probe_read_user#-69216
698: (85) call bpf_probe_read_user#-69216
802: (85) call bpf_probe_read_user#-69216
for parca-agent commit: 1a09d0149d53f7e8c60287da7ada11b8ac29bf4d
BTW, I still suspect the hardware plays a role here. This problem only occurs on a physical host, not on a VM running the same OS.
Hi @dreamerlzl, I no longer work at Polar Signals and I am no longer contributing to this project. Feel free to contact another maintainer
Wish you all the best! Thank you for all your work for the open-source world.
OK, I believe this issue may be related to this message (we previously tested this on a NUMA machine): https://lore.kernel.org/bpf/[email protected]/T/
A previous discussion of the NUMA issue: https://lists.iovisor.org/g/iovisor-dev/topic/accessing_user_memory_and/21386221
I am pretty sure it's the same issue if this indeed reproduces on NUMA machines but not on non-NUMA ones. Unfortunately, nobody from the kernel side ever responded to that message.