SIGSEGV from Jaeger OTel trace SetAttribute
Steps to reproduce the behavior (Required)
Not 100% sure, but I believe this happens when you drop a materialized view while it is still ingesting. This appears to be what is crashing all my compute nodes.
Expected behavior (Required)
Dropping a materialized view should not cause a segfault.
Real behavior (Required)
Dropping the materialized view during initial ingest crashes all compute nodes with error:
4.0.2 RELEASE (build 1f1aa9c distro ubuntu arch aarch64)
query_id:019b0e21-140a-7854-8dfb-2d290bfa8ecf, fragment_instance:019b0e21-140a-7854-8dfb-2d290bfa8ed0
*** Aborted at 1765468607 (unix time) try "date -d @1765468607" if you are using GNU date ***
PC: @ 0x12301d44 opentelemetry::v1::sdk::trace::Span::SetAttribute(opentelemetry::v1::nostd::string_view, absl::otel_v1::variant<bool, int, long, unsigned int, double, char const*, opentelemetry::v1::nostd::string_view, opentelemetry::v1::nostd::span<bool const, 1844674407
*** SIGSEGV (@0x0) received by PID 27 (TID 0xfffec959fa00) LWP(640) from PID 0; stack trace: ***
@ 0xffffb68853dc (/usr/lib/aarch64-linux-gnu/libc.so.6+0x853db)
@ 0x1068cd98 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
@ 0xffffb7f41850 ([vdso]+0x84f)
@ 0x12301d44 opentelemetry::v1::sdk::trace::Span::SetAttribute(opentelemetry::v1::nostd::string_view, absl::otel_v1::variant<bool, int, long, unsigned int, double, char const*, opentelemetry::v1::nostd::string_view, opentelemetry::v1::nostd::span<bool const, 1844674407
@ 0xd014d00 starrocks::OlapTableSink::close_wait(starrocks::RuntimeState*, starrocks::Status)
@ 0xd016384 starrocks::OlapTableSink::close(starrocks::RuntimeState*, starrocks::Status)
@ 0xcff921c starrocks::pipeline::OlapTableSinkOperator::pending_finish() const
@ 0xb8b290c starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0xd359a34 starrocks::ThreadPool::dispatch_thread()
@ 0xd3506d8 starrocks::Thread::supervise_thread(void*)
@ 0xffffb6880398 (/usr/lib/aarch64-linux-gnu/libc.so.6+0x80397)
@ 0xffffb68e9e9c (/usr/lib/aarch64-linux-gnu/libc.so.6+0xe9e9b)
StarRocks version (Required)
- You can get the StarRocks version by executing SQL
select current_version()
4.0.2-1f1aa9c
Please share the Jaeger endpoint setting from your be.conf/cn.conf.
Hello, I was using
jaeger_endpoint = http://my_endpoint:6831
Traces have been working fine for the most part. This issue only occurs intermittently.
https://github.com/StarRocks/starrocks/blob/main/be/src/common/config.h#L1128-L1131 Check the comment there: http:// should be removed from the endpoint.
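For anyone checking their own config, here is a minimal sketch of that normalization. It assumes the endpoint is a host:port string that may carry a leading scheme; `strip_scheme` is a hypothetical helper written for illustration, not a StarRocks function.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not part of StarRocks): drop a leading "scheme://"
// so that only the bare host:port part of the endpoint remains.
std::string strip_scheme(std::string_view endpoint) {
    const std::string_view sep = "://";
    const auto pos = endpoint.find(sep);
    if (pos != std::string_view::npos) {
        // Skip everything up to and including the "://" separator.
        endpoint.remove_prefix(pos + sep.size());
    }
    return std::string(endpoint);
}
```

With this sketch, both `http://my_endpoint:6831` and `my_endpoint:6831` come back as `my_endpoint:6831`, so a value that already has no scheme is left untouched.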
@kevincai I was mistaken: our deployed config does not have http:// in front. It is just the host and port.
Notably, we get traces just fine, so it does work. This segfault happens only sometimes.
I've also been able to reproduce this on a new cluster. We have had to disable Jaeger, as we can't have all CN nodes crashing when an MV is deleted.