SIGSEGV from jaeger OTEL trace setattribute

Open kyle-goodale-klaviyo opened this issue 2 weeks ago • 1 comments

Steps to reproduce the behavior (Required)

Not 100% sure, but I believe this may happen when you drop a materialized view that is in the process of ingesting. I believe this is what is causing all my compute nodes to crash.

Expected behavior (Required)

Dropping a materialized view does not result in seg faults

Real behavior (Required)

Dropping the materialized view during initial ingest crashes all compute nodes with error:

4.0.2 RELEASE (build 1f1aa9c distro ubuntu arch aarch64)
query_id:019b0e21-140a-7854-8dfb-2d290bfa8ecf, fragment_instance:019b0e21-140a-7854-8dfb-2d290bfa8ed0
*** Aborted at 1765468607 (unix time) try "date -d @1765468607" if you are using GNU date ***
PC: @         0x12301d44 opentelemetry::v1::sdk::trace::Span::SetAttribute(opentelemetry::v1::nostd::string_view, absl::otel_v1::variant<bool, int, long, unsigned int, double, char const*, opentelemetry::v1::nostd::string_view, opentelemetry::v1::nostd::span<bool const, 1844674407
*** SIGSEGV (@0x0) received by PID 27 (TID 0xfffec959fa00) LWP(640) from PID 0; stack trace: ***
    @     0xffffb68853dc (/usr/lib/aarch64-linux-gnu/libc.so.6+0x853db)
    @         0x1068cd98 google::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*)
    @     0xffffb7f41850 ([vdso]+0x84f)
    @         0x12301d44 opentelemetry::v1::sdk::trace::Span::SetAttribute(opentelemetry::v1::nostd::string_view, absl::otel_v1::variant<bool, int, long, unsigned int, double, char const*, opentelemetry::v1::nostd::string_view, opentelemetry::v1::nostd::span<bool const, 1844674407
    @          0xd014d00 starrocks::OlapTableSink::close_wait(starrocks::RuntimeState*, starrocks::Status)
    @          0xd016384 starrocks::OlapTableSink::close(starrocks::RuntimeState*, starrocks::Status)
    @          0xcff921c starrocks::pipeline::OlapTableSinkOperator::pending_finish() const
    @          0xb8b290c starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0xd359a34 starrocks::ThreadPool::dispatch_thread()
    @          0xd3506d8 starrocks::Thread::supervise_thread(void*)
    @     0xffffb6880398 (/usr/lib/aarch64-linux-gnu/libc.so.6+0x80397)
    @     0xffffb68e9e9c (/usr/lib/aarch64-linux-gnu/libc.so.6+0xe9e9b)

StarRocks version (Required)

  • You can get the StarRocks version by executing SQL select current_version()

4.0.2-1f1aa9c

kyle-goodale-klaviyo Dec 11 '25 16:12

Please share the jaeger endpoint setting from your be.conf/cn.conf.

kevincai Dec 12 '25 06:12

Hello, I was using

jaeger_endpoint = http://my_endpoint: 6831

Traces have been working fine for the most part. This issue only occurs sometimes.

kyle-goodale-klaviyo Dec 12 '25 15:12

https://github.com/StarRocks/starrocks/blob/main/be/src/common/config.h#L1128-L1131 See the comment there: http:// should be removed from the endpoint.
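
As a rough illustration only, the setting with the scheme dropped might look like the fragment below. It reuses the placeholder host and port from the earlier reply and assumes a plain host:port value is what the linked comment asks for; the comment in config.h remains the authority on the accepted format.

```
# be.conf / cn.conf (sketch; my_endpoint and 6831 are the placeholders from the reply above)
# Assumption: no http:// prefix, just host and port.
jaeger_endpoint = my_endpoint:6831
```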

kevincai Dec 12 '25 15:12

@kevincai I was mistaken and our deployed config does not have http:// in front. It is just the host and ip.

Notably, we get traces just fine so it does work. This seg fault happens only sometimes.

kyle-goodale-klaviyo Dec 15 '25 15:12

I've also been able to reproduce this on a new cluster. We have had to disable Jaeger, as we can't have all CN nodes crashing when an MV is deleted.

kyle-goodale-klaviyo Dec 18 '25 15:12