opentelemetry-cpp
Crash in OTLP HTTP export
Describe your environment: Built and running on Linux, configured with:
cmake .. -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=ON -DCMAKE_VERBOSE_MAKEFILE=ON -DCMAKE_CXX_STANDARD=17 \
-DWITH_STL=CXX17 -DBUILD_SHARED_LIBS=ON -DWITH_OTLP_HTTP=ON -DWITH_OTLP_GRPC=ON -DBUILD_TESTING=OFF
Protobuf version installed - 3.17.3
Steps to reproduce: No exact steps to reproduce; the crash happens intermittently.
Backtrace
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000001b83c7a7d in ?? ()
[Current thread is 1 (Thread 0x7810b784da00 (LWP 23))]
#0 0x00000001b83c7a7d in ?? ()
#1 0x00007ffc046f68b0 in ?? ()
#2 0x00007810b87a237d in google::protobuf::RepeatedPtrField<opentelemetry::proto::trace::v1::ResourceSpans>::~RepeatedPtrField() ()
from /lib64/libopentelemetry_exporter_otlp_grpc.so
#3 0x00007810b87a174a in opentelemetry::proto::collector::trace::v1::ExportTraceServiceRequest::~ExportTraceServiceRequest() ()
from /lib64/libopentelemetry_exporter_otlp_grpc.so
#4 0x00007810b87721ff in opentelemetry::v1::exporter::otlp::OtlpHttpExporter::Export(opentelemetry::v1::nostd::span<std::unique_ptr<opentelemetry::v1::sdk::trace::Recordable, std::default_delete<opentelemetry::v1::sdk::trace::Recordable> >, 18446744073709551615ul> const&) ()
from /lib64/libopentelemetry_exporter_otlp_http.so
#5 0x00007810ba07113b in opentelemetry::v1::sdk::trace::SimpleSpanProcessor::OnEnd (this=0x6299ed3d66a0, span=...)
at /usr/include/opentelemetry/sdk/trace/simple_processor.h:51
#6 0x00007810b88cd9ba in opentelemetry::v1::sdk::trace::MultiSpanProcessor::OnEnd(std::unique_ptr<opentelemetry::v1::sdk::trace::Recordable, std::default_delete<opentelemetry::v1::sdk::trace::Recordable> >&&) () from /lib64/libopentelemetry_trace.so
#7 0x00007810b88d6654 in opentelemetry::v1::sdk::trace::Span::End(opentelemetry::v1::trace::EndSpanOptions const&) ()
from /lib64/libopentelemetry_trace.so
Additional Info
The crash appears to occur when the arena object is destroyed at https://github.com/open-telemetry/opentelemetry-cpp/blob/main/exporters/otlp/src/otlp_http_exporter.cc#L102
It's not apparent why this might happen. Any help will be appreciated.
Which version of otel-cpp are you using, and have you enabled async exporting?
There was a thread-safety problem in the OTLP HTTP exporter before 1.10.0 when otel-cpp was built without async export (that is, without -DENABLE_ASYNC_EXPORT or WITH_ASYNC_EXPORT_PREVIEW).
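To illustrate the kind of thread-safety problem mentioned here, below is a minimal sketch of serializing concurrent Export() calls with a mutex. It is not the actual fix that went into otel-cpp; UnsafeExporter, SerializedExporter and ExportConcurrently are hypothetical names made up for this example, and spans are reduced to plain ints.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an exporter whose Export() is not thread-safe.
class UnsafeExporter
{
public:
  void Export(int span_id) { buffer_.push_back(span_id); }  // unsynchronized write
  std::size_t ExportedCount() const { return buffer_.size(); }

private:
  std::vector<int> buffer_;
};

// Wrapper that lets only one thread at a time into the wrapped exporter.
class SerializedExporter
{
public:
  void Export(int span_id)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    inner_.Export(span_id);
  }

  std::size_t ExportedCount()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return inner_.ExportedCount();
  }

private:
  std::mutex mutex_;
  UnsafeExporter inner_;
};

// Exports `per_thread` items from each of `threads` threads at the same time
// and returns how many items the exporter received in total.
std::size_t ExportConcurrently(int threads, int per_thread)
{
  assert(threads >= 0 && per_thread >= 0);
  SerializedExporter exporter;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t)
  {
    workers.emplace_back([&exporter, t, per_thread] {
      for (int i = 0; i < per_thread; ++i)
      {
        exporter.Export(t * per_thread + i);
      }
    });
  }
  for (auto &worker : workers)
  {
    worker.join();
  }
  return exporter.ExportedCount();
}
```

Calling ExportConcurrently(4, 1000) should report 4000 exported items. Without the lock, the concurrent push_back calls would be a data race, which is undefined behavior and can surface as an intermittent crash far away from the actual bug.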
@owent - 1.15, haven't enabled async exporting... is async export still in preview in 1.15?
gRPC async exporting is still in preview.
Does this problem happen when shutting down? Do you compile both otel-cpp and proto as dynamic libraries? Just wondering why the destructor of RepeatedPtrField<opentelemetry::proto::trace::v1::ResourceSpans> is in the gRPC exporter.
It's the HTTP exporter, and proto is from yum install.
We're investigating if it's memory corruption from somewhere else.
Do you mean protobuf? I reviewed the code and, as far as I can tell, the messages and the arena never leave the scope of OtlpHttpExporter::Export.
I found another crash in #2982 that happens when using metrics and a timeout occurs. Not sure whether it is related to this one.
Is there any solution for this? I'm also facing this SIGSEGV.
Using OTEL v1.16.1, OTLP HTTP Exporter, Batch Processor.
@msiddhu Are you getting this crash during application shutdown? If yes, does calling ForceFlush() before shutdown help?
@msiddhu Thanks for the separate confirmation.
Do you have more details, like a call stack?
Saying "it crashes for me too" gives us next to nothing to work with.
The part which is really dubious is:
- a bug report about OTLP HTTP
- a call stack pointing to
libopentelemetry_exporter_otlp_grpc.so
Is this about OTLP HTTP or OTLP GRPC? Was the application built with OTLP HTTP alone, OTLP GRPC alone, or both?
@michalpristas Could you try the main branch or #2983? Some STL implementations of std::async may have bugs and occasionally crash; this PR replaces those APIs with a more stable one.
We haven't seen any more core dumps in our system for several days since applying this patch.
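For readers wondering what replacing std::async can look like in general, here is a rough sketch that runs a task on an explicit std::thread and reports the result through a std::promise. This is only an illustration of the general technique, not necessarily what the PR does; ExportOnWorkerThread and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <thread>

// Runs `export_fn` on a dedicated thread and waits up to `timeout` for it.
// Returns true only if the task finished in time and itself returned true.
// (Hypothetical helper for illustration; not part of opentelemetry-cpp.)
bool ExportOnWorkerThread(std::function<bool()> export_fn, std::chrono::milliseconds timeout)
{
  assert(static_cast<bool>(export_fn));

  std::promise<bool> done;
  std::future<bool> result = done.get_future();

  std::thread worker([&done, &export_fn] {
    try
    {
      done.set_value(export_fn());
    }
    catch (...)
    {
      // Forward any exception to the caller through the future.
      done.set_exception(std::current_exception());
    }
  });

  const bool finished_in_time = result.wait_for(timeout) == std::future_status::ready;

  // Always join, so the thread never outlives the locals it captured by reference.
  worker.join();

  return finished_in_time && result.get();
}
```

The thread's lifetime is explicit here: it is always joined before the function returns, so nothing keeps running against destroyed objects. The trade-off is that a task that overruns the timeout still blocks the caller until it completes.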
We have not observed this crash since removing the patch mentioned in https://github.com/open-telemetry/opentelemetry-cpp/issues/2382
The build failure ultimately boiled down to someone having done #define U in another library.
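For anyone hitting the same kind of macro collision, a small sketch of how a stray #define U breaks code that uses U as a template parameter name, and one way to shield such code from it. The #define below only simulates the other library's header, and Add and CheckMacroGuard are hypothetical names; #pragma push_macro / pop_macro is a compiler extension supported by GCC, Clang and MSVC.

```cpp
#include <cassert>

// Simulates the offending definition leaking in from another library's header.
#define U 42

// Save the current definition of U, then hide it for the code below.
#pragma push_macro("U")
#undef U

// Without the #undef above, the preprocessor would turn `typename U`
// into `typename 42`, which does not compile.
template <typename T, typename U>
auto Add(T a, U b) -> decltype(a + b)
{
  return a + b;
}

// Restore the macro so code that relies on it keeps working.
#pragma pop_macro("U")

// Small self-check: the template works and the macro is back in place.
void CheckMacroGuard()
{
  assert(Add(1, 2.5) == 3.5);
  assert(U == 42);
}
```

The more robust fix is usually to avoid the macro in the first place (or reorder includes), but the guard is handy when the offending header cannot be changed.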