opentelemetry-cpp icon indicating copy to clipboard operation
opentelemetry-cpp copied to clipboard

Assertion `false && "i == variant_npos"' failed with -DWITH_STL=ON

Open sirzooro opened this issue 3 years ago • 20 comments

Describe your environment OS: CentOS Stream release 8 Compiler: gcc (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1) OpenTelemetry libs compiled today (2/9/2022) from main branch using following command: cmake -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DBUILD_SHARED_LIBS=ON -DWITH_JAEGER=ON -DWITH_STL=ON ..

Steps to reproduce Use this code:

#include <opentelemetry/exporters/jaeger/jaeger_exporter.h>
#include <opentelemetry/sdk/trace/simple_processor.h>
#include <opentelemetry/sdk/trace/tracer_provider.h>
#include <opentelemetry/trace/provider.h>

namespace sdktrace   = opentelemetry::sdk::trace;
namespace nostd     = opentelemetry::nostd;
namespace trace     = opentelemetry::trace;

int main()
{
    // Jaeger HTTP exporter
    opentelemetry::exporter::jaeger::JaegerExporterOptions opts;
    opts.transport_format  = opentelemetry::exporter::jaeger::TransportFormat::kThriftHttp;
    opts.endpoint = "localhost";
    opts.server_port =  6831;
    opts.headers = {{}}; // optional headers
    auto jaeger_udp_exporter =
        std::unique_ptr<sdktrace::SpanExporter>(new opentelemetry::exporter::jaeger::JaegerExporter(opts));
    
    auto resource_attributes = opentelemetry::sdk::resource::ResourceAttributes
    {
        {"service.name", "test"},
        {"service.instance.id", "test-123"}
    };
    auto resource = opentelemetry::sdk::resource::Resource::Create(resource_attributes);
    
    return 0;
}

Compile using this command: g++ -o test test.cc -O0 -g -std=c++11 -Wall -Wextra -Werror -Wno-unused-parameter -I/build/otlp/debug-stl/include -L/build/otlp/debug-stl/lib64 -Wl,-rpath,/build/otlp/debug-stl/lib64 -lopentelemetry_exporter_jaeger_trace -lopentelemetry_resources -lopentelemetry_trace

What is the expected behavior? No crash when app is started

What is the actual behavior? App crashes with following message:

test: /build/otlp/debug-stl/include/opentelemetry/nostd/./internal/absl/types/../types/internal/variant.h:426: absl::otel_v1::variant_internal::VisitIndicesSwitch<EndIndex>::Run<absl::otel_v1::variant_internal::VariantStateBaseDestructorNontrivial<bool, int, unsigned int, long int, double, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::vector<bool, std::allocator >, std::vector<int, std::allocator >, std::vector<unsigned int, std::allocator >, std::vector<long int, std::allocator >, std::vector<double, std::allocator >, std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::allocator<std::__cxx11::basic_string<char, std::char_traits, std::allocator > > >, long unsigned int, std::vector<long unsigned int, std::allocator >, std::vector<unsigned char, std::allocator > >::Destroyer>::<lambda()>: Assertion `false && "i == variant_npos"' failed.

Additional context valgrind --num-callers=50 --track-origins=yes ./test shows that this is caused by uninitialized heap memory access. Here are few bottom callstack frames for this allocation:

==19559==    by 0x4F105F3: opentelemetry::v1::sdk::resource::Resource::Merge(opentelemetry::v1::sdk::resource::Resource const&) (in /build/otlp/debug-stl/lib64/libopentelemetry_resources.so)
==19559==    by 0x4F106E8: opentelemetry::v1::sdk::resource::Resource::Create(opentelemetry::v1::sdk::common::AttributeMap const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (in /build/otlp/debug-stl/lib64/libopentelemetry_resources.so)
==19559==    by 0x41E6BA: main (test.cc:26)

App does not crash when Jaeger exporter is not created, when only resource is created - this is somehow related to this exporter.

Libraries compiled with -DWITH_STL=OFF works fine.

sirzooro avatar Feb 09 '22 19:02 sirzooro

-DWITH_STL=ON cmake option will set the HAVE_CPP_STDLIB pre-processor macro. You need to set this macro explicitly while compiling your test code using g++, something like below:

g++ -DHAVE_CPP_STDLIB -o test test.cc -O0 -g -std=c++11 -Wall -Wextra -Werror -Wno-unused-parameter -I/build/otlp/debug-stl/include -L/build/otlp/debug-stl/lib64 -Wl,-rpath,/build/otlp/debug-stl/lib64 -lopentelemetry_exporter_jaeger_trace -lopentelemetry_resources -lopentelemetry_trace

If you are still getting the error, please refer to the exact g++ command invoked for examples/jaeger through cmake, and give similar arguments in the compilation of your example code.

lalitb avatar Feb 15 '22 00:02 lalitb

Thanks, this fixed my problem. Libraries compiled from sources usually installs their config.h file which stores all defines like this one, so I assumed that I do not have to add it manually. It would be good if you could follow this approach too.

BTW, I also had to switch to C++17 - with C++11 I got tons of compilation errors after I added -DHAVE_CPP_STDLIB.

sirzooro avatar Feb 21 '22 14:02 sirzooro

BTW, I also had to switch to C++17 - with C++11 I got tons of compilation errors after I added -DHAVE_CPP_STDLIB.

Yes, that's expected. It tries to use std::variant, std::span from the STL library if HAVE_CPP_STDLIB is enabled, and so fails with C++11.

lalitb avatar Feb 21 '22 17:02 lalitb

BTW, I also had to switch to C++17 - with C++11 I got tons of compilation errors after I added -DHAVE_CPP_STDLIB.

Yes, that's expected. It tries to use std::variant, std::span from the STL library if HAVE_CPP_STDLIB is enabled, and so fails with C++11.

My point here is that header file(s) could have some #if at the beginning to check C++ version and #error early.

sirzooro avatar Feb 21 '22 19:02 sirzooro

**My point here is that header file(s) could have some #if at the beginning to check C++ version and #error early.

Agree, HAVE_CPP_STDLIB is the half-baked solution, not prioritized as it is not recommended to be used to maintain ABI compatibility. Also, there is a somewhat related issue #1071 to have more granular HAVE_CPP_STDLIB support ( tagged as help wanted). Feel free to contribute to the change you suggested to fail early with proper error.

lalitb avatar Feb 21 '22 19:02 lalitb

This issue was marked as stale due to lack of activity. It will be closed in 7 days if no furthur activity occurs.

github-actions[bot] avatar May 23 '22 02:05 github-actions[bot]

We use cmake CONFIG file and bazel dependency to auto declaration macros now. We can use configure_file to generate header file which include all macros during configure time.But I'm not sure if bazel has similar function.

owent avatar May 23 '22 02:05 owent

This issue was marked as stale due to lack of activity.

github-actions[bot] avatar Jul 23 '22 02:07 github-actions[bot]

I'm having the same assert from ABSL in our environment with OpenTelemetry 1.8.3, build on x64 windows with VS 2019 (16.11.21), In our case the error came when after the first async counter was read. In sdk\src\metrics\state\temporal_metric_storage.cc:138 there is this lambda:

result_to_export->GetAllEnteries(
      [&metric_data](const MetricAttributes &attributes, Aggregation &aggregation) {
        PointDataAttributes point_data_attr;
        point_data_attr.point_data = aggregation.ToPoint();
        point_data_attr.attributes = attributes;
        metric_data.point_data_attr_.emplace_back(std::move(point_data_attr));
        return true;
      });

In our case when the point_data_attr is created, the point_data_attr.point_data variant has an incorrect index value. ( 0xcccccccccccccc00 ) Due to this invalid value, when the read value was moved into the point_data_attr.point_data = aggregation.ToPoint(); then it tries to destroy the original value, but since the index is invalid it runs into the assert in api\include\opentelemetry\nostd\internal\absl\types\internal\variant.h:426.

elnoir avatar Aug 03 '23 14:08 elnoir

I've compiled the OpenTelemetry with STL + cpp17 support, and the issue is gone. So it seems the variant in Abseil is bogus. Can someone tell me which abseil version was used in OpenTelemetry 1.8.3?

elnoir avatar Aug 04 '23 08:08 elnoir

I've compiled the OpenTelemetry with STL + cpp17 support, and the issue is gone. So it seems the variant in Abseil is bogus. Can someone tell me which abseil version was used in OpenTelemetry 1.8.3?

@elnoir the versions used can be found in https://github.com/open-telemetry/opentelemetry-cpp/blob/main/third_party_release

esigo avatar Aug 07 '23 14:08 esigo

I am having exactly the same kind of issues described by @elnoir. Except not using the metrics SDK, but trying to get propagation to working on a GRPC server.

There seems to be some kind of issue with the abseil version in GRPC and opentelemetry conflicting?

I am using vcpkg to install both grpc and opentelemetry-cpp. ("builtin-baseline": "c8696863d371ab7f46e213d8f5ca923c4aef2a00") I tried customizing the abseil build used in vcpkg, in order to customize the cmake flags, but to no avail.

Any help would be greatly appreciated.

StefanVanDyck avatar Feb 14 '24 09:02 StefanVanDyck

@StefanVanDyck

I also had the same issue recently.

The problem was that some code is compiled with HAVE_ABSEIL, other code is not, causing the build to be incoherent.

When building the opentelemetry-cpp library, HAVE_ABSEIL will only be defined if the CMake option WITH_ABSEIL is set.

I think we may need to cleanup this area, and use something like OTEL_HAVE_ABSEIL to avoid side effects with other codes, like grpc.

marcalff avatar Feb 14 '24 10:02 marcalff

@marcalff

Thank you so much for the response, I was slowly loosing my sanity trying to figure out what I was doing wrong. (Not an experienced C++ developer)

The vcpkg version of opentelemetry should be compiled using the -DWITH_ABSEIL=ON cmake flag.

I tried using -DWITH_ABSEIL=OFF -DWITH_STL=ON in the opentelemetry-cpp build, but then I end up getting weirdness with the shared_ptr destructor causing a SIGSEGV.

So I guess what you are saying, is that the otel GRPC exporter (which I am also using) is compiled using GRPC with abseil. Whereas the actual GRPC library is not? So I might try overriding the GRPC build to make sure it is actually using abseil?

Again, thanks for the help!

StefanVanDyck avatar Feb 15 '24 08:02 StefanVanDyck

@StefanVanDyck

I also had the same issue recently.

The problem was that some code is compiled with HAVE_ABSEIL, other code is not, causing the build to be incoherent.

When building the opentelemetry-cpp library, HAVE_ABSEIL will only be defined if the CMake option WITH_ABSEIL is set.

I think we may need to cleanup this area, and use something like OTEL_HAVE_ABSEIL to avoid side effects with other codes, like grpc.

Does HAVE_ABSEIL will be automatically added when link opentelemetry_api when using cmake and it's compiled with WITH_ABSEIL? And all components depends opentelemetry_api and this macro should be auto declared when link any component of otel-cpp. I think this problem will only happen when users use cmake to compile otel-cpp and do not use cmake CONFIG package to import it. Just like my comments before , some project use configure_file to generate a header file to put all environemnt detection options in it, but I'm not familar with bazel and don't know how to do it with bazel. Or can we add a generated file just for cmake to avoid this problem?

BTW: I think gRPC do not solve this problem, because when we compile and use it with different versions of abseil-cpp, it won't work.

owent avatar Feb 15 '24 08:02 owent

@owent

I am using these lines in my CmakeLists.txt to link to the opentelemetry libraries.

find_package(opentelemetry-cpp CONFIG REQUIRED)
target_link_libraries(${PROJECT_NAME} PRIVATE ${OPENTELEMETRY_CPP_LIBRARIES})

Is that what you mean when you mention the "cmake CONFIG package"? Should I do it in some other way? Thanks!

StefanVanDyck avatar Feb 15 '24 14:02 StefanVanDyck

Success! I think?

Finally go things to work using the standard vcpkg libs after much trail and error. Not entirely sure what the magic ingredients was, though adding #define OPENTELEMETRY_STL_VERSION 2017 before including any opentelemetry headers might have been important?

Edit 1: This just caused other issues, my gTest test suite suddenly started working again, but then my normal executable target got weird... (i.e. SIGSEGV on creating a span context) Sorry for polluting this issue, pretty clear I am in a bit over my head here.

Edit 2: Actual Success! 🤞 Modified the vcpkg build using an overlay to have:

-DCMAKE_CXX_STANDARD=20
-DWITH_ABSEIL=OFF
-DWITH_STL=CXX20

And included #define OPENTELEMETRY_STL_VERSION 2020 before including any opentel headers. Thank you for the help.

StefanVanDyck avatar Feb 15 '24 15:02 StefanVanDyck

@owent

I am using these lines in my CmakeLists.txt to link to the opentelemetry libraries.

find_package(opentelemetry-cpp CONFIG REQUIRED)
target_link_libraries(${PROJECT_NAME} PRIVATE ${OPENTELEMETRY_CPP_LIBRARIES})

Is that what you mean when you mention the "cmake CONFIG package"? Should I do it in some other way? Thanks!

Yes, it is. But why do you target_link_libraries(${PROJECT_NAME} PRIVATE ${OPENTELEMETRY_CPP_LIBRARIES}) instead of target_link_libraries(${PROJECT_NAME} PUBLIC ${OPENTELEMETRY_CPP_LIBRARIES}) ?Only macros, compile options and link options of public dependencies will be auto added, if there is any target depends this ${PROJECT_NAME} and ${PROJECT_NAME} is also exported as CONFIG package.

owent avatar Feb 16 '24 16:02 owent

I ran into the same issue with C++ 17, opentelemetry-cpp-1.13.0, with Bazel 5.4.1.

main: bazel-out/k8-fastbuild/bin/external/io_opentelemetry_cpp/api/_virtual_includes/api/opentelemetry/nostd/./internal/absl/types/../types/internal/variant.h:426: absl::otel_v1::variant_internal::VisitIndicesSwitch<EndIndex>::Run(Op&&, std::size_t) [with Op = absl::otel_v1::variant_internal::VariantCoreAccess::MoveAssignVisitor<absl::otel_v1::variant_internal::VariantMoveAssignBaseNontrivial<bool, int, unsigned int, long int, double, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<bool, std::allocator<bool> >, std::vector<int, std::allocator<int> >, std::vector<unsigned int, std::allocator<unsigned int> >, std::vector<long int, std::allocator<long int> >, std::vector<double, std::allocator<double> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, long unsigned int, std::vector<long unsigned int, std::allocator<long unsigned int> >, std::vector<unsigned char, std::allocator<unsigned char> > > >; long unsigned int EndIndex = 15; absl::otel_v1::variant_internal::VisitIndicesResultT<Op, long unsigned int> = void; std::size_t = long unsigned int]::<lambda()>: Assertion `false && "i == variant_npos"' failed.

I tried the above suggestion, adding or removing HAVE_ABSEIL, HAVE_CPP_STDLIB, but it makes no difference. Problem solved (at least for me) by upgrading to the latest client version, which at the time 1.14.1. I have the HAVE_ABSEIL macro on with the latest version and it succeeded.

qqu0127 avatar Mar 22 '24 08:03 qqu0127