memory leak from otlp http exporter
Describe your environment
I'm using OPENTELEMETRY_VERSION "1.22.0".
Steps to reproduce
The problem is that when the application finishes, valgrind reports many memory leaks related to the OTLP HTTP exporter (examples are below), and calling Shutdown on the TracerProvider does not change that. The main function is just a unit test that calls the init and terminate telemetry functions I wrote, like this:
initial_telemetry_ut();
// do some work
terminate_telemetry_ut(); // for cleanup, but it fails to release all allocated memory
Here is a summary of those two functions:
initial_telemetry_ut summary:
opentelemetry::exporter::otlp::OtlpHttpExporterOptions options;
options.url = "http://localhost:4318/v1/traces";
auto exporter = opentelemetry::exporter::otlp::OtlpHttpExporterFactory::Create(options);
auto processor = opentelemetry::sdk::trace::BatchSpanProcessorFactory::Create(std::move(exporter), {});
auto provider = opentelemetry::sdk::trace::TracerProviderFactory::Create(
std::move(processor), resources, std::move(the_sampler)
);
opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider> api_provider(
    std::shared_ptr<opentelemetry::trace::TracerProvider>(provider.release()));
g_provider = api_provider; // g_provider is a global variable of type opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider>
opentelemetry::trace::Provider::SetTracerProvider(api_provider);
terminate_telemetry_ut summary:
std::shared_ptr<opentelemetry::trace::TracerProvider> none;
opentelemetry::trace::Provider::SetTracerProvider(none);
g_provider = nullptr; // the same global variable from above
Finally, here are some examples of the errors in the valgrind report:
...
==22678== 648 bytes in 9 blocks are still reachable in loss record 124 of 130
==22678== at 0x4846FA3: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==22678== by 0x4BC2DD2: google::protobuf::EncodedDescriptorDatabase::DescriptorIndex::AddSymbol(google::protobuf::stringpiece_internal::StringPiece) (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.32.0.12)
==22678== by 0x4BC43B1: google::protobuf::EncodedDescriptorDatabase::Add(void const*, int) (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.32.0.12)
==22678== by 0x4B66C76: google::protobuf::DescriptorPool::InternalAddGeneratedFile(void const*, int) (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.32.0.12)
==22678== by 0x4BDBB77: ??? (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.32.0.12)
==22678== by 0x4AE02AA: ??? (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.32.0.12)
==22678== by 0x400571E: call_init.part.0 (dl-init.c:74)
==22678== by 0x4005823: call_init (dl-init.c:120)
==22678== by 0x4005823: _dl_init (dl-init.c:121)
==22678== by 0x401F59F: ??? (in /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
...
==22678== 22,000 bytes in 125 blocks are still reachable in loss record 129 of 130
==22678== at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==22678== by 0x60E2823: asn1_array2tree (in /usr/lib/x86_64-linux-gnu/libtasn1.so.6.6.3)
==22678== by 0x5472434: ??? (in /usr/lib/x86_64-linux-gnu/libgnutls.so.30.37.1)
==22678== by 0x5438EDF: ??? (in /usr/lib/x86_64-linux-gnu/libgnutls.so.30.37.1)
==22678== by 0x400571E: call_init.part.0 (dl-init.c:74)
==22678== by 0x4005823: call_init (dl-init.c:120)
==22678== by 0x4005823: _dl_init (dl-init.c:121)
==22678== by 0x401F59F: ??? (in /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
==22678==
==22678== 83,072 bytes in 472 blocks are still reachable in loss record 130 of 130
==22678== at 0x484D953: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==22678== by 0x60E2823: asn1_array2tree (in /usr/lib/x86_64-linux-gnu/libtasn1.so.6.6.3)
==22678== by 0x5472339: ??? (in /usr/lib/x86_64-linux-gnu/libgnutls.so.30.37.1)
==22678== by 0x5438EDF: ??? (in /usr/lib/x86_64-linux-gnu/libgnutls.so.30.37.1)
==22678== by 0x400571E: call_init.part.0 (dl-init.c:74)
==22678== by 0x4005823: call_init (dl-init.c:120)
==22678== by 0x4005823: _dl_init (dl-init.c:121)
==22678== by 0x401F59F: ??? (in /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
==22678==
==22678== LEAK SUMMARY:
==22678== definitely lost: 0 bytes in 0 blocks
==22678== indirectly lost: 0 bytes in 0 blocks
==22678== possibly lost: 0 bytes in 0 blocks
==22678== still reachable: 120,567 bytes in 890 blocks
==22678== suppressed: 0 bytes in 0 blocks
==22678==
==22678== For lists of detected and suppressed errors, rerun with: -s
==22678== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
What is the expected behavior?
I expect to see no leaks in the valgrind results.
I should mention that I tried casting the g_provider global variable to the SDK TracerProvider so I could call ForceFlush and Shutdown on the provider. Both calls returned true, which indicates the provider was shut down successfully, yet valgrind reports exactly the same problems and leaks, so I left those calls out of the snippets above. Even the amount of leaked memory stays the same no matter what I do.
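For reference, that flush/shutdown attempt looks roughly like this (a sketch, assuming g_provider still holds the SDK TracerProvider created in initial_telemetry_ut; otherwise the static_cast would not be valid):
#include <opentelemetry/sdk/trace/tracer_provider.h>

// cast the API provider back to the SDK provider to reach ForceFlush/Shutdown
auto *sdk_provider =
    static_cast<opentelemetry::sdk::trace::TracerProvider *>(g_provider.get());
bool flushed   = sdk_provider->ForceFlush();  // returned true
bool shut_down = sdk_provider->Shutdown();    // returned true, valgrind output unchanged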
@farzinlize if you are implementing the ThreadInstrumentation interface, call google::protobuf::ShutdownProtobufLibrary() in OnEnd(), or at the end of your main().
I had memory leaks from protobuf too, and in my case that was enough.
@gparlamas hey, thank you for weighing in. Your suggestion did indeed eliminate the errors originating from protobuf, but another group of memory leak errors from libgnutls remains.
I should add that I never initialize those libraries myself, so I would expect opentelemetry-cpp to terminate its own dependencies; this also makes me suspect that I'm not shutting the opentelemetry library down completely.
Protobuf's symbol pool may be used by many other components, so we cannot shut it down directly.
Could you please call google::protobuf::ShutdownProtobufLibrary() after shutting down otel-cpp and any other components that may use protobuf?
@owent Yes, that works for me, although I would recommend something like a flag or option, set when these objects are created, that tells the otel-cpp library to shut down everything it depends on. That way I, the user, could choose whether libraries such as protobuf or libgnutls stay initialized after otel-cpp is terminated.
P.S.: I tried calling gnutls_global_deinit(); and the rest of the memory leak errors were also eliminated.
OK, my problem is eventually solved, but now I must link gnutls and protobuf separately into my own code just to call the shutdown/deinit functions of those libraries from my program, even though I only wanted to use otel-cpp. The CMake commands are noted below:
# ----- link gnutls and protobuf for opentelemetry cleanup only -----
# enable pkg-config for gnutls
find_package(PkgConfig REQUIRED)
pkg_check_modules(GNUTLS REQUIRED IMPORTED_TARGET gnutls)
# protobuf
include(FindProtobuf)
find_package(Protobuf REQUIRED)
include_directories(${PROTOBUF_INCLUDE_DIR})
link_libraries(${PROTOBUF_LIBRARY} PkgConfig::GNUTLS)
These are the functions I call at the end of my terminate_telemetry_ut to eliminate the memory leaks:
google::protobuf::ShutdownProtobufLibrary();
gnutls_global_deinit();
new headers I added to my code:
#include <google/protobuf/stubs/common.h>
#include <gnutls/gnutls.h>
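Putting it all together, the cleanup path now looks roughly like this (a sketch only; g_provider is the global from the snippets above, and the order follows the advice in this thread: shut down otel-cpp first, release the protobuf and gnutls globals last):
#include <google/protobuf/stubs/common.h>
#include <gnutls/gnutls.h>
#include <opentelemetry/sdk/trace/tracer_provider.h>
#include <opentelemetry/trace/provider.h>

void terminate_telemetry_ut()
{
  // 1. shut down the SDK provider so the batch processor flushes and joins its worker thread
  if (g_provider)
  {
    static_cast<opentelemetry::sdk::trace::TracerProvider *>(g_provider.get())->Shutdown();
  }

  // 2. drop every reference to the provider (the API global and my own global)
  std::shared_ptr<opentelemetry::trace::TracerProvider> none;
  opentelemetry::trace::Provider::SetTracerProvider(none);
  g_provider = nullptr;

  // 3. only after everything that uses protobuf/gnutls is done, release their globals
  google::protobuf::ShutdownProtobufLibrary();
  gnutls_global_deinit();
}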
Regarding protobuf, the only thing which is deleted with ShutdownProtobufLibrary() is the ShutdownData, which is allocated only when any of the OnShutdown*() functions is called (see protobuf/message_lite.cc). I wonder who calls any of these functions? I didn't find it in opentelemetry-cpp. Anyway, this is not the memory leak shown here. It shows something on the PHP side.
It should be called by the app after all components that depend on protobuf are shut down. otel-cpp should not call it, because it may not be the last component in an app that depends on protobuf.