Memory Leak Detection by LSAN in Single-threaded Context with OpenMP Involvement
Hi,
I am using a Codon-compiled shared library (.so) in a single-threaded context. LSAN (LeakSanitizer) reports memory leaks originating from libomp.so, even though I explicitly call GC_set_markers_count(1) before loading the shared library.
I want to use Codon-compiled .so files with dlopen/dlsym. It seemed that I had to call main in the .so to get things initialized, but that appears to be a mistake: calling main introduced the leak.
Codon Version: v0.18.2
Reduced reproduction sample:
Host
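#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>
#include <dlfcn.h>
#include <sys/wait.h>
#include <unistd.h>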
#ifndef LD_BIN
#define LD_BIN "ld.lld-19"
#endif
#ifndef CODON_PATH
#define CODON_PATH "/home/metaloxide/doc/codon"
#endif
#define CODON_BIN CODON_PATH "/bin/codon"
#define CODON_LIB_PATH CODON_PATH "/lib"
#define CODON_LIB_INNER_PATH CODON_LIB_PATH "/codon"
#define CODON_RUNTIME_PATH CODON_LIB_INNER_PATH "/libcodonrt.so"
std::pair<std::string, int> exec(char const* cmd) {
// 1 GB
std::unique_ptr<char[]> buffer =
std::make_unique<char[]>(1 * 1024 * 1024 * 1024);
std::string result;
std::unique_ptr<FILE, decltype(&pclose)> pipe(popen(cmd, "r"), &pclose);
if (!pipe) {
throw std::runtime_error("popen() failed");
}
while (fgets(buffer.get(), 1 * 1024 * 1024 * 1024, pipe.get()) != nullptr) {
result += buffer.get();
}
// Release the pipe so that pclose() runs exactly once and yields the exit status
int status = WEXITSTATUS(pipe.get() ? pclose(pipe.release()) : -1);
return {result, status};
}
struct BoehmGCHandlers {
std::unique_ptr<void, std::integral_constant<decltype(&dlclose), &dlclose>>
rt;
void (*GC_set_markers_count)(unsigned) = {};
void (*GC_gcollect_and_unmap)() = {};
void (*GC_clear_roots)() = {};
void (*GC_deinit)() = {};
size_t (*GC_get_memory_use)() = {};
};
static BoehmGCHandlers* bdwgc_handlers = nullptr;
BoehmGCHandlers config_boehm_gc() {
std::cout << CODON_RUNTIME_PATH << std::endl;
void* codonrt = dlopen(CODON_RUNTIME_PATH, RTLD_LAZY);
if (!codonrt) {
throw std::runtime_error("dlopen() failed");
}
void (*GC_set_markers_count)(unsigned) =
(void (*)(unsigned)) dlsym(codonrt, "GC_set_markers_count");
GC_set_markers_count(1);
BoehmGCHandlers handlers;
handlers.rt.reset(codonrt);
handlers.GC_set_markers_count = GC_set_markers_count;
handlers.GC_gcollect_and_unmap =
(void (*)()) dlsym(codonrt, "GC_gcollect_and_unmap");
handlers.GC_clear_roots = (void (*)()) dlsym(codonrt, "GC_clear_roots");
handlers.GC_deinit = (void (*)()) dlsym(codonrt, "GC_deinit");
handlers.GC_get_memory_use =
(size_t (*)()) dlsym(codonrt, "GC_get_memory_use");
return handlers;
}
int main() {
BoehmGCHandlers codonrt = config_boehm_gc();
bdwgc_handlers = &codonrt;
std::string compile_cmd;
std::string name = "sized_rang_tagged_pointer_8_32";
std::string link_cmd;
{
std::ostringstream cmdos;
std::unique_ptr<char, std::integral_constant<decltype(&free), &free>>
cwd_buf {getcwd(nullptr, 0)};
char* cwd = cwd_buf.get();
cmdos << CODON_BIN << " ";
cmdos << "build --relocation-model=pic ";
cmdos << "-o ";
cmdos << cwd << "/" << name << ".codon.o ";
cmdos << cwd << "/" << name << ".codon.py";
std::string cmd1 = cmdos.str();
cmdos = std::ostringstream{};
cmdos << LD_BIN << " ";
cmdos << cwd << "/" << name << ".codon.o ";
cmdos << "-shared ";
cmdos << "-L" << CODON_LIB_PATH << " ";
cmdos << "-rpath " << CODON_LIB_PATH << " ";
cmdos << "-L" << CODON_LIB_INNER_PATH << " ";
cmdos << "-rpath " << CODON_LIB_INNER_PATH << " ";
cmdos << "-lcodonrt ";
cmdos << "-o ";
cmdos << cwd << "/" << name << ".codon.so";
std::string cmd2 = cmdos.str();
std::cout << "cmd1: " << cmd1 << std::endl;
std::cout << "cmd2: " << cmd2 << std::endl;
compile_cmd = std::move(cmd1);
link_cmd = std::move(cmd2);
}
auto compile_pair = exec(compile_cmd.c_str());
if (compile_pair.second != 0) {
std::cerr << "compile failed: " << compile_pair.first << std::endl;
exit(1);
}
std::cout << "compile output: " << compile_pair.first << std::endl;
auto link_pair = exec(link_cmd.c_str());
if (link_pair.second != 0) {
std::cerr << "link failed: " << link_pair.first << std::endl;
exit(1);
}
std::cout << "link output: " << link_pair.first << std::endl;
std::unique_ptr<void, std::integral_constant<decltype(&dlclose), &dlclose>>
dl_handler_u {dlopen(
("./" + name + ".codon.so").c_str(),
RTLD_NOW | RTLD_LOCAL /*| RTLD_DEEPBIND*/
)};
void* dl_handler = dl_handler_u.get();
if (dl_handler == nullptr) {
std::cerr << "dlopen failed: " << dlerror() << std::endl;
exit(1);
}
std::cout << "dlopen output: " << dl_handler << std::endl;
void (*test_main)(int, char**) =
(void (*)(int, char**)) dlsym(dl_handler, "main");
if (test_main == nullptr) {
std::cerr << "dlsym failed: " << dlerror() << std::endl;
exit(1);
}
std::cout << "dlsym output (main): " << (void*) (test_main) << std::endl;
std::vector<char> mock_argv = {'t', 'e', 's', 't', '\0'};
char* mock_argvp[1] = {mock_argv.data()};
// Called main manually, please note here
test_main(1, mock_argvp);
bdwgc_handlers->GC_clear_roots();
bdwgc_handlers->GC_gcollect_and_unmap();
}
# sized_rang_tagged_pointer_8_32.codon.py (could actually be anything)
@export
def do_test_0(dump: Ptr[i8]) -> int:
print("111")
return 0
@export
def do_test_1(dump: Ptr[i8]) -> int:
print("222")
return 0
@export
def do_test_2(dump: Ptr[i8]) -> int:
print("333")
return 0
After
clang++-19 /home/metaloxide/doc/temp/reduced.cpp -std=c++14 -fsanitize=address -o reduced && ./reduced
I got
=================================================================
==964927==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 2237 byte(s) in 2 object(s) allocated from:
#0 0x55c9d5877d6f in malloc (/cloudide/workspace/temp/reduced+0xcfd6f) (BuildId: 86fee8bc2cfa74ffb5de092eda4d1ce9b53ddc41)
#1 0x7fa25bd56767 (<unknown module>)
Direct leak of 48 byte(s) in 1 object(s) allocated from:
#0 0x55c9d5877f39 in calloc (/cloudide/workspace/temp/reduced+0xcff39) (BuildId: 86fee8bc2cfa74ffb5de092eda4d1ce9b53ddc41)
#1 0x7fa25bd6b49a (<unknown module>)
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x55c9d5877d6f in malloc (/cloudide/workspace/temp/reduced+0xcfd6f) (BuildId: 86fee8bc2cfa74ffb5de092eda4d1ce9b53ddc41)
#1 0x7fa25bdeb627 (<unknown module>)
Indirect leak of 65 byte(s) in 1 object(s) allocated from:
#0 0x55c9d5877d6f in malloc (/cloudide/workspace/temp/reduced+0xcfd6f) (BuildId: 86fee8bc2cfa74ffb5de092eda4d1ce9b53ddc41)
#1 0x7fa25bdeb5b2 (<unknown module>)
The unknown modules seem to be in libomp.so. Is there any workaround to get rid of this? Thanks a lot.
Hi @MetalOxideSemi, I haven't gone through your code in detail yet, but a couple notes/questions:
- You shouldn't actually have to call main() when loading the library. When compiling to a shared library, Codon will place the main() code in a global constructor that should be called when you dlopen() the library. For example, if you just put a print("hello") at the top level of the program, you should see the print output when loading the library. Does this not occur in your case?
- Are you actually using Codon's multithreading (i.e. @par) in the code? If not, then OpenMP (libomp.so) shouldn't be invoked at all aside from some initialization. Perhaps the leak is due to the initialization routine being called twice...
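The global-constructor mechanism described above can be illustrated with a small C++ sketch. It is not Codon's generated code: the names g_initialized and library_init are hypothetical, and foo only mirrors the shape of an exported function. It assumes a GCC- or Clang-compatible compiler, since the constructor attribute is a compiler extension.

```cpp
#include <cstdio>

// Set by the constructor below; stands in for whatever global state a
// library's top-level code would initialize (hypothetical name).
static bool g_initialized = false;

// GCC/Clang extension: functions marked `constructor` run before main() in an
// executable, or during dlopen() when they live in a shared library.
__attribute__((constructor))
static void library_init() {
    g_initialized = true;
}

// Hypothetical exported entry point that depends on the initialization.
extern "C" int foo() {
    if (!g_initialized) {
        return -1;  // the constructor never ran
    }
    std::puts("foo");
    return 0;
}
```

When such a constructor is present in a shared object, the loader runs it as part of dlopen(), so the host does not need to look up and call an initialization symbol itself.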
Hi @arshajii,
Thanks for your response.
I think dlopen does not call crt0 automatically in my case. If I do not call main manually, I don't see the output of print(). I found this when I was trying to print something within a function marked @export:
@export
def foo() -> int:
print("foo")
return 0
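#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>
#include <dlfcn.h>
#include <sys/wait.h>
#include <unistd.h>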
#ifndef LD_BIN
#define LD_BIN "ld.lld-19"
#endif
#ifndef CODON_PATH
#define CODON_PATH "/cloudide/workspace/downloads/codon-clang11"
#endif
#define CODON_BIN CODON_PATH "/bin/codon"
#define CODON_LIB_PATH CODON_PATH "/lib"
#define CODON_LIB_INNER_PATH CODON_LIB_PATH "/codon"
#define CODON_RUNTIME_PATH CODON_LIB_INNER_PATH "/libcodonrt.so"
std::pair<std::string, int> exec(char const* cmd) {
// 1 GB
std::unique_ptr<char[]> buffer =
std::make_unique<char[]>(1 * 1024 * 1024 * 1024);
std::string result;
std::unique_ptr<FILE, decltype(&pclose)> pipe(popen(cmd, "r"), &pclose);
if (!pipe) {
throw std::runtime_error("popen() failed");
}
while (fgets(buffer.get(), 1 * 1024 * 1024 * 1024, pipe.get()) != nullptr) {
result += buffer.get();
}
// Release the pipe so that pclose() runs exactly once and yields the exit status
int status = WEXITSTATUS(pipe.get() ? pclose(pipe.release()) : -1);
return {result, status};
}
struct BoehmGCHandlers {
std::unique_ptr<void, std::integral_constant<decltype(&dlclose), &dlclose>>
rt;
void (*GC_set_markers_count)(unsigned) = {};
void (*GC_gcollect_and_unmap)() = {};
void (*GC_clear_roots)() = {};
void (*GC_deinit)() = {};
size_t (*GC_get_memory_use)() = {};
};
static BoehmGCHandlers* bdwgc_handlers = nullptr;
BoehmGCHandlers config_boehm_gc() {
std::cout << CODON_RUNTIME_PATH << std::endl;
void* codonrt = dlopen(CODON_RUNTIME_PATH, RTLD_LAZY);
if (!codonrt) {
throw std::runtime_error("dlopen() failed");
}
void (*GC_set_markers_count)(unsigned) =
(void (*)(unsigned)) dlsym(codonrt, "GC_set_markers_count");
GC_set_markers_count(1);
BoehmGCHandlers handlers;
handlers.rt.reset(codonrt);
handlers.GC_set_markers_count = GC_set_markers_count;
handlers.GC_gcollect_and_unmap =
(void (*)()) dlsym(codonrt, "GC_gcollect_and_unmap");
handlers.GC_clear_roots = (void (*)()) dlsym(codonrt, "GC_clear_roots");
handlers.GC_deinit = (void (*)()) dlsym(codonrt, "GC_deinit");
handlers.GC_get_memory_use =
(size_t (*)()) dlsym(codonrt, "GC_get_memory_use");
return handlers;
}
int main() {
BoehmGCHandlers codonrt = config_boehm_gc();
bdwgc_handlers = &codonrt;
std::string compile_cmd;
std::string name = "sized_rang_tagged_pointer_8_32";
std::string link_cmd;
{
std::ostringstream cmdos;
std::unique_ptr<char, std::integral_constant<decltype(&free), &free>>
cwd_buf {getcwd(nullptr, 0)};
char* cwd = cwd_buf.get();
cmdos << CODON_BIN << " ";
cmdos << "build --relocation-model=pic ";
cmdos << "-o ";
cmdos << cwd << "/" << name << ".codon.o ";
cmdos << cwd << "/" << name << ".codon.py";
std::string cmd1 = cmdos.str();
cmdos = std::ostringstream{};
cmdos << LD_BIN << " ";
cmdos << cwd << "/" << name << ".codon.o ";
cmdos << "-shared ";
cmdos << "-L" << CODON_LIB_PATH << " ";
cmdos << "-rpath " << CODON_LIB_PATH << " ";
cmdos << "-L" << CODON_LIB_INNER_PATH << " ";
cmdos << "-rpath " << CODON_LIB_INNER_PATH << " ";
cmdos << "-lcodonrt ";
cmdos << "-o ";
cmdos << cwd << "/" << name << ".codon.so";
std::string cmd2 = cmdos.str();
std::cout << "cmd1: " << cmd1 << std::endl;
std::cout << "cmd2: " << cmd2 << std::endl;
compile_cmd = std::move(cmd1);
link_cmd = std::move(cmd2);
}
auto compile_pair = exec(compile_cmd.c_str());
if (compile_pair.second != 0) {
std::cerr << "compile failed: " << compile_pair.first << std::endl;
exit(1);
}
std::cout << "compile output: " << compile_pair.first << std::endl;
auto link_pair = exec(link_cmd.c_str());
if (link_pair.second != 0) {
std::cerr << "link failed: " << link_pair.first << std::endl;
exit(1);
}
std::cout << "link output: " << link_pair.first << std::endl;
std::unique_ptr<void, std::integral_constant<decltype(&dlclose), &dlclose>>
dl_handler_u {dlopen(
("./" + name + ".codon.so").c_str(),
RTLD_NOW | RTLD_LOCAL /*| RTLD_DEEPBIND*/
)};
void* dl_handler = dl_handler_u.get();
if (dl_handler == nullptr) {
std::cerr << "dlopen failed: " << dlerror() << std::endl;
exit(1);
}
std::cout << "dlopen output: " << dl_handler << std::endl;
void (*test_main)(int, char**) =
(void (*)(int, char**)) dlsym(dl_handler, "main");
if (test_main == nullptr) {
std::cerr << "dlsym failed: " << dlerror() << std::endl;
exit(1);
}
std::cout << "dlsym output (main): " << (void*) (test_main) << std::endl;
int (*test_foo)() = (int (*)()) dlsym(dl_handler, "foo");
if (test_foo == nullptr) {
std::cerr << "dlsym failed: " << dlerror() << std::endl;
exit(1);
}
std::vector<char> mock_argv = {'t', 'e', 's', 't', '\0'};
char* mock_argvp[1] = {mock_argv.data()};
// Called main manually, please note here
// test_main(1, mock_argvp);
test_foo();
bdwgc_handlers->GC_clear_roots();
bdwgc_handlers->GC_gcollect_and_unmap();
}
Running it produced a segmentation fault:
dlopen output: 0x51a000001e80
dlsym output (main): 0x7f590a31dec0
AddressSanitizer:DEADLYSIGNAL
=================================================================
==558740==ERROR: AddressSanitizer: UNKNOWN SIGNAL on unknown address 0x2020303036303137 (pc 0x7f590a7b8221 bp 0x000000000003 sp 0x7ffc9ba3ed90 T0)
#0 0x7f590a7b8221 in _IO_fwrite (/lib/x86_64-linux-gnu/libc.so.6+0x71221) (BuildId: d13860e173e2fb2370e980b3f6995bddb363c194)
#1 0x560f75e0e419 in fwrite (/cloudide/workspace/temp/reduced+0x4d419) (BuildId: 83513aaec8eda26677b4698b88675e660a3d27a8)
#2 0x7f590a31cc90 in std.internal.builtin.print:0[Tuple[str],str,str,Ptr[byte],bool].276 /cloudide/workspace/downloads/codon-clang11/lib/codon/stdlib/internal/builtin.codon:31:35
==558740==Register values:
rax = 0x000000000a319a01 rbx = 0x00007f590a319a70 rcx = 0x0000000000000000 rdx = 0x0000000000000003
rdi = 0x00007f590a319a70 rsi = 0x0000000000000001 rbp = 0x0000000000000003 rsp = 0x00007ffc9ba3ed90
r8 = 0x00007f59079a8340 r9 = 0x000000000000003f r10 = 0x00007f59073d065b r11 = 0x0000560f75e0660e
r12 = 0x0000000000000003 r13 = 0x000000000a319a01 r14 = 0x0000000000000001 r15 = 0x0000000000000000
AddressSanitizer can not provide additional info.
That is exactly why I looked for where the global state gets initialized. I found main, called it manually, and it seemed to work (except for the leaks).
@arshajii OK, I think I've just found where the OpenMP functions were called unexpectedly (the Python code is the same as above):
(gdb) bt
#0 0x00005555556201ae in malloc ()
#1 0x00007ffff7fe88e1 in _dl_exception_create_format () from /lib64/ld-linux-x86-64.so.2
#2 0x00007ffff7fdcebc in ?? () from /lib64/ld-linux-x86-64.so.2
#3 0x00007ffff7bab53b in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00007ffff7c813b4 in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
#5 0x00007ffff7babb40 in _dl_catch_exception () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007ffff7babbff in _dl_catch_error () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x00007ffff7c81a65 in ?? () from /lib/x86_64-linux-gnu/libdl.so.2
#8 0x00007ffff7c8141c in dlsym () from /lib/x86_64-linux-gnu/libdl.so.2
#9 0x00007ffff72e905b in ompt_start_tool (omp_version=201611, runtime_version=0x7ffff723a936 <__kmp_version_lib_ver+6> "LLVM OMP version: 5.0.20140926") at /opt/tiger/workspace/codon/build/_deps/openmp-src/runtime/src/ompt-general.cpp:165
#10 0x00007ffff72e9193 in ompt_try_start_tool (omp_version=201611, runtime_version=<optimized out>) at /opt/tiger/workspace/codon/build/_deps/openmp-src/runtime/src/ompt-general.cpp:262
#11 ompt_pre_init () at /opt/tiger/workspace/codon/build/_deps/openmp-src/runtime/src/ompt-general.cpp:442
#12 0x00007ffff726c67d in __kmp_do_serial_initialize () at /opt/tiger/workspace/codon/build/_deps/openmp-src/runtime/src/kmp_runtime.cpp:6950
#13 0x00007ffff4275aea in seq_init (flags=5) at /opt/tiger/workspace/codon/codon/runtime/lib.cpp:60
#14 0x00007ffff762743b in main () from ./sized_rang_tagged_pointer_8_32.codon.so
#15 0x00005555556648ce in main () at /cloudide/workspace/temp/reduced.cpp:149
Hi @arshajii,
I think I may have found the root cause of the memory leaks. The issue appears to be in the seq_init() function in lib.cpp, specifically the call to __kmpc_set_gc_callbacks.
Even in a single-threaded context, this function causes libomp.so to be loaded and seems to be responsible for the reported memory leaks. When I removed the call to __kmpc_set_gc_callbacks, the LSAN reports no more leaks and the program works as expected.
I suspect there might be some memory management issues within this callback setup. Perhaps a better approach would be to call __kmpc_set_gc_callbacks only when actual multithreading is enabled (@par cases), rather than unconditionally in seq_init().
Let me know if you'd like me to provide more details or if you need any additional information to investigate this further.
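The suggestion above, deferring the OpenMP callback registration until a parallel region is actually entered, could look roughly like the following sketch. It does not reflect Codon's runtime: runtime_init, ensure_parallel_runtime, and register_gc_callbacks_with_openmp are hypothetical stand-ins, and the registration function only counts calls so that the example runs without libomp.

```cpp
// Hypothetical stand-in for the OpenMP-side registration; it only counts
// calls so the sketch runs without libomp.
static int g_gc_callback_registrations = 0;
static void register_gc_callbacks_with_openmp() {
    ++g_gc_callback_registrations;
}

// Hypothetical runtime init: performs everything except the OpenMP hooks.
static void runtime_init() {
    // GC setup and other single-threaded initialization would go here.
}

// Hypothetical hook called on entry to any parallel region. The function-local
// static is initialized exactly once, and since C++11 that initialization is
// thread-safe, so the registration cannot run twice.
static void ensure_parallel_runtime() {
    static const bool registered = [] {
        register_gc_callbacks_with_openmp();
        return true;
    }();
    (void)registered;
}
```

With this shape, a program that never enters a parallel region never touches the OpenMP registration path, while parallel code pays only a guarded check after the first call.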
Hi @MetalOxideSemi sorry for the delay. Could you possibly try this with Codon 0.19 / latest release? We upgraded OpenMP in this release. My guess is that OpenMP allocates some memory internally that it never frees, and that this is not actually a real memory leak.
The secondary issue is the main function not running when you dlopen() the library... This should happen automatically without you having to call it yourself. What OS/system are you using?
Hi @arshajii Thank you for your previous response. I apologize for the delayed reply.
Regarding the memory leak issue, I've upgraded to Codon v0.19.1 and built it from source to test the fix. However, I'm encountering a linking problem: I can't find the __kmpc_set_gc_callbacks symbol which was previously defined in exaloop/openmp. I notice that the current version uses libomp.so from LLVM-20, but I'm not sure how to properly link against this function.
/usr/bin/ld: /home/metaloxide/doc/codon/build/libcodonrt.so: undefined reference to `__kmpc_set_gc_callbacks'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
Could you please provide some guidance on how to correctly link against this function in the new version? This would help me proceed with testing the memory leak fix.
Thank you for your help!
Regarding the initialization question: I'm on Debian Bullseye, but I suspect the issue might be more related to my build process rather than the platform. I'm currently compiling to .o first and then manually linking to .so:
./codon-dist/bin/codon build --relocation-model=pic --obj --data-sections --function-sections -o /home/metaloxide/doc/unittests/foo.codon.o /home/metaloxide/doc/unittests/foo.codon.py
ld.lld /home/metaloxide/doc/unittests/foo.codon.o -shared -L./codon-dist/lib -rpath ./codon-dist/lib -L./codon-dist/lib/codon -rpath ./codon-dist/lib/codon -lcodonrt -o /home/metaloxide/doc/unittests/foo.codon.so
I've examined the shared library with objdump and readelf, and found no conventional entry point mechanisms:
readelf -x .init ./foo.codon.so
# Warning: Section '.init' was not dumped because it does not exist!
readelf -x .init_array ./foo.codon.so
# Warning: Section '.init_array' was not dumped because it does not exist!
readelf -x .ctors ./foo.codon.so
# Warning: Section '.ctors' was not dumped because it does not exist!
Could you explain how main() is intended to be set up as the entry point for the shared library? This would help me identify potential issues in my compilation pipeline.
Thank you again for your assistance!
Hi @MetalOxideSemi -
- Regarding the OpenMP runtime function -- we're now using the OpenMP that's included in the LLVM project, which you can build when building LLVM via the -DLLVM_ENABLE_PROJECTS CMake flag (see deps.sh for the full command(s)).
- For the initialization, yes I think you're probably right. If I run readelf -x .init on a library generated with codon build -lib it is in fact there. I see we don't actually emit the global constructor if compiling to an object file; we should have at least a flag to do that -- I'll add this to the next version.
It seems that __kmpc_set_gc_callbacks is still referenced in the code, but it is not part of LLVM's OpenMP (see here). I've encountered a linker error with it, so how can I solve it?
Are you using Codon's LLVM fork (https://github.com/exaloop/llvm-project)? This includes that function.
Also, on another note, I've added a flag to include a global constructor when building an object file in https://github.com/exaloop/codon/commit/f97bc20242a795c7b054b719053610dd052ad3db -- you can use -global-ctor=yes to do so. This will be in the release that'll come out in the next day or so.