
The proxy library with the jemalloc pool can work incorrectly if the application it is loaded into links with libumf.so

Open · ldorau opened this issue 1 year ago · 4 comments

The proxy library with the jemalloc pool can work incorrectly if the application it is loaded into links dynamically with libumf.so, because the proxy library and the application then share the same TRACKER and the same global base allocator:

Ref: #226 See: https://github.com/oneapi-src/unified-memory-framework/pull/226#issuecomment-1946312900

Without the proxy library (with debug logs):

$ ./test/umf_test-disjointPool --gtest_filter="*sharedLimits*"
>>> umf_ba_constructor constructor(101)
>>> umfCreate constructor(-) TRACKER = umfMemoryTrackerCreate()
Running main() from unified-memory-framework/build/_deps/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = *sharedLimits*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from test
[ RUN      ] test.sharedLimits
>>> umf_ba_create_global(BA_pool=0x7f6a01a14000)
>>> umf_ba_get_pool(BA_pool=0x7f6a01a14000)
>>> umf_ba_get_pool(BA_pool=0x7f6a01a14000)
>>> umf_ba_get_pool(BA_pool=0x7f6a01a14000)
>>> umf_ba_get_pool(BA_pool=0x7f6a01a14000)
>>> umf_ba_get_pool(BA_pool=0x7f6a01a14000)
[       OK ] test.sharedLimits (0 ms)
[----------] 1 test from test (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 1 test.
>>> umfDestroy destructor(-) umfMemoryTrackerDestroy() TRACKER = NULL
>>> umf_ba_destructor destructor(101)
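The `constructor(101)` and `destructor(101)` markers in these logs appear to correspond to GCC/Clang constructor priorities (`__attribute__((constructor(N)))`). The following minimal sketch is independent of UMF and uses hypothetical function names. It records the order in which prioritized and unprioritized constructors run within a single object: priority 101 first, then 102, then constructors without a priority.

```c
#include <assert.h>

/* Records the order in which the constructors below ran (0 = no priority). */
int ctor_order[3];
int ctor_count = 0;

/* Priority 101 (the lowest non-reserved value) runs first,
 * cf. "umf_ba_constructor constructor(101)". */
__attribute__((constructor(101))) static void ctor_prio_101(void) {
    ctor_order[ctor_count++] = 101;
}

/* Priority 102 runs after 101, cf. "proxy_lib_create constructor(102)". */
__attribute__((constructor(102))) static void ctor_prio_102(void) {
    ctor_order[ctor_count++] = 102;
}

/* No priority: runs after all prioritized constructors of this object,
 * cf. "umfCreate constructor(-)". */
__attribute__((constructor)) static void ctor_no_prio(void) {
    ctor_order[ctor_count++] = 0;
}
```

Destructors run in the reverse order, which matches `umfDestroy destructor(-)` being logged before `umf_ba_destructor destructor(101)` above. Priorities only order constructors within one linked object. Across separate shared objects such as libumf.so and libumf_proxy.so, the dynamic loader's initialization order generally applies instead.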

With the proxy library (with debug logs):

$ LD_PRELOAD=./lib/libumf_proxy.so ./test/umf_test-disjointPool --gtest_filter="*sharedLimits*"
>>> umf_ba_constructor constructor(101)
>>> umfCreate constructor(-) TRACKER = umfMemoryTrackerCreate()
>>> umf_ba_constructor constructor(101)
>>> proxy_lib_create constructor(102)
>>> proxy_lib_create_common() BEGIN
>>> proxy_lib_create_common() -> umfMemoryProviderCreate()
>>> umf_ba_create_global(BA_pool=0x7f44e4ba9000)
>>> umf_ba_get_pool(BA_pool=0x7f44e4ba9000)
>>> umf_ba_get_pool(BA_pool=0x7f44e4ba9000)
>>> proxy_lib_create_common() -> umfPoolCreate()
>>> umf_ba_get_pool(BA_pool=0x7f44e4ba9000)
>>> umf_ba_get_pool(BA_pool=0x7f44e4ba9000)
>>> je_initialize()
>>> umf_ba_create_global(BA_pool=0x7f44e02d2000)
>>> umf_ba_get_pool(BA_pool=0x7f44e02d2000)
>>> je_initialize(base_allocator=0x7f44e02d2000)
>>> proxy_lib_create_common() END
Running main() from unified-memory-framework/build/_deps/googletest-src/googletest/src/gtest_main.cc
Note: Google Test filter = *sharedLimits*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from test
[ RUN      ] test.sharedLimits
>>> umf_ba_get_pool(BA_pool=0x7f44e4ba9000)
>>> umf_ba_get_pool(BA_pool=0x7f44e4ba9000)
>>> umf_ba_get_pool(BA_pool=0x7f44e4ba9000)
>>> umf_ba_get_pool(BA_pool=0x7f44e4ba9000)
>>> umf_ba_get_pool(BA_pool=0x7f44e4ba9000)
/home/ldorau/work/unified-memory-framework/test/pools/disjoint_pool.cpp:141: Failure
Expected equality of these values:
  MaxSize / SlabMinSize
    Which is: 4
  numFrees
    Which is: 2
Segmentation fault

ldorau, Feb 15 '24 15:02

I wonder if this can be fixed by always linking the proxy library with a static version of UMF?

bratpiorka, Feb 15 '24 16:02

I think this would just mask some other problem with the proxy library. With a dynamically linked libumf.so, there should be only a single tracker and a single base_alloc instance, so it should work fine.
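To illustrate the single-instance point, here is a minimal single-file sketch. It assumes a hypothetical `tracker_t` type and `g_tracker` global rather than UMF's real tracker. The global is created once in a constructor and released in a destructor, mirroring the `umfCreate`/`umfDestroy` log lines. Every client that resolves to the same copy of the library sees, and updates, the same object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for UMF's tracker; the real type differs. */
typedef struct tracker {
    size_t tracked_allocs;
} tracker_t;

/* One global per loaded copy of the library (cf. TRACKER in the logs). */
tracker_t *g_tracker = NULL;

/* Mirrors "umfCreate constructor(-) TRACKER = umfMemoryTrackerCreate()". */
__attribute__((constructor)) static void lib_create(void) {
    g_tracker = calloc(1, sizeof(tracker_t));
}

/* Mirrors "umfDestroy destructor(-) umfMemoryTrackerDestroy() TRACKER = NULL". */
__attribute__((destructor)) static void lib_destroy(void) {
    free(g_tracker);
    g_tracker = NULL;
}

/* Hypothetical accessors standing in for two clients of the same library. */
tracker_t *app_get_tracker(void) { return g_tracker; }
tracker_t *proxy_get_tracker(void) { return g_tracker; }

/* Records one allocation in the given tracker. */
void track_alloc(tracker_t *t) { t->tracked_allocs++; }
```

In the scenario from this issue, the application and the preloaded proxy library would correspond to the two accessors, so allocations made through either one land in the same tracker.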

igchor, Feb 15 '24 19:02

It looks like jemalloc might be partially responsible for the failure. I just replaced jemalloc in proxy_lib with proxy_pool (slightly modified to support realloc and calloc), and I no longer see segfaults. See this branch: https://github.com/igchor/unified-memory-framework/tree/Add_proxy_library_checks
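This is not the code from the linked branch. It is a hedged sketch of one common way to add `calloc` and `realloc` on top of a pool that only offers allocate and free: the requested size is stored in a header in front of each block. `pool_malloc`, `pool_free`, and the `sketch_*` functions are hypothetical names, backed by libc `malloc`/`free` purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal pool interface: only allocate and free exist. */
static void *pool_malloc(size_t size) { return malloc(size); }
static void pool_free(void *ptr) { free(ptr); }

/* Header in front of each block so realloc knows the old size;
 * the max_align_t member keeps the user pointer suitably aligned. */
typedef union header {
    size_t size;
    max_align_t align;
} header_t;

void *sketch_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(header_t)) return NULL; /* overflow */
    header_t *h = pool_malloc(sizeof(header_t) + size);
    if (!h) return NULL;
    h->size = size;
    return h + 1;
}

void sketch_free(void *ptr) {
    if (ptr) pool_free((header_t *)ptr - 1);
}

void *sketch_calloc(size_t num, size_t size) {
    /* calloc must fail if num * size overflows. */
    if (size != 0 && num > SIZE_MAX / size) return NULL;
    void *p = sketch_malloc(num * size);
    if (p) memset(p, 0, num * size);
    return p;
}

void *sketch_realloc(void *ptr, size_t new_size) {
    if (!ptr) return sketch_malloc(new_size);
    if (new_size == 0) {
        sketch_free(ptr);
        return NULL;
    }
    size_t old_size = ((header_t *)ptr - 1)->size;
    void *p = sketch_malloc(new_size);
    if (!p) return NULL; /* the original block stays valid */
    memcpy(p, ptr, old_size < new_size ? old_size : new_size);
    sketch_free(ptr);
    return p;
}
```

A real pool may be able to look up the block size in its own metadata instead of storing a header; the header only keeps this example self-contained.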

igchor, Feb 15 '24 19:02

Yes, the proxy library works well with the scalable pool. These issues occur only with the jemalloc pool, so the problem may lie in the jemalloc pool.

ldorau, Feb 16 '24 10:02