Feature Request: Add Cygwin support
This issue is to track support for Cygwin - the Linux environment on Windows
I've added makefiles support to for a cygwin build that I'd like to send as a PR - should it be for the tbb_2019 branch or would you like to create a new tbb_2019_cygwin branch?
My changes are at https://github.com/intel/tbb/compare/tbb_2019...zmajeed:cygwin_build
Many tests still fail - I hope others will be able to contribute fixes to get the cygwin build fully working
@zmajeed oneTBB now uses CMake. If this still applicable, please update your patch and open a pull request.
My fork https://github.com/zmajeedforks/oneTBB/tree/cygwin_build is based on latest master - it has changes for building on cygwin - https://github.com/oneapi-src/oneTBB/compare/master...zmajeedforks:cygwin_build - I'll give you a PR when tests pass - and update my progress here
First I got these cmake errors
-- HWLOC target HWLOC::hwloc_1_11 doesn't exist. The tbbbind target cannot be created
-- HWLOC target HWLOC::hwloc_2 doesn't exist. The tbbbind_2_0 target cannot be created
-- HWLOC target HWLOC::hwloc_2_5 doesn't exist. The tbbbind_2_5 target cannot be created
I didn't know what tbbbind or hwloc is - found it's related to MPI which I'm not interested in though seems supported on cygwin - could look into later - it seems optional and unnecessary to see something that feels like nothing might build
I decided to ignore it and proceeded with building
Next
src/tbb/dynamic_link.cpp: In function ‘void tbb::detail::r1::init_ap_data()’:
src/tbb/dynamic_link.cpp:250:9: error: ‘Dl_info’ was not declared in this scope
250 | Dl_info dlinfo;
Turns out Dl_info and dladdr have custom implementations on cygwin, need #define __GNU_VISIBLE 1 to get them
Very happy to see -Wall -Wextra -Werror - I remember opening an issue for some warnings a while back!
Not a fan of -Wfatal-errors though
btw I'm building with gcc 11.2 and no problems so far
then - need -D_POSIX_C_SOURCE=200809L for strdup
then - rml_tbb.cpp has
#pragma weak __TBB_make_rml_server
but weak symbols have some subtleties on cygwin - I don't yet know why weak symbols are needed at all - so disabled them in _config.h
#define __TBB_WEAK_SYMBOLS_PRESENT ( !_WIN32 && !__APPLE__ && !__sun && !__CYGWIN__ && (__TBB_GCC_VERSION >= 40000 || __INTEL_COMPILER ) )
next - reusing the linux proxy def file for cygwin - in GNU.cmake
set(TBB_DEF_FILE_PREFIX lin${TBB_ARCH})
cygwin has no RLIMIT_NPROC for getrlimit, setrlimit
exclude cygwin for the same reason as windows
// On Windows there is no real thread number limit beside available memory.
// Therefore, the test for thread limit is unreasonable.
#if TBB_USE_EXCEPTIONS && !_WIN32 && !__ANDROID__ && !__CYGWIN__
finally - a number of test files have assembler errors - either too many sections or string table overflow resulting in file too big
this is due to heavy use of templates - arguably this is a gcc issue but simplifying templates could be worthwhile
only -O3 optimization seems to prevent these errors in all cases, -Wa,-mbig-obj and -Os don't always work
the problem with these options is they increase compile times for files that don't need them - so I did not make -O3 default for all tests
instead the option is added to only those files that broke in test/CmakeLists.txt
# these files need -O3 to fix assembler errors for too many sections, file too big
if (CYGWIN)
set_source_files_properties(
tbb/test_concurrent_unordered_map.cpp
conformance/conformance_concurrent_unordered_map.cpp
...
PROPERTIES COMPILE_OPTIONS -O3
)
endif()
so build works on cygwin - have not run tests yet
5 tests failed for first build - all due to dynamic linking - cygwin has dlls not shared objects - I hoped cmake would just do the right thing but looks like it needs the tweaking I did for the older makefiles build
96% tests passed, 5 tests failed out of 133
Total Test time (real) = 614.80 sec
The following tests FAILED:
4 - test_dynamic_link (Failed)
51 - test_overwrite_node (BAD_COMMAND)
125 - test_malloc_compliance (Failed)
126 - test_malloc_used_by_lib (Subprocess aborted)
127 - test_malloc_lib_unload (Failed)
Errors
Can't load libtbbmalloc_debug.so.2 or libtbbmalloc.so.2
test/tbbmalloc/test_malloc_used_by_lib.cpp:98, assertion lib: Can't load lib_test_malloc_used_by_lib_debug.so
The executable doesn't export its symbols. Is the -rdynamic switch set during linking?
5 tests failed for first build - all due to dynamic linking - cygwin has dlls not shared objects
I think you need to update tests (perhaps, the library as well) that they use dll suffix under cygwin.
finally - a number of test files have assembler errors - either too many sections or string table overflow resulting in file too big
Can you please specify which tests cause the issue?
My original changes to the old makefiles build included cygwin dll support - I'll make similar changes for the cmake build
The note above about assembler errors includes a snippet from test/CMakeLists. txt that lists the offending files
I've gone through the 5 test failures - I'll note what's happening with each below - couple of these are hairy and need a fair amount of debugging to sort out
Here's how to run all tests
$ ctest --test-dir build/cyg_gcc_debug --output-on-failure
and run a single test, test_dynamic_link
$ ctest --test-dir build/cyg_gcc_debug -V -R dynamic_link
and build a single test
$ make -C build/cyg_gcc_debug/test test_dynamic_link
The easiest was 51 - test_overwrite_node (BAD_COMMAND) - turned out to be a transient build issue that did not set the execute bits on the test executable - the test passed on a rebuild
$ ctest --test-dir build/cyg_gcc_debug -V -R overwrite_node
51 - test_overwrite_node (BAD_COMMAND)
Test project build/cyg_gcc_debug
Start 51: test_overwrite_node
Process not started
build/cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_overwrite_node.exe
[permission denied]
1/2 Test #51: test_overwrite_node ..............***Not Run 0.00 sec
Start 106: conformance_overwrite_node
2/2 Test #106: conformance_overwrite_node ....... Passed 0.33 sec
Next is 4 - test_dynamic_link (Failed) - caused by symbols not being exported from the test executable - fixed with dllexport attribute on the symbols in test/tbb/test_dynamic_link.cpp
$ ctest --test-dir build/cyg_gcc_debug -V -R dynamic_link
test 4
Start 4: test_dynamic_link
4: Test command: build/cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_dynamic_link.exe "--force-colors=1"
4: test/tbb/test_dynamic_link.cpp:94:
4: TEST CASE: Test dynamic_link with non-existing library
4:
4: test/common/utils_dynamic_libs.h:116: FATAL ERROR: REQUIRE( func ) is NOT correct!
4: values: REQUIRE( NULL )
4: logged: The executable doesn't export its symbols. Is the -rdynamic switch set during linking?
4: Can't find required symbol in dynamic library
4:
4: test/tbb/test_dynamic_link.cpp:72: FATAL ERROR: REQUIRE( (utils::GetAddress(utils::OpenLibrary(nullptr), "foo1") && utils::GetAddress(utils::OpenLibrary(nullptr), "foo2")) ) THREW exception: "unknown exception"
4: logged: The executable doesn't export its symbols. Is the -rdynamic switch set during linking?
4: test/tbb/test_dynamic_link.cpp:100:
4: TEST CASE: Test dynamic_link
4:
4: test/common/utils_dynamic_libs.h:116: FATAL ERROR: REQUIRE( func ) is NOT correct!
4: values: REQUIRE( NULL )
4: logged: The executable doesn't export its symbols. Is the -rdynamic switch set during linking?
4: Can't find required symbol in dynamic library
4:
4: test/tbb/test_dynamic_link.cpp:72: FATAL ERROR: REQUIRE( (utils::GetAddress(utils::OpenLibrary(nullptr), "foo1") && utils::GetAddress(utils::OpenLibrary(nullptr), "foo2")) ) THREW exception: "unknown exception"
4: logged: The executable doesn't export its symbols. Is the -rdynamic switch set during linking?
4: [doctest] test cases: 2 | 0 passed | 2 failed | 0 skipped
4: [doctest] assertions: 4 | 0 passed | 4 failed |
4: [doctest] Status: FAILURE!
1/1 Test #4: test_dynamic_link ................***Failed 0.48 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 0.84 sec
The following tests FAILED:
4 - test_dynamic_link (Failed)
Errors while running CTest
Make the change
diff --git a/test/tbb/test_dynamic_link.cpp b/test/tbb/test_dynamic_link.cpp
index 2856db37..684bb435 100644
--- a/test/tbb/test_dynamic_link.cpp
+++ b/test/tbb/test_dynamic_link.cpp
@@ -30,6 +30,8 @@ enum FOO_TYPE {
#if _WIN32 || _WIN64
#define TEST_EXPORT
+#elif __CYGWIN__
+ #define TEST_EXPORT extern "C" __attribute__((dllexport))
#else
#define TEST_EXPORT extern "C"
#endif /* _WIN32 || _WIN64 */
Rebuild and rerun test
$ ctest --test-dir build/cyg_gcc_debug -V -R dynamic_link
test 4
Start 4: test_dynamic_link
4: [doctest] test cases: 2 | 2 passed | 0 failed | 0 skipped
4: [doctest] assertions: 9 | 9 passed | 0 failed |
4: [doctest] Status: SUCCESS!
1/1 Test #4: test_dynamic_link ................ Passed 0.44 sec
The following tests passed:
test_dynamic_link
100% tests passed, 0 tests failed out of 1
The third test to fail is 125 - test_malloc_compliance (Failed) - because cygwin does not support RLIMIT_AS for address space or memory limits and getrlimit returns EINVAL errno 22
Luckily cygwin can use the Windows version of the test - this works!
But it is not straightforward to get cygwin to use Windows API the way TBB tests are currently structured
The first big hurdle is the plethora of platform feature test macros in the common header files for the tests.
For cygwin to use Windows API, it has to #include <windows.h>. This is particularly difficult to manage in a surgical manner in a shared header file because cygwin by default needs conditional compiles for Linux and Unix
A second problem is the common functions used in tests are all implemented with various platform-specific blocks in header files. It would be nice if common functions were in a separate library with implementations separated from the interfaces. Since this is not the case, refactoring is trickier.
The third problem is the shared headers have hard dependencies on the doctest unit test framework used by TBB. doctest is a single header library. This means it cannot be built into a separate library and used by a test executable at the same time because of multiple definition errors for doctest symbols when linking
Again it would be nice if shared functions that are not actually tests themselves did not depend on doctest - and the only dependencies were in the unit test .cpp files
My temporary fix to get the test to pass is to place a copy of the shared function in test_malloc_compliance.cpp just for cygwin. I'll see if a better solution is possible for the PR.
Here's the original failure
$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_compliance
test 125
Start 125: test_malloc_compliance
125: Test command: build/cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_compliance.exe "--force-colors=1"
125: Test timeout computed to be: 10000000
125: Can't set limits: errno 22
1/1 Test #125: test_malloc_compliance ...........***Failed 0.45 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 1.38 sec
The following tests FAILED:
125 - test_malloc_compliance (Failed)
The test passes after moving some code around in test_malloc_compliance.cpp to prevent #include <windows.h> from bleeding into non-Windows code blocks - and using a copy of the Windows version of the shared GetMemoryUsage function from test/common/memory_usage.h
It's very slow though which could be cygwin overhead of using Win32
$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_compliance
test 125
Start 125: test_malloc_compliance
125: Test command: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_compliance.exe "--force-colors=1"
125: Test timeout computed to be: 10000000
125: ===============================================================================
125: [doctest] test cases: 1 | 1 passed | 0 failed | 0 skipped
125: [doctest] assertions: 1511136 | 1511136 passed | 0 failed |
125: [doctest] Status: SUCCESS!
1/1 Test #125: test_malloc_compliance ........... Passed 34.76 sec
The following tests passed:
test_malloc_compliance
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 35.36 sec
The last two tests to fail 126 - test_malloc_used_by_lib (Subprocess aborted) and 127 - test_malloc_lib_unload (Failed) are particularly difficult to debug and fix because they seem to involve race conditions with C++ static data, symbols from dynamically loaded libraries using dlopen and dlsym and multiple threads in pthreads.
Both tests crash with segv faults that trash the stack so pinpointing the culprit is tedious
The segv in both tests is from a dynamically loaded function from a dll in one thread that is accessed after the library has been unloaded with dlclose. Test 126 - test_malloc_used_by_lib (Subprocess aborted) has a custom barrier with 4 threads and 127 - test_malloc_lib_unload (Failed) has just one thread - of course these threads are in addition to the main thread for the test framework - there are always multiple threads involved. On the face of it test source code looks fine - the segv is in internal standard library destructor or cleanup code - it could be an instruction reordering issue that somehow does not appear in Linux
I'm trying to reproduce the failures in small standalone programs to see if the bugs are in TBB tests or in the doctest framework or in cygwin
The description above for the fifth test failure 127 - test_malloc_lib_unload (Failed) is not right - the crash may have been from some of my experimental changes
The actual failure is two-fold - fixed by using a right library name and version - and by using the Win32 version of a utility function
Initially the test fails because of a shared library/dll name and version mismatch. The top cmakefile has set(TBBMALLOC_BINARY_VERSION 2) - this builds cygtbbmalloc_debug-2.dll on cygwin. But test_malloc_lib_unload tries to dlopen libtbbmalloc_debug.so.2 or libtbbmalloc.so.2
$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_lib_unload
test 127
Start 127: test_malloc_lib_unload
127: Test command: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_lib_unload.exe "--force-colors=1"
127: Can't load libtbbmalloc_debug.so.2 or libtbbmalloc.so.2
1/1 Test #127: test_malloc_lib_unload ...........***Failed 0.45 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 1.28 sec
The following tests FAILED:
127 - test_malloc_lib_unload (Failed)
This is fixed by correctly setting various macros for cygwin in test/common/utils_dynamic_libs.h
diff --git a/test/common/utils_dynamic_libs.h b/test/common/utils_dynamic_libs.h
index 22927fe1..a503e724 100644
--- a/test/common/utils_dynamic_libs.h
+++ b/test/common/utils_dynamic_libs.h
@@ -32,7 +32,13 @@ namespace utils {
#if TBB_USE_DEBUG
#define SUFFIX1 "_debug"
-#define SUFFIX2
+
+# if __CYGWIN__
+# define SUFFIX2 "_debug-2"
+# else
+# define SUFFIX2
+# endif
+
#else
#define SUFFIX1
#define SUFFIX2 "_debug"
@@ -45,19 +51,24 @@ namespace utils {
#define PREFIX
#endif
#define EXT ".dll"
+#elif __CYGWIN__
+# define PREFIX "cyg"
#else
#define PREFIX "lib"
+#endif
+
#if __APPLE__
#define EXT ".dylib"
// Android SDK build system does not support .so file name versioning
#elif __FreeBSD__ || __NetBSD__ || __sun || _AIX || __ANDROID__
#define EXT ".so"
+#elif __CYGWIN__
+# define EXT ".dll"
#elif __unix__ // Order of these elif's matters!
#define EXT __TBB_STRING(.so.2)
#else
#error Unknown OS
#endif
-#endif
// Form the names of the TBB memory allocator binaries.
#define MALLOCLIB_NAME1 PREFIX "tbbmalloc" SUFFIX1 EXT
@@ -71,6 +82,8 @@ using LIBRARY_HANDLE = void*;
#if _WIN32 || _WIN64
#define TEST_LIBRARY_NAME(base) PREFIX base SUFFIX1 ".dll"
+#elif __CYGWIN__
+# define TEST_LIBRARY_NAME(base) PREFIX base SUFFIX1 ".dll"
#elif __APPLE__
#define TEST_LIBRARY_NAME(base) PREFIX base SUFFIX1 ".dylib"
#else
With the correct library loading, a test assertion fails
$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_lib_unload
test 127
Start 127: test_malloc_lib_unload
127: Test command: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_lib_unload.exe "--force-colors=1"
127: ===============================================================================
127: cygwin_build/test/tbbmalloc/test_malloc_lib_unload.cpp:188:
127: TEST CASE: test unload lib
127:
127: cygwin_build/test/tbbmalloc/test_malloc_lib_unload.cpp:204: FATAL ERROR: REQUIRE( memory_in_use == memory_check ) is NOT correct!
127: values: REQUIRE( 11730944 == 11796480 )
127: logged: Memory consumption should not increase after 1st GetMemoryUsage() call
127:
127: ===============================================================================
127: [doctest] test cases: 1 | 0 passed | 1 failed | 0 skipped
127: [doctest] assertions: 7 | 6 passed | 1 failed |
127: [doctest] Status: FAILURE!
1/1 Test #127: test_malloc_lib_unload ...........***Failed 0.57 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 0.70 sec
The following tests FAILED:
127 - test_malloc_lib_unload (Failed)
memory_in_use and memory_check are determined with GetMemoryUsage - for Linux (and cygwin by default) this returns the VmSize stat from /proc/self/status
As it happens this same function GetMemoryUsage was also why a previous test, test_malloc_compliance, failed - the reason was cygwin has yet to add support for the VmPeak stat in /proc/self/status - I made test_malloc_compliance pass by making cygwin use the Win32 version of GetMemoryUsage
Back to test_malloc_lib_unload - I did not debug why VmSize returns different values on cygwin - instead I employed the same approach and made cygwin use the Windows version of GetMemoryUsage - this returns the PagefileUsage stat using GetProcessMemoryInfo from psapi.h
Lo and behold the test passes
$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_lib_unload
test 127
Start 127: test_malloc_lib_unload
127: Test command: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_lib_unload.exe "--force-colors=1"
127: ===============================================================================
127: [doctest] test cases: 1 | 1 passed | 0 failed | 0 skipped
127: [doctest] assertions: 57 | 57 passed | 0 failed |
127: [doctest] Status: SUCCESS!
1/1 Test #127: test_malloc_lib_unload ........... Passed 0.48 sec
The following tests passed:
test_malloc_lib_unload
100% tests passed, 0 tests failed out of 1
Total Test time (real) = 0.89 sec
Only one test failure remains - 126 - test_malloc_used_by_lib (Subprocess aborted) - this indeed is a SIGSEGV - I still don't have a full explanation - but my money is on destructors of static C++ objects that are accessing a dynamically loaded function from a dll after it has been unloaded with dlclose in one thread.
I'm thinking a destructor or some kind of cleanup code runs at program exit and not when the library is unloaded. This would be a bug in the test.
Question for TBB maintainers - would TBBBind be required for a Cygwin port PR?
Sync to upstream master added interprocedural optimization options by default - introduced in PR https://github.com/oneapi-src/oneTBB/pull/432
These gcc options -flto -fno-fat-lto-objects break almost all tests on cygwin
1: Cygwin runtime failure: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_tick_count.exe: Invalid relocation. Offset 0x41aa20cc8 at address 0x100418c6e doesn't fit into 32 bits
Disabling IPO for cygwin in top cmakefile
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 4dbdadb9..63fa3afe 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -185,7 +185,7 @@ endforeach()
# - Enabling LTO on Android causes the NDK bug.
# NDK throws the warning: "argument unused during compilation: '-Wa,--noexecstack'"
# - For some reason GCC does not instrument code with Thread Sanitizer when lto is enabled and C linker is used.
-if (TBB_ENABLE_IPO AND BUILD_SHARED_LIBS AND NOT ANDROID_PLATFORM AND NOT TBB_SANITIZE MATCHES "thread")
+if (TBB_ENABLE_IPO AND BUILD_SHARED_LIBS AND NOT ANDROID_PLATFORM AND NOT CYGWIN AND NOT TBB_SANITIZE MATCHES "thread")
if (NOT CMAKE_VERSION VERSION_LESS 3.9)
cmake_policy(SET CMP0069 NEW)
include(CheckIPOSupported)
It could be -fno-fat-lto-objects automatically set by cmake is the real problem.
Question for TBB maintainers - would TBBBind be required for a Cygwin port PR?
We do not use Cygwin :) so it is up to you. TBBBind is responsible for NUMA support in task_arena that will work without TBBBind considering any multi-socket machine as the machine with one big socket.
That's great - now I just need to fix this last test failure and come up with a nicer way for cygwin to use Win32 code in the tests
Only one test failure remains - 126 - test_malloc_used_by_lib (Subprocess aborted) - this indeed is a SIGSEGV - I still don't have a full explanation - but my money is on destructors of static C++ objects that are accessing a dynamically loaded function from a dll after it has been unloaded with dlclose in one thread.
Can it be related: https://github.com/oneapi-src/oneTBB/blob/master/test/common/doctest.h#L470-L504?
Thank you for the tip - enabling the VC++ version of doctest_thread_local_wrapper for cygwin does not fix the problem - but it did lead me to cygwin wrappers for pthread that seem to use thread-local storage - I'll dig deeper when I get a chance