oneTBB icon indicating copy to clipboard operation
oneTBB copied to clipboard

Feature Request: Add Cygwin support

Open zmajeed opened this issue 6 years ago • 40 comments

This issue is to track support for Cygwin - the Linux environment on Windows

I've added makefiles support to for a cygwin build that I'd like to send as a PR - should it be for the tbb_2019 branch or would you like to create a new tbb_2019_cygwin branch?

My changes are at https://github.com/intel/tbb/compare/tbb_2019...zmajeed:cygwin_build

Many tests still fail - I hope others will be able to contribute fixes to get the cygwin build fully working

zmajeed avatar Jun 22 '19 15:06 zmajeed

@zmajeed oneTBB now uses CMake. If this still applicable, please update your patch and open a pull request.

anton-potapov avatar Oct 05 '21 20:10 anton-potapov

My fork https://github.com/zmajeedforks/oneTBB/tree/cygwin_build is based on latest master - it has changes for building on cygwin - https://github.com/oneapi-src/oneTBB/compare/master...zmajeedforks:cygwin_build - I'll give you a PR when tests pass - and update my progress here

zmajeed avatar Oct 07 '21 17:10 zmajeed

First I got these cmake errors

-- HWLOC target HWLOC::hwloc_1_11 doesn't exist. The tbbbind target cannot be created
-- HWLOC target HWLOC::hwloc_2 doesn't exist. The tbbbind_2_0 target cannot be created
-- HWLOC target HWLOC::hwloc_2_5 doesn't exist. The tbbbind_2_5 target cannot be created

I didn't know what tbbbind or hwloc is - found it's related to MPI which I'm not interested in though seems supported on cygwin - could look into later - it seems optional and unnecessary to see something that feels like nothing might build

I decided to ignore it and proceeded with building

zmajeed avatar Oct 07 '21 18:10 zmajeed

Next

src/tbb/dynamic_link.cpp: In function ‘void tbb::detail::r1::init_ap_data()’:
src/tbb/dynamic_link.cpp:250:9: error: ‘Dl_info’ was not declared in this scope
  250 |         Dl_info dlinfo;

Turns out Dl_info and dladdr have custom implementations on cygwin, need #define __GNU_VISIBLE 1 to get them

zmajeed avatar Oct 07 '21 18:10 zmajeed

Very happy to see -Wall -Wextra -Werror - I remember opening an issue for some warnings a while back! Not a fan of -Wfatal-errors though

zmajeed avatar Oct 07 '21 18:10 zmajeed

btw I'm building with gcc 11.2 and no problems so far

zmajeed avatar Oct 07 '21 18:10 zmajeed

then - need -D_POSIX_C_SOURCE=200809L for strdup

zmajeed avatar Oct 07 '21 18:10 zmajeed

then - rml_tbb.cpp has

#pragma weak __TBB_make_rml_server

but weak symbols have some subtleties on cygwin - I don't yet know why weak symbols are needed at all - so disabled them in _config.h

#define __TBB_WEAK_SYMBOLS_PRESENT ( !_WIN32 && !__APPLE__ && !__sun && !__CYGWIN__ && (__TBB_GCC_VERSION >= 40000 || __INTEL_COMPILER ) )

zmajeed avatar Oct 07 '21 18:10 zmajeed

next - reusing the linux proxy def file for cygwin - in GNU.cmake

set(TBB_DEF_FILE_PREFIX lin${TBB_ARCH})

zmajeed avatar Oct 07 '21 18:10 zmajeed

cygwin has no RLIMIT_NPROC for getrlimit, setrlimit

exclude cygwin for the same reason as windows

// On Windows there is no real thread number limit beside available memory.
// Therefore, the test for thread limit is unreasonable.
#if TBB_USE_EXCEPTIONS && !_WIN32 && !__ANDROID__ && !__CYGWIN__

zmajeed avatar Oct 07 '21 18:10 zmajeed

finally - a number of test files have assembler errors - either too many sections or string table overflow resulting in file too big

this is due to heavy use of templates - arguably this is a gcc issue but simplifying templates could be worthwhile

only -O3 optimization seems to prevent these errors in all cases, -Wa,-mbig-obj and -Os don't always work

the problem with these options is they increase compile times for files that don't need them - so I did not make -O3 default for all tests

instead the option is added to only those files that broke in test/CmakeLists.txt

# these files need -O3 to fix assembler errors for too many sections, file too big
if (CYGWIN)
  set_source_files_properties(
    tbb/test_concurrent_unordered_map.cpp
    conformance/conformance_concurrent_unordered_map.cpp
...
    PROPERTIES COMPILE_OPTIONS -O3
  )
endif()

zmajeed avatar Oct 07 '21 18:10 zmajeed

so build works on cygwin - have not run tests yet

zmajeed avatar Oct 07 '21 18:10 zmajeed

5 tests failed for first build - all due to dynamic linking - cygwin has dlls not shared objects - I hoped cmake would just do the right thing but looks like it needs the tweaking I did for the older makefiles build

96% tests passed, 5 tests failed out of 133
Total Test time (real) = 614.80 sec
The following tests FAILED:
          4 - test_dynamic_link (Failed)
         51 - test_overwrite_node (BAD_COMMAND)
        125 - test_malloc_compliance (Failed)
        126 - test_malloc_used_by_lib (Subprocess aborted)
        127 - test_malloc_lib_unload (Failed)

Errors

Can't load libtbbmalloc_debug.so.2 or libtbbmalloc.so.2
test/tbbmalloc/test_malloc_used_by_lib.cpp:98, assertion lib: Can't load lib_test_malloc_used_by_lib_debug.so
The executable doesn't export its symbols. Is the -rdynamic switch set during linking?

zmajeed avatar Oct 08 '21 05:10 zmajeed

5 tests failed for first build - all due to dynamic linking - cygwin has dlls not shared objects

I think you need to update tests (perhaps, the library as well) that they use dll suffix under cygwin.

alex-katranov avatar Oct 12 '21 04:10 alex-katranov

finally - a number of test files have assembler errors - either too many sections or string table overflow resulting in file too big

Can you please specify which tests cause the issue?

alex-katranov avatar Oct 12 '21 04:10 alex-katranov

My original changes to the old makefiles build included cygwin dll support - I'll make similar changes for the cmake build

The note above about assembler errors includes a snippet from test/CMakeLists. txt that lists the offending files

zmajeed avatar Oct 12 '21 10:10 zmajeed

I've gone through the 5 test failures - I'll note what's happening with each below - couple of these are hairy and need a fair amount of debugging to sort out

zmajeed avatar Nov 02 '21 12:11 zmajeed

Here's how to run all tests

$ ctest --test-dir build/cyg_gcc_debug --output-on-failure

and run a single test, test_dynamic_link

$ ctest --test-dir build/cyg_gcc_debug -V -R dynamic_link                 

and build a single test

$ make -C build/cyg_gcc_debug/test test_dynamic_link                

zmajeed avatar Nov 02 '21 12:11 zmajeed

The easiest was 51 - test_overwrite_node (BAD_COMMAND) - turned out to be a transient build issue that did not set the execute bits on the test executable - the test passed on a rebuild

$ ctest --test-dir build/cyg_gcc_debug -V -R overwrite_node               
51 - test_overwrite_node (BAD_COMMAND)
Test project build/cyg_gcc_debug
    Start  51: test_overwrite_node
Process not started
build/cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_overwrite_node.exe
[permission denied]
1/2 Test  #51: test_overwrite_node ..............***Not Run   0.00 sec
    Start 106: conformance_overwrite_node
2/2 Test #106: conformance_overwrite_node .......   Passed    0.33 sec

zmajeed avatar Nov 02 '21 12:11 zmajeed

Next is 4 - test_dynamic_link (Failed) - caused by symbols not being exported from the test executable - fixed with dllexport attribute on the symbols in test/tbb/test_dynamic_link.cpp

$ ctest --test-dir build/cyg_gcc_debug -V -R dynamic_link
test 4
    Start 4: test_dynamic_link
4: Test command: build/cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_dynamic_link.exe "--force-colors=1"
4: test/tbb/test_dynamic_link.cpp:94:
4: TEST CASE:  Test dynamic_link with non-existing library
4:
4: test/common/utils_dynamic_libs.h:116: FATAL ERROR: REQUIRE( func ) is NOT correct!
4:   values: REQUIRE( NULL )
4:   logged: The executable doesn't export its symbols. Is the -rdynamic switch set during linking?
4:           Can't find required symbol in dynamic library
4:
4: test/tbb/test_dynamic_link.cpp:72: FATAL ERROR: REQUIRE( (utils::GetAddress(utils::OpenLibrary(nullptr), "foo1") && utils::GetAddress(utils::OpenLibrary(nullptr), "foo2")) ) THREW exception: "unknown exception"
4:   logged: The executable doesn't export its symbols. Is the -rdynamic switch set during linking?
4: test/tbb/test_dynamic_link.cpp:100:
4: TEST CASE:  Test dynamic_link
4:
4: test/common/utils_dynamic_libs.h:116: FATAL ERROR: REQUIRE( func ) is NOT correct!
4:   values: REQUIRE( NULL )
4:   logged: The executable doesn't export its symbols. Is the -rdynamic switch set during linking?
4:           Can't find required symbol in dynamic library
4:
4: test/tbb/test_dynamic_link.cpp:72: FATAL ERROR: REQUIRE( (utils::GetAddress(utils::OpenLibrary(nullptr), "foo1") && utils::GetAddress(utils::OpenLibrary(nullptr), "foo2")) ) THREW exception: "unknown exception"
4:   logged: The executable doesn't export its symbols. Is the -rdynamic switch set during linking?
4: [doctest] test cases: 2 | 0 passed | 2 failed | 0 skipped
4: [doctest] assertions: 4 | 0 passed | 4 failed |
4: [doctest] Status: FAILURE!
1/1 Test #4: test_dynamic_link ................***Failed    0.48 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) =   0.84 sec
The following tests FAILED:
          4 - test_dynamic_link (Failed)
Errors while running CTest

Make the change

diff --git a/test/tbb/test_dynamic_link.cpp b/test/tbb/test_dynamic_link.cpp
index 2856db37..684bb435 100644
--- a/test/tbb/test_dynamic_link.cpp
+++ b/test/tbb/test_dynamic_link.cpp
@@ -30,6 +30,8 @@ enum FOO_TYPE {

 #if _WIN32 || _WIN64
 #define TEST_EXPORT
+#elif __CYGWIN__
+  #define TEST_EXPORT extern "C"  __attribute__((dllexport))
 #else
 #define TEST_EXPORT extern "C"
 #endif /* _WIN32 || _WIN64 */

Rebuild and rerun test

$ ctest --test-dir build/cyg_gcc_debug -V -R dynamic_link
test 4
    Start 4: test_dynamic_link
4: [doctest] test cases: 2 | 2 passed | 0 failed | 0 skipped
4: [doctest] assertions: 9 | 9 passed | 0 failed |
4: [doctest] Status: SUCCESS!
1/1 Test #4: test_dynamic_link ................   Passed    0.44 sec

The following tests passed:
        test_dynamic_link

100% tests passed, 0 tests failed out of 1

zmajeed avatar Nov 02 '21 13:11 zmajeed

The third test to fail is 125 - test_malloc_compliance (Failed) - because cygwin does not support RLIMIT_AS for address space or memory limits and getrlimit returns EINVAL errno 22

Luckily cygwin can use the Windows version of the test - this works!

But it is not straightforward to get cygwin to use Windows API the way TBB tests are currently structured

The first big hurdle is the plethora of platform feature test macros in the common header files for the tests.

For cygwin to use Windows API, it has to #include <windows.h>. This is particularly difficult to manage in a surgical manner in a shared header file because cygwin by default needs conditional compiles for Linux and Unix

A second problem is the common functions used in tests are all implemented with various platform-specific blocks in header files. It would be nice if common functions were in a separate library with implementations separated from the interfaces. Since this is not the case, refactoring is trickier.

The third problem is the shared headers have hard dependencies on the doctest unit test framework used by TBB. doctest is a single header library. This means it cannot be built into a separate library and used by a test executable at the same time because of multiple definition errors for doctest symbols when linking

Again it would be nice if shared functions that are not actually tests themselves did not depend on doctest - and the only dependencies were in the unit test .cpp files

My temporary fix to get the test to pass is to place a copy of the shared function in test_malloc_compliance.cpp just for cygwin. I'll see if a better solution is possible for the PR.

Here's the original failure

$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_compliance
test 125
    Start 125: test_malloc_compliance

125: Test command: build/cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_compliance.exe "--force-colors=1"
125: Test timeout computed to be: 10000000
125: Can't set limits: errno 22
1/1 Test #125: test_malloc_compliance ...........***Failed    0.45 sec

0% tests passed, 1 tests failed out of 1
Total Test time (real) =   1.38 sec
The following tests FAILED:
        125 - test_malloc_compliance (Failed)

The test passes after moving some code around in test_malloc_compliance.cpp to prevent #include <windows.h> from bleeding into non-Windows code blocks - and using a copy of the Windows version of the shared GetMemoryUsage function from test/common/memory_usage.h

It's very slow though which could be cygwin overhead of using Win32

$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_compliance
test 125
    Start 125: test_malloc_compliance

125: Test command: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_compliance.exe "--force-colors=1"
125: Test timeout computed to be: 10000000
125: ===============================================================================
125: [doctest] test cases:       1 |       1 passed | 0 failed | 0 skipped
125: [doctest] assertions: 1511136 | 1511136 passed | 0 failed |
125: [doctest] Status: SUCCESS!
1/1 Test #125: test_malloc_compliance ...........   Passed   34.76 sec

The following tests passed:
        test_malloc_compliance
100% tests passed, 0 tests failed out of 1
Total Test time (real) =  35.36 sec

zmajeed avatar Nov 04 '21 12:11 zmajeed

The last two tests to fail 126 - test_malloc_used_by_lib (Subprocess aborted) and 127 - test_malloc_lib_unload (Failed) are particularly difficult to debug and fix because they seem to involve race conditions with C++ static data, symbols from dynamically loaded libraries using dlopen and dlsym and multiple threads in pthreads.

Both tests crash with segv faults that trash the stack so pinpointing the culprit is tedious

The segv in both tests is from a dynamically loaded function from a dll in one thread that is accessed after the library has been unloaded with dlclose. Test 126 - test_malloc_used_by_lib (Subprocess aborted) has a custom barrier with 4 threads and 127 - test_malloc_lib_unload (Failed) has just one thread - of course these threads are in addition to the main thread for the test framework - there are always multiple threads involved. On the face of it test source code looks fine - the segv is in internal standard library destructor or cleanup code - it could be an instruction reordering issue that somehow does not appear in Linux

I'm trying to reproduce the failures in small standalone programs to see if the bugs are in TBB tests or in the doctest framework or in cygwin

zmajeed avatar Nov 04 '21 12:11 zmajeed

The description above for the fifth test failure 127 - test_malloc_lib_unload (Failed) is not right - the crash may have been from some of my experimental changes

The actual failure is two-fold - fixed by using a right library name and version - and by using the Win32 version of a utility function

Initially the test fails because of a shared library/dll name and version mismatch. The top cmakefile has set(TBBMALLOC_BINARY_VERSION 2) - this builds cygtbbmalloc_debug-2.dll on cygwin. But test_malloc_lib_unload tries to dlopen libtbbmalloc_debug.so.2 or libtbbmalloc.so.2

$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_lib_unload
test 127
    Start 127: test_malloc_lib_unload

127: Test command: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_lib_unload.exe "--force-colors=1"
127: Can't load libtbbmalloc_debug.so.2 or libtbbmalloc.so.2
1/1 Test #127: test_malloc_lib_unload ...........***Failed    0.45 sec

0% tests passed, 1 tests failed out of 1
Total Test time (real) =   1.28 sec
The following tests FAILED:
        127 - test_malloc_lib_unload (Failed)

This is fixed by correctly setting various macros for cygwin in test/common/utils_dynamic_libs.h

diff --git a/test/common/utils_dynamic_libs.h b/test/common/utils_dynamic_libs.h
index 22927fe1..a503e724 100644
--- a/test/common/utils_dynamic_libs.h
+++ b/test/common/utils_dynamic_libs.h
@@ -32,7 +32,13 @@ namespace utils {

 #if TBB_USE_DEBUG
 #define SUFFIX1 "_debug"
-#define SUFFIX2
+
+#  if __CYGWIN__
+#    define SUFFIX2 "_debug-2"
+#  else
+#    define SUFFIX2
+#  endif
+
 #else
 #define SUFFIX1
 #define SUFFIX2 "_debug"
@@ -45,19 +51,24 @@ namespace utils {
     #define PREFIX
 #endif
 #define EXT ".dll"
+#elif __CYGWIN__
+#  define PREFIX "cyg"
 #else
 #define PREFIX "lib"
+#endif
+
 #if __APPLE__
 #define EXT ".dylib"
 // Android SDK build system does not support .so file name versioning
 #elif __FreeBSD__ || __NetBSD__ || __sun || _AIX || __ANDROID__
 #define EXT ".so"
+#elif __CYGWIN__
+#  define EXT ".dll"
 #elif __unix__  // Order of these elif's matters!
 #define EXT __TBB_STRING(.so.2)
 #else
 #error Unknown OS
 #endif
-#endif

 // Form the names of the TBB memory allocator binaries.
 #define MALLOCLIB_NAME1 PREFIX "tbbmalloc" SUFFIX1 EXT
@@ -71,6 +82,8 @@ using LIBRARY_HANDLE = void*;

 #if _WIN32 || _WIN64
 #define TEST_LIBRARY_NAME(base) PREFIX base SUFFIX1 ".dll"
+#elif __CYGWIN__
+#  define TEST_LIBRARY_NAME(base) PREFIX base SUFFIX1 ".dll"
 #elif __APPLE__
 #define TEST_LIBRARY_NAME(base) PREFIX base SUFFIX1 ".dylib"
 #else

With the correct library loading, a test assertion fails

$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_lib_unload
test 127
    Start 127: test_malloc_lib_unload

127: Test command: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_lib_unload.exe "--force-colors=1"
127: ===============================================================================
127: cygwin_build/test/tbbmalloc/test_malloc_lib_unload.cpp:188:
127: TEST CASE:  test unload lib
127:
127: cygwin_build/test/tbbmalloc/test_malloc_lib_unload.cpp:204: FATAL ERROR: REQUIRE( memory_in_use == memory_check ) is NOT correct!
127:   values: REQUIRE( 11730944 == 11796480 )
127:   logged: Memory consumption should not increase after 1st GetMemoryUsage() call
127:
127: ===============================================================================
127: [doctest] test cases: 1 | 0 passed | 1 failed | 0 skipped
127: [doctest] assertions: 7 | 6 passed | 1 failed |
127: [doctest] Status: FAILURE!
1/1 Test #127: test_malloc_lib_unload ...........***Failed    0.57 sec

0% tests passed, 1 tests failed out of 1
Total Test time (real) =   0.70 sec
The following tests FAILED:
        127 - test_malloc_lib_unload (Failed)

memory_in_use and memory_check are determined with GetMemoryUsage - for Linux (and cygwin by default) this returns the VmSize stat from /proc/self/status

As it happens this same function GetMemoryUsage was also why a previous test, test_malloc_compliance, failed - the reason was cygwin has yet to add support for the VmPeak stat in /proc/self/status - I made test_malloc_compliance pass by making cygwin use the Win32 version of GetMemoryUsage

Back to test_malloc_lib_unload - I did not debug why VmSize returns different values on cygwin - instead I employed the same approach and made cygwin use the Windows version of GetMemoryUsage - this returns the PagefileUsage stat using GetProcessMemoryInfo from psapi.h

Lo and behold the test passes

$ ctest --test-dir build/cyg_gcc_debug -V -R malloc_lib_unload
test 127
    Start 127: test_malloc_lib_unload

127: Test command: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_malloc_lib_unload.exe "--force-colors=1"
127: ===============================================================================
127: [doctest] test cases:  1 |  1 passed | 0 failed | 0 skipped
127: [doctest] assertions: 57 | 57 passed | 0 failed |
127: [doctest] Status: SUCCESS!
1/1 Test #127: test_malloc_lib_unload ...........   Passed    0.48 sec

The following tests passed:
        test_malloc_lib_unload
100% tests passed, 0 tests failed out of 1
Total Test time (real) =   0.89 sec

zmajeed avatar Nov 08 '21 15:11 zmajeed

Only one test failure remains - 126 - test_malloc_used_by_lib (Subprocess aborted) - this indeed is a SIGSEGV - I still don't have a full explanation - but my money is on destructors of static C++ objects that are accessing a dynamically loaded function from a dll after it has been unloaded with dlclose in one thread.

I'm thinking a destructor or some kind of cleanup code runs at program exit and not when the library is unloaded. This would be a bug in the test.

zmajeed avatar Nov 08 '21 15:11 zmajeed

Question for TBB maintainers - would TBBBind be required for a Cygwin port PR?

zmajeed avatar Nov 08 '21 17:11 zmajeed

Sync to upstream master added interprocedural optimization options by default - introduced in PR https://github.com/oneapi-src/oneTBB/pull/432

These gcc options -flto -fno-fat-lto-objects break almost all tests on cygwin

1: Cygwin runtime failure: cyg_gcc_debug/gnu_11.2_cxx11_64_debug/test_tick_count.exe: Invalid relocation.  Offset 0x41aa20cc8 at address 0x100418c6e doesn't fit into 32 bits

Disabling IPO for cygwin in top cmakefile

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 4dbdadb9..63fa3afe 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -185,7 +185,7 @@ endforeach()
 # - Enabling LTO on Android causes the NDK bug.
 #   NDK throws the warning: "argument unused during compilation: '-Wa,--noexecstack'"
 # - For some reason GCC does not instrument code with Thread Sanitizer when lto is enabled and C linker is used.
-if (TBB_ENABLE_IPO AND BUILD_SHARED_LIBS AND NOT ANDROID_PLATFORM AND NOT TBB_SANITIZE MATCHES "thread")
+if (TBB_ENABLE_IPO AND BUILD_SHARED_LIBS AND NOT ANDROID_PLATFORM AND NOT CYGWIN AND NOT TBB_SANITIZE MATCHES "thread")
     if (NOT CMAKE_VERSION VERSION_LESS 3.9)
         cmake_policy(SET CMP0069 NEW)
         include(CheckIPOSupported)

It could be -fno-fat-lto-objects automatically set by cmake is the real problem.

zmajeed avatar Nov 09 '21 04:11 zmajeed

Question for TBB maintainers - would TBBBind be required for a Cygwin port PR?

We do not use Cygwin :) so it is up to you. TBBBind is responsible for NUMA support in task_arena that will work without TBBBind considering any multi-socket machine as the machine with one big socket.

alex-katranov avatar Nov 09 '21 15:11 alex-katranov

That's great - now I just need to fix this last test failure and come up with a nicer way for cygwin to use Win32 code in the tests

zmajeed avatar Nov 09 '21 16:11 zmajeed

Only one test failure remains - 126 - test_malloc_used_by_lib (Subprocess aborted) - this indeed is a SIGSEGV - I still don't have a full explanation - but my money is on destructors of static C++ objects that are accessing a dynamically loaded function from a dll after it has been unloaded with dlclose in one thread.

Can it be related: https://github.com/oneapi-src/oneTBB/blob/master/test/common/doctest.h#L470-L504?

alex-katranov avatar Nov 11 '21 08:11 alex-katranov

Thank you for the tip - enabling the VC++ version of doctest_thread_local_wrapper for cygwin does not fix the problem - but it did lead me to cygwin wrappers for pthread that seem to use thread-local storage - I'll dig deeper when I get a chance

zmajeed avatar Nov 12 '21 13:11 zmajeed