sleef
sleef copied to clipboard
add static build config.
- Add "SLEEF_BUILD_SHARED_LIBS" to instead of CMake reserved variable "BUILD_SHARED_LIBS".
- Add clear library type to add_library, it will remove the global control from "BUILD_SHARED_LIBS", Link: https://cmake.org/cmake/help/latest/guide/tutorial/Selecting%20Static%20or%20Shared%20Libraries.html
Additional, I write a test project to test this PR: https://github.com/xuhancn/sleef_cmake_example Turn on/off this line to control the build type. https://github.com/xuhancn/sleef_cmake_example/blob/main/CMakeLists.txt#L14
Static:
Shared:
@blapie Please comment and review.
It seems that, we need this PR to fix pytorch sleef build issue. https://github.com/pytorch/pytorch/pull/118980
Hello! I think I understand the issue/suggested solution, although it did not sound extremely clear.
I don't really understand how this works as is. While the change to CMakeLists.txt seems to make sense, in Configure.cmake and src/common/CMakeLists.txt occurrences of BUILD_SHARED_LBS are replaced by SLEEF_STATIC_LIBS, which I believe is the opposite of what you want. Btw SLEEF_BUILD_STATIC_LIBS is not defined as a user-facing option, which doesn't help understanding the solution.
I'm not very keen on that change anyway, even if it was fixed. The use case is too particular to justify making the build system more complex than it already is.
If pytorch needs to setup its submodules differently locally then the solution implementing Save/Restore (https://github.com/pytorch/pytorch/blob/cee16353db384dd33a4804f9f26829cacfcfbab7/aten/src/ATen/CMakeLists.txt#L424-L440) was probably the way to go, or something along those lines.
I don't believe it is a relevant feature to add to SLEEF, but happy to be convinced otherwise.
@blapie Haha, I did a lot of experiment and get a final code and conclusion, please forget previous discussions. Conclusion: We can't use "BUILD_SHARED_LIBS" to control sleef static/shared build. As you mentioned to me it is CMake reserved variable: https://cmake.org/cmake/help/latest/variable/BUILD_SHARED_LIBS.html
Let me analysis the detailed issue:
Phenomenon:
pytorch Windows can only build it pass first time by save/restoe "BUILD_SHARED_LIBS", the second time(incremental build) will cause the runtime libraries conflict: https://github.com/shibatch/sleef/pull/509#issue-2117085191
Reason:
- BUILD_SHARED_LIBS is cmake reserved variable, and it is global scope shared in pytorch and all its submodules.
- When the first time build. CMake will config the whole project and its submodule in constant order. It is sequential processing. save/restore is works in sequential processing.
- The second time build is incremental build, only some modified code/cmake file/submodules will be built. And as we known, CMake is parallel processing build system. Pytorch has a lot of submodules. And modify(save/restore) "BUILD_SHARED_LIBS" will make a unexpectable status and impact on all submodules. It would make the build system chaos.
We need to drop the reserved variable "BUILD_SHARED_LIBS" and use sleef dedicate variable "SLEEF_BUILD_SHARED_LIBS".
Also, you can check the another static lib mimalloc: https://github.com/pytorch/pytorch/blob/main/CMakeLists.txt#L1060-L1068 It is very simple code:
Setup mimalloc dedicate options. Add mimalloc as submodule. Use setup include and link as depends.
Acturlly, save/restore is a workaround for sleef exist "BUILD_SHARED_LIBS" issue. Please check this comment: https://github.com/pytorch/pytorch/blob/cee16353db384dd33a4804f9f26829cacfcfbab7/aten/src/ATen/CMakeLists.txt#L483
Finally, it is continued to rename "BUILD_SHARED_LIBS" to "SLEEF_BUILD_SHARED_LIBS", please check my last PR: https://github.com/shibatch/sleef/pull/509#issuecomment-1929181290 Also, this PR make a rule to enable "SLEEF_BUILD_SHARED_LIBS" to control sleef static/shared library.
@blapie Please review it again.
@blapie could you please priority to review this PR? https://github.com/pytorch/pytorch/pull/118980 this pytorch PR id depends on it.
Hello @xuhancn! I'm looking into now. I would prefer a slightly different approach though but I will look at how feasible that is.
Right, let me know if I'm missing something but I believe we don't have to rename the variable to get the behaviour you are expecting.
Could you try defining the BUILD_SHARED_LIBS option in SLEEF's CMakeLists.txt see comment in review? According to the documentation of CMake it should work, when top-level project switches the option.
https://cmake.org/cmake/help/latest/variable/BUILD_SHARED_LIBS.html
Note that if bringing external dependencies directly into the build, such as with [FetchContent](https://cmake.org/cmake/help/latest/module/FetchContent.html#module:FetchContent) or a direct call to [add_subdirectory()](https://cmake.org/cmake/help/latest/command/add_subdirectory.html#command:add_subdirectory), and one of those dependencies has such a call to [option(BUILD_SHARED_LIBS ...)](https://cmake.org/cmake/help/latest/command/option.html#command:option), the top level project must also call [option(BUILD_SHARED_LIBS ...)](https://cmake.org/cmake/help/latest/command/option.html#command:option) before bringing in its dependencies. Failure to do so can lead to different behavior between the first and subsequent CMake runs.
https://discourse.cmake.org/t/build-shared-libs/4200
It’s fine to declare it as an option if you want to change the default. As long as the minimum required CMake version is 3.13, a parent project can override it with a normal variable.
But it might need to be done the right way. In order to fix the parallel build I think you need to put the save/restore stages in a function (since it will have a scope), see https://stackoverflow.com/a/31140797
Right, let me know if I'm missing something but I believe we don't have to rename the variable to get the behaviour you are expecting.
Could you try defining the BUILD_SHARED_LIBS option in SLEEF's CMakeLists.txt see comment in review? According to the documentation of CMake it should work, when top-level project switches the option.
https://cmake.org/cmake/help/latest/variable/BUILD_SHARED_LIBS.html
Note that if bringing external dependencies directly into the build, such as with [FetchContent](https://cmake.org/cmake/help/latest/module/FetchContent.html#module:FetchContent) or a direct call to [add_subdirectory()](https://cmake.org/cmake/help/latest/command/add_subdirectory.html#command:add_subdirectory), and one of those dependencies has such a call to [option(BUILD_SHARED_LIBS ...)](https://cmake.org/cmake/help/latest/command/option.html#command:option), the top level project must also call [option(BUILD_SHARED_LIBS ...)](https://cmake.org/cmake/help/latest/command/option.html#command:option) before bringing in its dependencies. Failure to do so can lead to different behavior between the first and subsequent CMake runs.
https://discourse.cmake.org/t/build-shared-libs/4200
It’s fine to declare it as an option if you want to change the default. As long as the minimum required CMake version is 3.13, a parent project can override it with a normal variable.
But it might need to be done the right way. In order to fix the parallel build I think you need to put the save/restore stages in a function (since it will have a scope), see https://stackoverflow.com/a/31140797
The original pytorch is use this method, it works on Linux but not works on Windows. If It works, why I do a lot of work to fix it? Linux static lib is a .a file and it is one file library. Windows static is a big .lib file and contains everything. Windows dynamic has two files, a small .lib file only contains interface, and a .dll file. It failed and report can't find symbol issue. The second build, I guess BUILD_SHARED_LIBS changed by restore and let the linker confused sleef lib type. I can't explain the detail why it doesn't work. But I tried it and it not work. It is OS related.
First, what I was getting at is not simply moving back to using BUILD_SHARED_LIBS but also making sure save/add_subdirectory/restore occurs inside a function (or at least a scope) so it builds safely in parallel. Have you tried that? This should be applied everywhere save/restore is used (not just for SLEEF).
I can't explain the detail why it doesn't work. But I tried it and it not work. It is OS related.
Second, if the above approach does not work, maybe it would help to dive a little bit more into why this actually fails. We are not excluding that something else needs to be fixed in SLEEF to make pytorch build on Windows. Introducing SLEEF_BUILD_SHARED_LIBS is just not an appropriate change from our point of view. This blog is a good reference for this type of issues and it mentions the function-trick https://alexreinking.com/blog/building-a-dual-shared-and-static-library-with-cmake.html
If PyTorch's use of BUILD_SHARED_LIBS diverges from its definition (by building both static and shared components) then it needs to be handled properly in PyTorch and not in the components. Therefore, I would not refer to the save/restore trick as a "work around", or more precisely not refer to this as a SLEEF issue.
First, what I was getting at is not simply moving back to using BUILD_SHARED_LIBS but also making sure save/add_subdirectory/restore occurs inside a function (or at least a scope) so it builds safely in parallel. Have you tried that? This should be applied everywhere save/restore is used (not just for SLEEF).
I can't explain the detail why it doesn't work. But I tried it and it not work. It is OS related.
Second, if the above approach does not work, maybe it would help to dive a little bit more into why this actually fails. We are not excluding that something else needs to be fixed in SLEEF to make pytorch build on Windows. Introducing SLEEF_BUILD_SHARED_LIBS is just not an appropriate change from our point of view. This blog is a good reference for this type of issues and it mentions the function-trick https://alexreinking.com/blog/building-a-dual-shared-and-static-library-with-cmake.html
If PyTorch's use of BUILD_SHARED_LIBS diverges from its definition (by building both static and shared components) then it needs to be handled properly in PyTorch and not in the components. Therefore, I would not refer to the save/restore trick as a "work around", or more precisely not refer to this as a SLEEF issue.
set(sources ...)
add_library(SomeLib_static STATIC ${sources})
add_library(SomeLib_shared SHARED ${sources})
From the example, gen sleef_static and sleef(shared) at the same time? do you think it is acceptable solution.
Not sure I understand. I don't think this will solve the issue, but feel fee to try.
I was picturing defining a function encapsulating everything at this link https://github.com/pytorch/pytorch/blob/cee16353db384dd33a4804f9f26829cacfcfbab7/aten/src/ATen/CMakeLists.txt#L442-L478. Then just calling this function.
function(add_sleef)
// basically all the content at link above
endfunction()
add_sleef()
The benefit of the function is that you don't even have to save and restore BUILD_SHARED_LIBS and BUILD_TESTS, the scope of the function itself protects from overriding BUILD_SHARED_LIBS at the global scope.
Not sure I understand. I don't think this will solve the issue, but feel fee to try.
I was picturing defining a function encapsulating everything at this link https://github.com/pytorch/pytorch/blob/cee16353db384dd33a4804f9f26829cacfcfbab7/aten/src/ATen/CMakeLists.txt#L442-L478. Then just calling this function.
function(add_sleef) // basically all the content at link above endfunction() add_sleef()
The benefit of the function is that you don't even have to save and restore BUILD_SHARED_LIBS and BUILD_TESTS, the scope of the function itself protects from overriding BUILD_SHARED_LIBS at the global scope.
-
I means that If we can consider to build shared and static lib at the same time, like mimalloc: https://github.com/microsoft/mimalloc/blob/master/CMakeLists.txt#L22-L23 shared library name: sleef. static library name: sleef_static.
-
I don't think cmake function can works.
BUILD_SHARED_LIBS
is global scope.FROM: https://cmake.org/cmake/help/git-stage/variable/BUILD_SHARED_LIBS.html
Please read the offical guide. The offical guide said that:
do so in the top level CMakeLists.txt file, before any add_library() calls.
and Note that if bringing external dependencies directly into the build, such as with FetchContent or a direct call to add_subdirectory(), and one of those dependencies has such a call to option(BUILD_SHARED_LIBS ...), the top level project must also call option(BUILD_SHARED_LIBS ...)before bringing in its dependencies. Failure to do so can lead to different behavior between the first and subsequent CMake runs.
Use the BUILD_SHARED_LIBS
will bring in a lot of limitation.
Hi @blapie
Acturaly, sleef library as a math library, it is usual been integated into another projects as a compontemt. And in git management system, it is usual as a git submodule.
From the CMake offical document of BUILD_SHARED_LIBS
:
-
do so in the top level CMakeLists.txt file, before any add_library() calls.
-
Note that if bringing external dependencies directly into the build, such as with FetchContent or a direct call to add_subdirectory(), and one of those dependencies has such a call to option(BUILD_SHARED_LIBS ...), the top level project must also call option(BUILD_SHARED_LIBS ...)before bringing in its dependencies. Failure to do so can lead to different behavior between the first and subsequent CMake runs.
In git-cmake build system, we usual build dependency via add_subdirectory()
, As a submodule, sleef request its parent project also define BUILD_SHARED_LIBS
? whether it too much limitation?
pytorch is not and can't set BUILD_SHARED_LIBS
due to it is a CMake reserved variable. And the doc said: Failure to do so can lead to different behavior between the first and subsequent CMake runs.
is similar to my conclusion: https://github.com/shibatch/sleef/pull/510#issuecomment-1934336010
We maintains a open source project should consider easy to developer use. Use BUILD_SHARED_LIBS
control sleef will bring in too much limitation, and impact to parent project and other sub-modules.
My PR is good to solve the issue:
- "SLEEF_" prefix of "SLEEF_BUILD_SHARED_LIBS" shows it only control and impact on sleef project.
- Not use CMake global varibal and not impact to other modules.
- No limitation and easy for developer use.
We have no issue with people using SLEEF as a submodule, if anything that is probably the preferred approach and we provide examples for that.
As a submodule, sleef request its parent project also define BUILD_SHARED_LIBS ? whether it too much limitation?
No, you don't have to define it, but if you need to switch between static and shared libs then that is the flag you should use.
Sorry I am going to repeat myself but your PR "solves" the problem from your point of view (or rather hides it) by introducing flaws in our codebase. We cannot accept a change like that simply because it will give unexpected results to SLEEF users (who also use it as a submodule).
Instead I suggested a simple alternative, that you don't want to try, which is encapsulating the options and add_subdirectory(sleef)
in a function. I did it for you and it works, as far as your simple example goes, it does not modify BUILD_SHARED_LIBS in the global scope : https://github.com/xuhancn/sleef_cmake_example/pull/1
I don't see what's wrong with this approach
- it is actually safer in terms of parallel build and global scope for pytorch
- you don't need me to approve a SLEEF PR to fix your problem
We have no issue with people using SLEEF as a submodule, if anything that is probably the preferred approach and we provide examples for that.
As a submodule, sleef request its parent project also define BUILD_SHARED_LIBS ? whether it too much limitation?
No, you don't have to define it, but if you need to switch between static and shared libs then that is the flag you should use.
Sorry I am going to repeat myself but your PR "solves" the problem from your point of view (or rather hides it) by introducing flaws in our codebase. We cannot accept a change like that simply because it will give unexpected results to SLEEF users (who also use it as a submodule).
Instead I suggested a simple alternative, that you don't want to try, which is encapsulating the options and
add_subdirectory(sleef)
in a function. I did it for you and it works, as far as your simple example goes, it does not modify BUILD_SHARED_LIBS in the global scope : xuhancn/sleef_cmake_example#1I don't see what's wrong with this approach
- it is actually safer in terms of parallel build and global scope for pytorch
- you don't need me to approve a SLEEF PR to fix your problem
I tried your code just now, snapshot as below:
- Output directory is not coordinate as my origin example.
- The output library is shared lib(.dll), but not static lib as your expected.
Maybe is the reason pytorch original code not works on Windows.
blapie_example_output.txt I attached the all log to you for analysis. Windows is not as same as linux, you can setup a Windows build environment and reproduce it for validate. Maybe your referenced blog existly works on Linux, but Windows is different.
@blapie some days passed, any update?
Hi! Sorry, I did not find time to test before. But I just did it on MSYS2 (Unix-like Windows environment) and as expected I get static libs when setting BUILD_SHARED_LIBS=OFF
in the function, and I get shared libs otherwise. This is a Windows on Arm environment and I'm using Clang (not cl) so not exactly your case, but the difference should be quite small.
It would be good to go to the bottom of that, as it could simply be that a minor change in SLEEF is needed to work on Windows on x86 with cl.
Please note that SLEEF support for MSVC/CL is limited for now, in particular reproducing and testing is not available on the repo.
blapie_example_output.txt I attached the all log to you for analysis.
Btw I do not see the messages I added to the CMakeLists.txt in your log. It should show this,
Set global BUILD_SHARED_LIBS in parent project
BUILD_SHARED_LIBS=ON
[...]
BUILD_SHARED_LIBS=ON
[...]
while BUILD_SHARED_LIBS=OFF
is set within the function.
Are the runs you are showing 100% including my changes?
Based on the first couple of command lines I don't think it is including my PR, there should be at least a gh pr checkout 1
.
I attached the all log to you for analysis.
Btw I do not see the messages I added to the CMakeLists.txt in your log. It should show this,
Set global BUILD_SHARED_LIBS in parent project BUILD_SHARED_LIBS=ON [...] BUILD_SHARED_LIBS=ON [...]
while
BUILD_SHARED_LIBS=OFF
is set within the function.Are these runs your are showing 100% including my changes?
Based on the first couple of command lines I don't think it is including my PR, there should be at least a
gh pr checkout 1
.
I just checkout your patch-1
branch and test again.
You can see these marked places:
- CMake include your cmake message.
- Library output directory is
third_party\sleef\bin\sleefscalar.dll
, it is not coordinate as my origin example. - The library binary is shared lib(.dll).
The detailed log attached again. blapie_debug_log_3_14.txt
Ok, right in this case even the config log seems wrong
-- Building shared libs : ON
although the top-level CMakeLists.txt has
set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build sleef as static library" FORCE)
Can you please provide a similar log with the master branch of SLEEF?
Hi! Sorry, I did not find time to test before. But I just did it on MSYS2 (Unix-like Windows environment) and as expected I get static libs when setting
BUILD_SHARED_LIBS=OFF
in the function, and I get shared libs otherwise. This is a Windows on Arm environment and I'm using Clang (not cl) so not exactly your case, but the difference should be quite small.It would be good to go to the bottom of that, as it could simply be that a minor change in SLEEF is needed to work on Windows on x86 with cl.
Please note that SLEEF support for MSVC/CL is limited for now, in particular reproducing and testing is not available on the repo.
This PR is small change to enable msvc cl, which is Microsoft offical compiler.
- We only add a
SLEEF_
prefix toBUILD_SHARED_LIBS
and it is align to https://github.com/shibatch/sleef/pull/509 - Use
SLEEF_BUILD_SHARED_LIBS
to control build shared/static library straightforward.
MSVC should be the fisrt tier compiler on Windows, we need enable it ASAP.
Ok, right in this case even the config log seems wrong
-- Building shared libs : ON
although the top-level CMakeLists.txt has
set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build sleef as static library" FORCE)
Can you please provide a similar log with the master branch of SLEEF?
Checkout sleef submodule to master, and collect log again?
Ok, right in this case even the config log seems wrong
-- Building shared libs : ON
although the top-level CMakeLists.txt has
set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build sleef as static library" FORCE)
Can you please provide a similar log with the master branch of SLEEF?
Checkout sleef submodule to master, and collect log again?
blapie_debug_log_sleef_master.txt Log is here.
Ok, so we have the expected behaviour right? We build the static library while BUILD_SHARED_LIBS=ON is unchanged at the global scope.
sleefscalar.vcxproj -> D:\xuhan\build_pytorch\sleef_cmake_example\build_3\third_party\sleef\lib\sleefscalar.lib
Ok, so we have the expected behaviour right? We build the static library while BUILD_SHARED_LIBS=ON is unchanged at the global scope.
sleefscalar.vcxproj -> D:\xuhan\build_pytorch\sleef_cmake_example\build_3\third_party\sleef\lib\sleefscalar.lib
It works, but pytorch is use default value and it's value is OFF. You mean let pytorch use function(add_sleef)
to integrate sleef?
I don't known if it works, I think is not a simple & straightforward method.
Ok, so we have the expected behaviour right? We build the static library while BUILD_SHARED_LIBS=ON is unchanged at the global scope.
sleefscalar.vcxproj -> D:\xuhan\build_pytorch\sleef_cmake_example\build_3\third_party\sleef\lib\sleefscalar.lib
It works, but pytorch is use default value and it's value is OFF. You mean let pytorch use
function(add_sleef)
to integrate sleef? I don't known if it works, I think is not a simple & straightforward method.
I tried to use cmake function integate sleef into pytorch and it failed at binary size.
PR: https://github.com/xuhancn/pytorch/pull/8
Log:
pytorch_add_sleef_function.txt
Sounds very strange. I don't see the connection between the diff and the error message.