p4c
p4c copied to clipboard
Depend on Boost using FetchContent instead of relying on system-provided boost.
This PR sets up boost to be installed via FetchContent instead of depending on the system-provided version. Because a lot of Boost is header-only this is a little tricky.
Recent versions of boost have improved their CMake support significantly. I pushed a version which only requires downloading boost and linking the appropriate targets. The required boost headers are exported with each target, comparable to Abseil. To make sure that the linked headers are included as system headers I had to do some patching, but that is fairly straightforward.
If one desires to add additional boost targets, they can do so by setting P4C_TARGET_BOOST_LIBRARIES at command line or before invoking boost.
Dependencies
We currently depend on multiprecision and format as part of our core. iostreams for one particular optimization in the parser and graphs for the graphs backend.
graphs alone has an egregious amount of dependencies: https://github.com/boostorg/graph/blob/develop/CMakeLists.txt#L18 iostreams dependencies https://github.com/boostorg/iostreams/blob/develop/CMakeLists.txt#L79 format dependencies https://github.com/boostorg/format/blob/develop/CMakeLists.txt#L17 multiprecision dependencies https://github.com/boostorg/multiprecision/blob/develop/CMakeLists.txt#L30
The only project out of those that is currently standalone is multiprecision.
~~Because of this dependency problem we need to add almost all the different boost modules to the include path to make sure that we are including the local boost header first. We do this by globbing for the include folder and adding it to the include path.~~
@fruffy How much of the boost used is not header-only? If it would be just headers, we may just extract a piece of it and vendor.
@fruffy How much of the boost used is not header-only? If it would be just headers, we may just extract a piece of it and vendor.
iostreams and graphs are not header-only iirc.
I am currently running into this problem: https://stackoverflow.com/q/72913306 where you need to set up the includes for all the little boost projects. I have not gotten around to untangling these things yet.
We currently depend on multiprecision and format as part of our core. iostreams for one particular optimization in the parser and graphs for the graphs backend.
graphs alone has an egregious amount of dependencies: https://github.com/boostorg/graph/blob/develop/CMakeLists.txt#L18
iostreams dependencies https://github.com/boostorg/iostreams/blob/develop/CMakeLists.txt#L79
format dependencies https://github.com/boostorg/format/blob/develop/CMakeLists.txt#L17
multiprecision dependencies https://github.com/boostorg/multiprecision/blob/develop/CMakeLists.txt#L30
The only project out of those that is currently standalone is multiprecision.
graph is used by one particular backend, so the dependency could be moved there. Everything else besides iostreams seems to be header-only and could be vendored (there is a special boost script for this, btw).
Do we need a concrete version of boost that is newer then what is available on supported OSes? If not, I don't see a big reason to fetch boost ourselves. It can be easily installed almost everywhere.
Do we need a concrete version of boost that is newer then what is available on supported OSes? If not, I don't see a big reason to fetch boost ourselves. It can be easily installed almost everywhere.
-
The problem is the same as with Protobuf. Boost consistently introduces breaking or subtle changes between versions which often cause problems. There are quite a few issues around boost on this repo alone, the latest one being the Ubuntu 18.04 breakage. By controlling the Boost version we can at least make sure that we have one canonical version we write against.
-
We ultimately want to get rid of most of Boost because it is a big dependency (#3898) and only want to pull in packages selectively. This is trying to set up an infrastructure for it.
Please don't force downstreams that use boost libraries not used by P4C to get boost themselves, downstreams need to have a way of adding to the libraries that will be set up.
Yeah this is the current blocker. I have not figured out how to do this proper yet. The problem is circular:
The Boost Fetchcontent module needs to be instantiated to set up the CMake targets correctly, but before Boost can be instantiated we need to know the modules we want to instantiate (BOOST_INCLUDE_LIBRARIES)...
Trying to pull the modules individually is a nightmare because the dependency graph of the boost modules is complicated. E.g., graphs has these dependencies: https://github.com/boostorg/graph/blob/develop/CMakeLists.txt#L18, which in turn have dependencies themselves.
Also, how much longer does the initial build take with boost built by ourselves?
What takes a bit is downloading boost, the actual compilation is the same because boost is almost all headers.
What takes a bit is downloading boost, the actual compilation is the same because boost is almost all headers.
I think I already mentioned that it is possible to extract header-only part of boost and vendor it. Then there will be no need to fetch the things.
boost::format and boost::multiprecision are both header-only, we can just fetch them (not sure about their dependencies though). For iostreams likely we'd just re-implement the solution from scratch. Maybe memory-mapped files would be even better approach here. Need to check.
Likely something else could be implemented instead of boost::random, after all only uniform distribution seems to be required.
So, the only dependency would be boost:graphs that could be used when graph backend is selected.
So, boost::random is also header-only. Dependencies are also header-only. The size of boost subset to support the functionality we need is ~10 MiB (this includes format, random and multiprecision). And yes, we can patch with whatever C++20 features would be necessary :)
I think I already mentioned that it is possible to extract header-only part of boost and vendor it. Then there will be no need to fetch the things.
I looked at this but this can quickly become a maintenance headache since you will have to vendor many dozens of files. I found just downloading boost once and then picking and choosing the parts you need is much easier. If necessary, we can clean up the includes we need to add.
boost::formatandboost::multiprecisionare both header-only, we can just fetch them (not sure about their dependencies though).
boost::multiprecision can be used in standalone mode, boost::format unfortunately pulls in a fair bit of other dependencies which also pull in dependencies.
There is an official boost tool bcp to make such subset. And I do already have this subset just in case :)
There is an official boost tool
bcpto make such subset. And I do already have this subset just in case :)
We could add those instead! I have started with boost::format there but then gave up on it.
There is an official boost tool
bcpto make such subset. And I do already have this subset just in case :)
So after discussing this offline it looks like this may lead to an excessive amount of files we have to check into version control. The approach I have currently is probably the simplest. You only need to pull boost once and then can pick and choose the dependencies.
There is still a problem, I am adding all the possible include paths because of the way boost is structured. This creates a bit of spam, maybe there is a better way to only add one include path, but that requires some fiddling with folders.
Recent versions of boost have improved their CMake support significantly. I pushed a version which only requires downloading boost and linking the appropriate targets. The required boost headers are exported with each target, comparable to Abseil. To make sure that the linked headers are included as system headers I had to do some patching, but that is fairly straightforward.
If one desires to add additional boost targets, they can do so by setting P4C_TARGET_BOOST_LIBRARIES at command line or before invoking boost.
This might be a significant regression for downstream backends as it would require lots of patching of p4c cmake files. Adding some additional cmake options to build cmdline also looks a bit hacky to me – everything should be self-contained. Essentially, before the backend could just do find_package (Boost COMPONENTS <whatever is necessary>) in its own cmake file and be ok. Ideally it should be similar way with the new functionality.
Here, it looks like many things are hardcoded. In principle we can use in-tree graph backend to model new functionality, things should not be hard-coded inside cmake macro that is supposed to be generic.
This might be a significant regression for downstream backends as it would require lots of patching of p4c cmake files. Adding some additional cmake options to build cmdline also looks a bit hacky to me – everything should be self-contained. Essentially, before the backend could just do
find_package (Boost COMPONENTS <whatever is necessary>)in its own cmake file and be ok. Ideally it should be similar way with the new functionality.Here, it looks like many things are hardcoded. In principle we can use in-tree
graphbackend to model new functionality, things should not be hard-coded inside cmake macro that is supposed to be generic.
I do not think there is any hardcoding going on, it just makes a couple implicit assumptions of the compiler build system explicit. For example, graphs and iostreams were already part of the top-level CMake. We're only improving things here.
Where things are a problem is that the back ends can not really communicate their dependencies to the top-level FetchContent macro. At least I do not see an easy way to do that. Boost CMake targets are only visible after invoking FetchContent_MakeAvailable(Boost) but that only can be invoked after all Boost modules are known.
One way this circular dependency could be resolved is to add a "BoostDeps.cmake" file to each extension, which is then picked up by this macro...
Essentially, before the backend could just do
find_package (Boost COMPONENTS <whatever is necessary>)in its own cmake file and be ok. Ideally it should be similar way with the new functionality.
The original behavior is preserved with P4C_USE_PREINSTALLED_BOOST. But even find_package requires boost to be installed previously, which is a separate installation step. So the effort of adding a CMake parameter or installing boost separately is the same. I understand, however, that it requires significant refactoring for back ends, which is a showstopper.
In light #4950 of I would like to push this forward. Any back end that does not want to use the FetchContent version can just toggle the system-provided version which will give them the same functionality as before. Fixing this problem will give us 1) C++20 2) more Ci stability as quite a few of the spurious failures are related to system-packages changing their boost version.
@asl, @vlstill let me know what you think.