Clarify minimum compiler / STL requirements.
Currently, native_setup.md indicates that GCC >= 7.x on Ubuntu > 18.04 should work. However, the Native Build Workflows only test GCC >= 11.1.0 and clang 13.0.1, in the following combinations:
- GCC 11.1.0 / libstdc++ 11.1.0
- GCC 12.0.1 / libstdc++ 12.0.1
- clang 13.0.1 / libc++ 13.0.1
When attempting to build with GCC 9.4.0 / libstdc++ 9.4.0 (the default for Ubuntu 20.04.4 LTS), the build fails due to missing coroutine support:
cmake -DCMAKE_BUILD_TYPE=Release -DLOGLEVEL=INFO -DUSE_PARALLEL=true -GNinja .. && ninja -j 1
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test HAS_COROUTINES
-- Performing Test HAS_COROUTINES - Failed
...
[1/612] Building CXX object third_party/abseil-cpp/absl/base/CMakeFiles/absl_log_severity.dir/log_severity.cc.o
FAILED: third_party/abseil-cpp/absl/base/CMakeFiles/absl_log_severity.dir/log_severity.cc.o
/usr/bin/c++ -I../third_party/antlr4/runtime/Cpp/runtime/src -I../third_party/googletest/googletest/include -I../third_party/googletest/googlemock/include -I../third_party/json -I../third_party/ctre/include -I../third_party/abseil-cpp -Wall -Wextra -O3 -DNDEBUG -O3 -fcoroutines-ts -fdiagnostics-color=always -Wall -Wextra -Wcast-qual -Wconversion-null -Wformat-security -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wundef -Wunused-local-typedefs -Wunused-result -Wvarargs -Wvla -Wwrite-strings -DNOMINMAX -std=gnu++2a -MD -MT third_party/abseil-cpp/absl/base/CMakeFiles/absl_log_severity.dir/log_severity.cc.o -MF third_party/abseil-cpp/absl/base/CMakeFiles/absl_log_severity.dir/log_severity.cc.o.d -o third_party/abseil-cpp/absl/base/CMakeFiles/absl_log_severity.dir/log_severity.cc.o -c ../third_party/abseil-cpp/absl/base/log_severity.cc
c++: error: unrecognized command line option ‘-fcoroutines-ts’
A third-party dependency is probing for coroutine support, which was only added in GCC 10 according to the GCC Wiki. It may be possible to change the implementation to use coroutines from Boost, but that is not the current state of the codebase.
Since the GitHub workflows establish a de-facto baseline, and C++20 coroutines are being used, perhaps the following changes would be useful:
- Update native_setup.md to specify C++20 coroutine support as the minimum requirement for builds, noting the specific compilers used to run the workflow tests.
- Update CMakeLists.txt to check for coroutine support as a hard requirement (see the sketch below).
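For illustration, a translation unit like the following (a hypothetical sketch, not the actual HAS_COROUTINES check from the build system) is roughly what such a compile check could attempt; it is rejected by GCC 9.4.0, which lacks the <coroutine> header and the coroutine keywords, but builds with GCC >= 11 under -std=c++20:

```cpp
// Hypothetical coroutine smoke test; not taken from QLever's CMakeLists.txt.
#include <coroutine>

struct Task {
  struct promise_type {
    Task get_return_object() { return {}; }
    std::suspend_never initial_suspend() noexcept { return {}; }
    std::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };
};

// A coroutine that does nothing; its mere presence forces the compiler
// to support the C++20 coroutine machinery.
Task trivialCoroutine() { co_return; }

int main() { trivialCoroutine(); }
```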
Alternatively, if I have misunderstood something and GCC >= 7.x on Ubuntu > 18.04 is still workable, some documentation on how to get the codebase to build with those compilers would be useful.
In addition, if some of the maintainers are willing to share their specific configuration of hardware, compiler, libraries, and OS used for their development sandboxes, that would be interesting.
For reference, it is worth noting that the Intel C++ Compiler does not support C++20 coroutines, per https://www.intel.com/content/www/us/en/developer/articles/technical/c20-features-supported-by-intel-cpp-compiler.html
Hey,
Thanks for pointing this out. The native_setup.md is indeed out of sync here. We require G++ 11 or clang 13 because we use C++20 and, most notably, C++ coroutines (not only in the Boost dependency, but also in our own code).
For initial testing we recommend the Docker setup via the very convenient Qlever-control script.
To see what has to be installed on Ubuntu 18.04 or 20.04 to make QLever run natively, see the corresponding Dockerfiles in the Dockerfiles subdirectory. Most notably, you need some PPAs to get newer versions of G++ and Boost.
As for the machines: Yes, we could report some of our configurations together with QLever's performance on them. Just to give you a quick idea:
- All of our machines run Ubuntu (18.04 or 20.04; we are currently gradually updating to 22.04, where no PPAs are needed anymore). Sometimes we run inside Docker (very convenient and platform-independent) and sometimes natively (which is ~20% faster for some reason).
- For budget reasons we typically use desktop hardware with a lot of RAM. Most of our recent machines have AMD Ryzen CPUs (3800X, 5900X) and 128 GB of RAM (the maximum for desktop platforms). QLever currently requires quite a lot of disk space but does not benefit that much from very fast disks, so we often have HDDs (in a RAID) on our machines, but we also regularly test with fast NVMe SSDs.
- It also really depends on which dataset you want to use. For Wikidata we highly recommend 128 GB of RAM; for smaller datasets you can work with much less. More efficient usage of RAM is definitely on our roadmap.
Please let us know what else you are interested in. We have no experience with the Intel compiler; is it important for your scenario?
@joka921 - Thanks for the background and explanation! That is just what I needed.
My challenge is getting a development sandbox put together. I have been using a cluster where I don't have sudo/root as my primary development environment, but I can put something together on smaller equipment based on the excellent Dockerfiles. I had no problems building on Ubuntu 20 running inside WSL2 on a commodity, memory-constrained (16 GB) laptop. The only lesson learned was that I had to limit ninja with -j <fewer than ncpus> to keep the parallel compilation processes from being killed by the out-of-memory (OOM) killer.
One cluster I use has GCC 9.4.0 (the Ubuntu 20 default) and Intel oneAPI C++ 2022.1. I manually built GCC 11 and GCC 12, but neither worked for Release builds, for various reasons I have yet to figure out. They did work for Debug builds, but the run time was orders of magnitude slower than expected based on the timings from the GitHub workflows and the commodity laptop.
Some directions I plan to take include the following, which I would be excited to discuss but are probably out of scope for this specific GitHub Issue:
- Create a Singularity container for HPC deployments.
- Explore adding HDT files as a backend. HDT uses indexes to quickly answer (s, p, o) pattern queries and should be able to provide counts to inform query planning. It could benefit from QLever's optimized SPARQL query engine if an interface to the HDT indexes could be worked out (see the rough sketch after this list).
- Add documentation on PubChemRDF in the style of the Wikidata example already available.
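To make the HDT idea above slightly more concrete, here is a purely hypothetical sketch of the kind of interface I have in mind; none of these names exist in QLever or in any HDT library, and the real integration point would of course depend on QLever's internal index abstractions:

```cpp
// Purely hypothetical sketch; not part of QLever or of any HDT library.
#include <array>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// A triple pattern where std::nullopt acts as a wildcard in that position.
struct TriplePattern {
  std::optional<std::string> subject, predicate, object;
};

class HdtBackend {
 public:
  virtual ~HdtBackend() = default;

  // Number of triples matching the pattern, read from the HDT indexes.
  // Could serve as a cardinality estimate for the query planner.
  virtual std::uint64_t count(const TriplePattern& pattern) const = 0;

  // Materialize all (s, p, o) triples matching the pattern.
  virtual std::vector<std::array<std::string, 3>> scan(
      const TriplePattern& pattern) const = 0;
};
```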
Concerning PubChemRDF: I have just started a download of the core data (everything except compound/nbr2d and compound/nbr3d) and will set up a QLever endpoint for it. I will let you know when it's done. This was on our TODO list anyway, and your comment just pushed it to the top :-) If no significant problems occur, the process is mostly automatic; let's see.