h3-pg
h3-pg copied to clipboard
H3 library is not inlined
Hi,
Investingating performance issues I found that H3 actively resists inlining.
objdump -D h3.so:

Simple math like *pi/180 is not getting inlined.
After some fight I got some pieces but haven't got to destination.
CMake supposedly enables inline like this:
cmake_minimum_required(VERSION 3.9)
cmake_policy(SET CMP0069 NEW)
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION TRUE)
These guys helped a bit inside the h3-pg itself but haven't got me to let linktime optimizations to cross library boundary.
CFLAGS += -flto -O2
LDFLAGS += -flto -Wl,-Bsymbolic-functions -Wl,-z,relro
Any ideas help.
Thanks for the PR, I'll try looking into it. Could it be related to the way h3 is built?
cmake \
-DCMAKE_C_FLAGS=-fPIC \
-DBUILD_TESTING=OFF \
-DENABLE_COVERAGE=OFF \
-DENABLE_DOCS=OFF \
-DENABLE_FORMAT=OFF \
-DENABLE_LINTING=OFF \
..
Is there a Release flag I'm missing or similar?
Thanks for opening the issue on h3 core as well.
Per https://github.com/uber/h3/pull/326#issuecomment-604544429 flag is -DCMAKE_BUILD_TYPE=Release, thanks @isaacbrodsky.
I've spent another couple of hours with it and give up for today.
Test is simple: after you build everything right, this should show empty. In non-inlined build it shows four callsites for square and the function def.
objdump -D h3.so | grep _square
Per uber/h3#326 (comment) flag is
-DCMAKE_BUILD_TYPE=Release, thanks @isaacbrodsky.
Why is this flag not actually used in the latest v3.6.2 release? It looks harmless, and the 2x/3x improvements shown at https://github.com/uber/h3/pull/326 are huge
Release v3.6.3 includes -DCMAKE_BUILD_TYPE=Release which should improve performance.
Still objdump -D h3.so | grep _square shows three callsites, so I'll keep this issue open until we figure out how to inline h3. Hopefully something good will come from uber/h3#326 but apart from that I have no ideas.
@Komzpa do you have any ideas for benchmarking the built extension? I would like to be able to compare changes like these (after being built into extension and not just h3 core directly).
@zachasme pgbench is an industry standard to benchmark query execution times in postgres.
We should also keep an eye on https://github.com/uber/h3/pull/360.
Solved by #75