h3-pg icon indicating copy to clipboard operation
h3-pg copied to clipboard

H3 library is not inlined

Open Komzpa opened this issue 5 years ago • 7 comments
trafficstars

Hi,

Investingating performance issues I found that H3 actively resists inlining.

objdump -D h3.so:

image

Simple math like *pi/180 is not getting inlined.

After some fight I got some pieces but haven't got to destination.

CMake supposedly enables inline like this:

cmake_minimum_required(VERSION 3.9)
cmake_policy(SET CMP0069 NEW)
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION TRUE)

These guys helped a bit inside the h3-pg itself but haven't got me to let linktime optimizations to cross library boundary.

CFLAGS += -flto -O2
LDFLAGS += -flto -Wl,-Bsymbolic-functions -Wl,-z,relro

Any ideas help.

Komzpa avatar Mar 25 '20 18:03 Komzpa

Thanks for the PR, I'll try looking into it. Could it be related to the way h3 is built?

cmake \
  -DCMAKE_C_FLAGS=-fPIC \
  -DBUILD_TESTING=OFF \
  -DENABLE_COVERAGE=OFF \
  -DENABLE_DOCS=OFF \
  -DENABLE_FORMAT=OFF \
  -DENABLE_LINTING=OFF \
  ..

Is there a Release flag I'm missing or similar?

Thanks for opening the issue on h3 core as well.

zachasme avatar Mar 26 '20 13:03 zachasme

Per https://github.com/uber/h3/pull/326#issuecomment-604544429 flag is -DCMAKE_BUILD_TYPE=Release, thanks @isaacbrodsky.

Komzpa avatar Mar 26 '20 17:03 Komzpa

I've spent another couple of hours with it and give up for today. Test is simple: after you build everything right, this should show empty. In non-inlined build it shows four callsites for square and the function def. objdump -D h3.so | grep _square

Komzpa avatar Mar 26 '20 21:03 Komzpa

Per uber/h3#326 (comment) flag is -DCMAKE_BUILD_TYPE=Release, thanks @isaacbrodsky.

Why is this flag not actually used in the latest v3.6.2 release? It looks harmless, and the 2x/3x improvements shown at https://github.com/uber/h3/pull/326 are huge

AbelVM avatar Apr 08 '20 07:04 AbelVM

Release v3.6.3 includes -DCMAKE_BUILD_TYPE=Release which should improve performance.

Still objdump -D h3.so | grep _square shows three callsites, so I'll keep this issue open until we figure out how to inline h3. Hopefully something good will come from uber/h3#326 but apart from that I have no ideas.

@Komzpa do you have any ideas for benchmarking the built extension? I would like to be able to compare changes like these (after being built into extension and not just h3 core directly).

zachasme avatar Apr 08 '20 07:04 zachasme

@zachasme pgbench is an industry standard to benchmark query execution times in postgres.

Komzpa avatar Apr 08 '20 08:04 Komzpa

We should also keep an eye on https://github.com/uber/h3/pull/360.

zachasme avatar Aug 26 '20 12:08 zachasme

Solved by #75

zachasme avatar Aug 23 '22 12:08 zachasme