
LTO breaks StaticObject on Fedora 37

Open • benmwebb opened this issue 1 year ago • 1 comment

Code built with g++ with link-time optimization (LTO) fails with a "Trying to save an unregistered polymorphic type" exception, while the same code works fine without LTO. This is on a stock Fedora 37 machine (gcc 12.2.1, cereal 1.3.2).

My code is a large mixed C++ and Python project, but I boiled it down to a minimal reproducer here: cereal_test.zip

A.h defines and registers a polymorphic type Wrapped, plus a class Container that stores a shared_ptr<Wrapped>. B.h registers BWrapped, a subclass of Wrapped. We build two shared libraries, libA.so and libB.so, and wrap each with SWIG so they can be used from Python as A.py and B.py (note that only the A wrapper is built with -flto):

g++ -fPIC -Wall -shared A.cpp -o libA.so
g++ -fPIC -Wall -shared B.cpp -o libB.so
swig -python -c++ A.i
swig -python -c++ B.i
g++ -flto -fPIC -shared A_wrap.cxx -I/usr/include/python3.11 -o _A.so -L. -lA
g++ -fPIC -shared B_wrap.cxx -I/usr/include/python3.11 -o _B.so -L. -lA -lB

If we then try to serialize a Container object that contains a BWrapped in Python (the _get_as_binary method uses cereal to write Container to a BinaryOutputArchive and then returns the resulting data), it fails:

$ cat test.py
import A, B
w = B.BWrapped()
c = A.Container(w)
print(c._get_as_binary())
$ python3 test.py
terminate called after throwing an instance of 'cereal::Exception'
  what():  Trying to save an unregistered polymorphic type (BWrapped).
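The exception is a failed lookup. Roughly speaking, cereal keeps a binding map per archive type, keyed by the dynamic type of a polymorphic object (a std::type_index), which the registration macros fill in; saving through a shared_ptr<Wrapped> looks up the pointee's actual type and throws if there is no entry. The self-contained model below reproduces only that lookup, without cereal. BindingMap, register_type and save_polymorphic are hypothetical stand-ins, not cereal's API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical stand-ins for the types in the reproducer.
struct Wrapped { virtual ~Wrapped() = default; };
struct BWrapped : Wrapped {};
struct CWrapped : Wrapped {};  // deliberately never registered

// Simplified model of a binding map: dynamic type -> registered name.
// (cereal's real map stores serialization functions; a name suffices here.)
struct BindingMap {
  std::map<std::type_index, std::string> map;
  static BindingMap& instance() {
    static BindingMap m;  // function-local static, created on first use
    return m;
  }
};

// Roughly what a registration macro arranges at static-initialization time.
template <class T>
void register_type(const std::string& name) {
  BindingMap::instance().map[std::type_index(typeid(T))] = name;
}

// Saving through a base pointer must look up the *dynamic* type.
std::string save_polymorphic(const std::shared_ptr<Wrapped>& p) {
  const Wrapped& obj = *p;
  const auto& map = BindingMap::instance().map;
  auto it = map.find(std::type_index(typeid(obj)));
  if (it == map.end())
    throw std::runtime_error("Trying to save an unregistered polymorphic type");
  return it->second;
}

void example() {
  register_type<BWrapped>("BWrapped");
  assert(save_polymorphic(std::make_shared<BWrapped>()) == "BWrapped");
  bool threw = false;
  try {
    save_polymorphic(std::make_shared<CWrapped>());
  } catch (const std::runtime_error&) {
    threw = true;  // same failure mode as the unregistered BWrapped above
  }
  assert(threw);
}
```

In the real reproducer BWrapped *is* registered, so the interesting question is which map the lookup consults.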

If we rebuild the A wrapper without LTO, though, it works fine:

$ g++ -fPIC -shared A_wrap.cxx -I/usr/include/python3.11 -o _A.so -L. -lA
$ python3 test.py
b'\x01\x00\x00\x80\x08\x00\x00\x00\x00\x00\x00\x00BWrapped\x01\x00\x00\x80'
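The successful output can be read field by field. Based on a reading of cereal 1.3.x's polymorphic.hpp (an assumption, not something stated in this issue), saving a shared_ptr to a polymorphic type writes a 32-bit type-name id with the high bit set the first time that name appears, then the name itself as a 64-bit length plus characters, then a 32-bit shared-pointer id (again flagged on first occurrence), then the object's own data. The sketch below decodes the 24 bytes under that assumption on a little-endian machine; DecodedContainer, read_le and decode are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

constexpr std::uint32_t kMsb32 = 0x80000000u;  // "first occurrence" flag

struct DecodedContainer {
  std::uint32_t polymorphic_id = 0;  // id assigned to the type name
  std::string polymorphic_name;      // written only on first occurrence
  std::uint32_t pointer_id = 0;      // id assigned to the shared_ptr target
};

// Read a little-endian unsigned integer of type T starting at pos.
template <class T>
T read_le(const std::vector<unsigned char>& buf, std::size_t& pos) {
  if (pos + sizeof(T) > buf.size()) throw std::out_of_range("truncated buffer");
  T value = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i)
    value |= static_cast<T>(static_cast<T>(buf[pos + i]) << (8 * i));
  pos += sizeof(T);
  return value;
}

DecodedContainer decode(const std::vector<unsigned char>& buf) {
  std::size_t pos = 0;
  DecodedContainer d;
  d.polymorphic_id = read_le<std::uint32_t>(buf, pos);
  if (d.polymorphic_id & kMsb32) {
    const std::uint64_t len = read_le<std::uint64_t>(buf, pos);  // 64-bit length prefix
    if (len > buf.size() - pos) throw std::out_of_range("truncated name");
    d.polymorphic_name.assign(reinterpret_cast<const char*>(buf.data() + pos),
                              static_cast<std::size_t>(len));
    pos += static_cast<std::size_t>(len);
  }
  d.pointer_id = read_le<std::uint32_t>(buf, pos);
  // Nothing follows in the output above, so BWrapped apparently wrote no data of its own.
  return d;
}

void example() {
  // The 24 bytes printed by the non-LTO run.
  const std::vector<unsigned char> bytes = {
      0x01, 0x00, 0x00, 0x80,                          // polymorphic_id
      0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // name length = 8
      'B',  'W',  'r',  'a',  'p',  'p',  'e',  'd',   // polymorphic_name
      0x01, 0x00, 0x00, 0x80};                         // pointer_id
  const DecodedContainer d = decode(bytes);
  assert(d.polymorphic_id == (kMsb32 | 1u));
  assert(d.polymorphic_name == "BWrapped");
  assert(d.pointer_id == (kMsb32 | 1u));
}
```

The type name "BWrapped" is exactly what the saving side has to find in its binding map before it can write these bytes.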

It looks like LTO stops StaticObject from working correctly. If we add the following function to A.h:

// inline, because this definition lives in a header
inline void show_a_output_binding_map() {
  auto const & bindingMap = cereal::detail::StaticObject<cereal::detail::OutputBindingMap<cereal::BinaryOutputArchive>>::getInstance().map;
  std::cerr << "A map is at " << &bindingMap << std::endl;
}

and a similar function, show_b_output_binding_map(), to B.h, then with LTO we see:

$ cat test.py
import A, B
A.show_a_output_binding_map()
B.show_b_output_binding_map()
w = B.BWrapped()
c = A.Container(w)
print(c._get_as_binary())
$ python3 test.py
A map is at 0x7f029b6fec40
B map is at 0x7f029b3ff540
terminate called after throwing an instance of 'cereal::Exception'
  what():  Trying to save an unregistered polymorphic type (BWrapped).

i.e. StaticObject is not acting as a singleton: A and B each get their own binding map, so when B registers BWrapped, A cannot see it. (Without LTO, the addresses printed for the A map and the B map are the same.)
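To make the consequence concrete, here is a toy model (no cereal involved; RegistryCopy and the tag types are hypothetical) in which two modules each end up with a private copy of what should be a single registry. A registration made through one copy is invisible through the other, which matches the two different addresses and the exception shown above.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model, not cereal code. Normally there is one registry
// singleton and every module shares its function-local static. The tag
// parameter simulates each module holding its own private copy instead.
template <class ModuleTag>
struct RegistryCopy {
  std::set<std::string> registered;
  static RegistryCopy& instance() {
    static RegistryCopy r;  // one object per template instantiation
    return r;
  }
};

struct CopySeenByA {};  // the copy A's save path consults
struct CopySeenByB {};  // the copy B's registration writes to

// B registers BWrapped into the copy it sees...
void register_bwrapped_from_b() {
  RegistryCopy<CopySeenByB>::instance().registered.insert("BWrapped");
}

// ...while A looks in a different copy.
bool a_finds(const std::string& type) {
  return RegistryCopy<CopySeenByA>::instance().registered.count(type) != 0;
}

void example() {
  register_bwrapped_from_b();
  const void* a_map = &RegistryCopy<CopySeenByA>::instance();
  const void* b_map = &RegistryCopy<CopySeenByB>::instance();
  assert(a_map != b_map);        // like "A map is at ..." vs "B map is at ..."
  assert(!a_finds("BWrapped"));  // hence "unregistered polymorphic type"
}
```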

I see cereal has specific code (in details/static_object.hpp) intended to stop link-time optimization from breaking StaticObject, but it does not seem to work here. Obviously an easy workaround is "don't use LTO", but I'd like to find a better solution. I can modify the SWIG interface, so perhaps I could add code to the generated modules that explicitly references StaticObject and so persuades the linker not to mangle it?

benmwebb, Mar 20 '23 19:03

FWIW, I see exactly the same issue when building for Windows with MSVS 2015 (64-bit). The reproducer code is similar, except that functions need the usual __declspec(dllexport)/__declspec(dllimport) annotations so that the DLLs work.

Our workaround for now, linked above, adds a map of serialize/deserialize functions to our application itself, so we can be sure they are stored in only one place. It works for us, but it is definitely not as general as cereal's polymorphic machinery.
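A workaround along these lines (a serialize/deserialize map owned by the application) might look roughly like the sketch below. SerializerRegistry, register_bwrapped and the "name:payload" text format are hypothetical and only stand in for whatever the real project does. The design point is that the registry is reached through one ordinary, non-template function that would be defined in a single .cpp file of a single shared library, so every module calls that same definition and sees the same map.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

struct Wrapped { virtual ~Wrapped() = default; };
struct BWrapped : Wrapped { int value = 0; };

// Hypothetical application-owned registry of save/load functions.
class SerializerRegistry {
 public:
  using SaveFn = std::function<std::string(const Wrapped&)>;
  using LoadFn = std::function<std::shared_ptr<Wrapped>(const std::string&)>;

  static SerializerRegistry& instance();  // non-template, defined once

  void add(std::type_index type, std::string name, SaveFn save, LoadFn load) {
    by_type_[type] = name;
    savers_[name] = std::move(save);
    loaders_[name] = std::move(load);
  }

  // Returns "name:payload" for the dynamic type of obj.
  std::string save(const Wrapped& obj) const {
    auto it = by_type_.find(std::type_index(typeid(obj)));
    if (it == by_type_.end()) throw std::runtime_error("unregistered type");
    return it->second + ":" + savers_.at(it->second)(obj);
  }

  // Throws std::out_of_range if the name was never registered.
  std::shared_ptr<Wrapped> load(const std::string& data) const {
    auto colon = data.find(':');
    if (colon == std::string::npos) throw std::runtime_error("malformed data");
    return loaders_.at(data.substr(0, colon))(data.substr(colon + 1));
  }

 private:
  std::map<std::type_index, std::string> by_type_;
  std::map<std::string, SaveFn> savers_;
  std::map<std::string, LoadFn> loaders_;
};

// In a real build this definition lives in exactly one .cpp file.
SerializerRegistry& SerializerRegistry::instance() {
  static SerializerRegistry registry;
  return registry;
}

// Registration that B's module would perform when it is loaded.
void register_bwrapped() {
  SerializerRegistry::instance().add(
      typeid(BWrapped), "BWrapped",
      [](const Wrapped& w) { return std::to_string(static_cast<const BWrapped&>(w).value); },
      [](const std::string& payload) {
        auto b = std::make_shared<BWrapped>();
        b->value = std::stoi(payload);
        return std::shared_ptr<Wrapped>(b);
      });
}

void example() {
  register_bwrapped();
  BWrapped original;
  original.value = 42;
  const std::string data = SerializerRegistry::instance().save(original);
  assert(data == "BWrapped:42");
  auto restored = std::dynamic_pointer_cast<BWrapped>(SerializerRegistry::instance().load(data));
  assert(restored && restored->value == 42);
}
```

On Windows, instance() would additionally need the __declspec(dllexport)/__declspec(dllimport) annotations mentioned above so that other DLLs bind to the one exported definition.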

benmwebb, Mar 23 '23 18:03