protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

[cmake] v27.* break the static library support compare to v26.1

Open Mizux opened this issue 1 year ago • 12 comments

For google/or-tools we tried to upgrade from v26.1 (working) to v27.2 (broken) TLDR: in cmake you have to target_link_library() to protobuf in add_executable() and add_library(), if Protobuf has been built as STATIC this will lead to an ODR violation. Until v26.1 it was still working (some underlying map was still unique, starting v27.0 we got two internal map thus the error message below)

note: release v27.0 and v27.1 are also broken...

Language: Python (native C++ module) i.e. pybind11 wrappers....

What operating system (Linux, Windows, ...) and version? Linux and MacOS are impacted...

What runtime / compiler are you using (e.g., python version or gcc version) From Python 3.9 to Python 3.12 gcc ~[10,14] see default version from almalinux:latest, rockylinux:9, fedora:latest, debian:latest, opensuse, ubuntu:latest .... ref: https://repology.org/project/gcc/badges. note: we are targeting C++17 (with agregate initialisation) or C++20 (MSVC)

What did you do? Please read this issue for a full detailed bug report:

  • https://github.com/google/or-tools/issues/4307

What did you expect to see all or-tools tests pass as before...

What did you see instead? all python tests using pybind11 and protobuf failed with error:

 F0000 00:00:1720999376.464679    2304 generated_message_reflection.cc:3620] Check failed: file != nullptr 

Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs). Again, Please read the issue for a full detailed bug report:

  • https://github.com/google/or-tools/issues/4307

Anything else we should know about your project / environment using kind of bissect on protobuf tag use in our build, I figure out it is these commits (split in two parts -_-) are at fault:

  • protocolbuffers/protobuf@49c83ab contains auto generated code after the commit abc9bae
  • protocolbuffers/protobuf@abc9bae so I guess this one is not atomic and can't be tested (compilation error)...
Protobuf Version Test Date
v27.0 FAIL 2024/05/23
... ... ...
49c83ab799977592744f50df93957ddec1e12e70 FAIL 2024/01/30 (newer)
abc9bae7c19d9fff4b0be0ffd3a01f947d3b487a NA(0) 2024/01/30 (need @49c83ab)
58baeb4c3b664f8918d24cef5151083d9da9767c PASS 2024/01/30 (older)
... ... ...
v26.1 PASS 2024/03/27
  1. Can't test abc9bae directly since it miss the next commit which generate some stuff...

For Googlers: it is cl/602725967

note: git revert both commit on top of v27.2 tag was not working (too many conflict to make it...)

Mizux avatar Jul 15 '24 06:07 Mizux

MRE: bazel-proto

Some update: This weekend I was able to reproduce it with a pet project without involving pybind11 etc... EDIT: same issue with "v28.0-rc1" too

It seems the issue comes from the shared library nature...

here the project layout:

 graph TD;
      A[bin/App<br>binary]-->FB[lib/libFooBar.so<br>library];
      FB-->F[lib/libFoo.so<br>library];
      FB-->B[lib/libBar.so<br>library];
      F-.->Pc[C.proto];
      Pc-- "import" -->Pa[A.proto];
      Pc-- "import" -->Pb[B.proto];

ref: https://github.com/mizux/bazel-proto

Trace:

% cmake -S. -Bbuild
% cmake --build build -j
% ./build/bin/App
...
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1721036258.637555 1559662 generated_message_reflection.cc:3619] Check failed: file != nullptr 
*** Check failure stack trace: ***
zsh: IOT instruction  ./build/bin/App

Workaround

if I apply this patch to make SHARED libraries to be STATIC:

diff --git a/foo/CMakeLists.txt b/foo/CMakeLists.txt
index aee9266..c7476a1 100644
--- a/foo/CMakeLists.txt
+++ b/foo/CMakeLists.txt
@@ -1,4 +1,4 @@
-add_library(Foo) # shared implicit
+add_library(Foo STATIC "")
 
 get_cpp_proto(PROTO_HDRS PROTO_SRCS)
 target_sources(Foo
diff --git a/foobar/CMakeLists.txt b/foobar/CMakeLists.txt
index 724f2e1..d4fe85a 100644
--- a/foobar/CMakeLists.txt
+++ b/foobar/CMakeLists.txt
@@ -1,4 +1,4 @@
-add_library(FooBar "") # shared implicit
+add_library(FooBar STATIC "")
 target_sources(FooBar PRIVATE foobar.h foobar.cc)
 target_include_directories(FooBar
   PUBLIC
% cmake --build build -j
% ./build/bin/App
...
(exit 0)

i.e. App embedded the libFoo.a then test pass, aka bin/App run without error... So I suspect the Protobuf commit to break the shared library mode.

Mizux avatar Jul 15 '24 06:07 Mizux

Back to google/or-tools when generating our wheel package we have the following layout:

ortools
├── __init__.py
├── .libs
│   └── libortools.so.10 (48Mo)
├── algorithms
│   ├── __init__.py
│   └── python
│       ├── __init__.py
│       ├── knapsack_solver.cpython-312-x86_64-linux-gnu.so
│       └── knapsack_solver.pyi
├── graph
│   ├── __init__.py
│   └── python
│       ├── __init__.py
│       ├── max_flow.cpython-312-x86_64-linux-gnu.so
│       ├── max_flow.pyi
│       ├── min_cost_flow.cpython-312-x86_64-linux-gnu.so
│       ├── min_cost_flow.pyi
│       ├── linear_sum_assignment.cpython-312-x86_64-linux-gnu.so
│       └── linear_sum_assignment.pyi
├── routing
│   ├── enums_pb2.py
│   ├── enums_pb2.pyi
│   ├── ils_pb2.py
│   ├── ils_pb2.pyi
│   ├── __init__.py
│   └── python
│       ├── __init__.py
│       ├── model.cpython-312-x86_64-linux-gnu.so (2Mo)
│       ├── model.pyi
├── sat
...
 graph TD;
      D_O[ortools];
      D_O-->D_L[.libs];
      D_L-->OR[(libortools.so<br>ortools/.lib)];
      D_O-->D_A[algorithms];
      D_A-->D_AP[python];
      D_AP-->A[(algorithms.cpython-312-x86_64-linux-gnu.so<br>ortools/algorithms/python)];
      A-. link .->OR;

      D_O-->D_G[graph];
      D_G-->D_GP[python];
      D_GP-->G1[(max_flow.cpython-312-x86_64-linux-gnu.so<br>ortools/graph/python)];
      D_GP-->G2[(min_cost_flow.cpython-312-x86_64-linux-gnu.so<br>ortools/graph/python)];
      G1-. link .->OR;
      G2-. link .->OR;

      D_O-->D_R[routing];
      D_R-->D_RP[python];
      D_RP-->R[(model.cpython-312-x86_64-linux-gnu.so<br>ortools/routing/python)];
      R-. link .->OR;
            
      D_O-->D_X[component];
      D_X-->D_XP[python];      
      D_XP-->X[(component.cpython-312-x86_64-linux-gnu.so<br>ortools/<component>/python)];
      X-. link .->OR;

note: we are using RUNPATH for each *.cpython.so lib to find the libortools.so.

objdump -p ortools/algorithms/python/knapsack_solver.cpython-312-x86_64-linux-gnu.so | grep RUNPATH
  RUNPATH              $ORIGIN:$ORIGIN/../../../ortools/.libs

using a shared lib libortools.so avoid to have to duplicate symbols in each python "sub" module. note: libortools.so it's 48Mo, since we have a dozen of python modules so 48Mo * 12 is a no go (pypi limit is 200Mo per wheel packages...)

Mizux avatar Jul 15 '24 08:07 Mizux

Dev Note playing with http://github.com/mizux/bazel-proto and using a hackish version of protobuf 58baeb4 vs 49c83ab

Adding patch:

diff --git a/src/google/protobuf/generated_message_reflection.cc b/src/google/protobuf/generated_message_reflection.cc
index d0543d134..450cf589d 100644
--- a/src/google/protobuf/generated_message_reflection.cc
+++ b/src/google/protobuf/generated_message_reflection.cc
@@ -3642,6 +3642,7 @@ void AssignDescriptorsImpl(const DescriptorTable* table, bool eager) {
   }
 
   // Fill the arrays with pointers to descriptors and reflection classes.
+  std::cerr << "filename: '" << table->filename << "'\n";
   const FileDescriptor* file =
       DescriptorPool::internal_generated_pool()->FindFileByName(
           table->filename);

Using 58baeb4 :

./build/bin/App
...
"FooBar":{"Bar":{"int":13,"int64":31},"Foo":{"int":17,"int64":42}}
filename: 'google/protobuf/descriptor.proto'
filename: 'google/protobuf/cpp_features.proto'
filename: 'foo/C.proto'
filename: 'foo/A.proto'
filename: 'foo/B.proto'
a {
  name: "42"
}
b {
  value: 42
}

Using 49c83ab :

./build/bin/App
...
"FooBar":{"Bar":{"int":13,"int64":31},"Foo":{"int":17,"int64":42}}
filename: 'google/protobuf/descriptor.proto'
filename: 'google/protobuf/cpp_features.proto'
filename: 'foo/C.proto'
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1721046248.069280 1828260 generated_message_reflection.cc:3623] Check failed: file != nullptr 
*** Check failure stack trace: ***
zsh: IOT instruction  ./build/bin/App

Mizux avatar Jul 15 '24 13:07 Mizux

Small update, buiding Protobuf in SHARED make the bazel-proto tests (e.g. foo_test) pass...

diff --git a/cmake/dependencies/CMakeLists.txt b/cmake/dependencies/CMakeLists.txt
index 7bf92be..3b712ba 100644
--- a/cmake/dependencies/CMakeLists.txt
+++ b/cmake/dependencies/CMakeLists.txt
@@ -52,13 +52,15 @@ if(BUILD_Protobuf)
   message(CHECK_START "Fetching Protobuf")
   list(APPEND CMAKE_MESSAGE_INDENT "  ")
   set(protobuf_BUILD_TESTS OFF)
-  set(protobuf_BUILD_SHARED_LIBS OFF)
+  set(protobuf_BUILD_SHARED_LIBS ON)
   set(protobuf_BUILD_EXPORT OFF)
   set(protobuf_MSVC_STATIC_RUNTIME OFF)
   #set(protobuf_BUILD_LIBUPB ON)

i.e. BAD: bin/foo_test -> lib/libFoo.so --static link--> libprotobuf.a GOOD: bin/foo_test -> lib/libFoo.so --ldd--> libprotobuf.so

Mizux avatar Jul 17 '24 13:07 Mizux

Adding some log (thx to @sbenzaquen for the suggestion). Seems we have an ODR violation when using a static version of Protobuf

The patch in Protobuf to have some insight...

git diff HEAD~1
diff --git a/src/google/protobuf/generated_message_util.cc b/src/google/protobuf/generated_message_util.cc
index e33d22fc4..7b9663d43 100644
--- a/src/google/protobuf/generated_message_util.cc
+++ b/src/google/protobuf/generated_message_util.cc
@@ -88,18 +88,23 @@ void InitWeakDefaults() {}
 
 PROTOBUF_CONSTINIT std::atomic<bool> init_protobuf_defaults_state{false};
 static bool InitProtobufDefaultsImpl() {
+  std::cerr << __func__ << ":" << __LINE__ << " call...\n";
   fixed_address_empty_string.DefaultConstruct();
   OnShutdownDestroyString(fixed_address_empty_string.get_mutable());
   InitWeakDefaults();
 
 
   init_protobuf_defaults_state.store(true, std::memory_order_release);
+  std::cerr << __func__ << ":" << __LINE__ << " done.\n";
   return true;
 }
 
 void InitProtobufDefaultsSlow() {
+  std::cerr << __func__ << ":" << __LINE__ << " call...\n";
   static bool is_inited = InitProtobufDefaultsImpl();
   (void)is_inited;
+  std::cerr << "is_inited: " << &is_inited << ".\n";
+  std::cerr << __func__ << ":" << __LINE__ << "done.\n";
 }
 // Force the initialization of the empty string.
 // Normally, registration would do it, but we don't have any guarantee that

The trace of build/bin/foo_test when using libprotobuf.a patched and with the commit @49c83ab

./build/bin/foo_test 
InitProtobufDefaultsSlow:103 call...
InitProtobufDefaultsImpl:91 call...
InitProtobufDefaultsImpl:98 done.
is_inited: 0x7ff270270e08.
InitProtobufDefaultsSlow:107done.
InitProtobufDefaultsSlow:103 call...
InitProtobufDefaultsImpl:91 call...
InitProtobufDefaultsImpl:98 done.
is_inited: 0x5641f8051b48.
InitProtobufDefaultsSlow:107done.
Running main() from /usr/local/google/home/corentinl/dev/repo/bazel-proto/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from FooTest
[ RUN      ] FooTest.ProtoFunction
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1721369740.493581  753437 generated_message_reflection.cc:3622] Check failed: file != nullptr 
*** Check failure stack trace: ***
zsh: IOT instruction  ./build/bin/foo_test

note: same ODR violation prior to the patch (i.e. protobuf v26.1) but without the assert fail side effect thus it was unnoticed until now...

Mizux avatar Jul 19 '24 06:07 Mizux

@sbenzaquen can you take a look to see if we can provide any more insight?

googleberg avatar Sep 09 '24 02:09 googleberg

Update: using BUILD_SHARED_LIBS for abseil-cpp and Protobuf we manage to fix the ODR violation.

So AFAIK Protobuf cmake based build do not support STATIC library build and should not advertise it i.e. cmake should detect if Protobuf will be built in static and issue a warning/error or force SHARED build only IMHO...

ref: https://cmake.org/cmake/help/latest/variable/BUILD_SHARED_LIBS.html

Mizux avatar Oct 21 '24 13:10 Mizux

This one really helps me out, I can reproduce this on grpc 1.68.x built from source with "protobuf_BUILD_SHARED_LIBS=OFF" and can set to "ON" to avoid it...

zgjja avatar Nov 28 '24 10:11 zgjja

I believe we're seeing this issue in v29.3 also. You can reproduce it by compiling protobuf as a static library using MSVC, then any trivial .proto schema will trigger the error.

It's coming from:

https://github.com/protocolbuffers/protobuf/blob/v29.3/src/google/protobuf/generated_message_reflection.cc#L3668

With a debugger, I can see multiple calls to this check being ok for the built in proto schemas (cpp_features.proto, descriptor.proto) but the first time it's hit with my own proto schema, the check on FileDescriptor* is a nullptr so the code aborts.

As others have suggested, by switching to a version of protobuf compiled as shared libraries, the issue goes away. But I'm not sure that's the ideal solution for our use case.

imrichardcole avatar Feb 21 '25 15:02 imrichardcole

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

github-actions[bot] avatar Sep 20 '25 10:09 github-actions[bot]

Is this fixed? I have the same issue in wsl2(x86_64 machine, OS: unbutu 24.04). log:

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1758717661.998084   89306 generated_message_reflection.cc:3860] Check failed: file != nullptr 

the protobuf lib is using as a dir "protobuf-32.0" from cmake by:

set(protobuf_INSTALL ON CACHE BOOL "" FORCE)
set(protobuf_BUILD_TESTS OFF CACHE BOOL "")
set(protobuf_BUILD_CONFORMANCE OFF CACHE BOOL "")
set(protobuf_BUILD_EXAMPLES OFF CACHE BOOL "")

set(protobuf_BUILD_PROTOBUF_BINARIES ON CACHE BOOL "")
set(protobuf_BUILD_PROTOC_BINARIES OFF CACHE BOOL "")
set(protobuf_BUILD_LIBPROTOC OFF CACHE BOOL "")

set(protobuf_BUILD_SHARED_LIBS OFF CACHE BOOL "" FORCE)        # =========== Seems like it's happening here?

set(protobuf_FORCE_FETCH_DEPENDENCIES OFF CACHE BOOL "")
set(protobuf_LOCAL_DEPENDENCIES_ONLY ON CACHE BOOL "")
add_subdirectory(
    ${CMAKE_CURRENT_SOURCE_DIR}/protobuf-32.0
)

and I change the BUILD_SHARED to ON in wsl2: set(protobuf_BUILD_SHARED_LIBS ON CACHE BOOL "" FORCE) its good in wsl2 as it is...

But if it is set(protobuf_BUILD_SHARED_LIBS OFF CACHE BOOL "" FORCE) I tested on OS win, its running good by the way...

Staok avatar Sep 24 '25 13:09 Staok

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.

github-actions[bot] avatar Dec 24 '25 10:12 github-actions[bot]

Can you provide a minimal repo replicating the issue for us to review?

PilgrimMemoirs avatar Jan 14 '26 18:01 PilgrimMemoirs