[cmake] v27.* break the static library support compare to v26.1
For google/or-tools we tried to upgrade from v26.1 (working) to v27.2 (broken)
TLDR: in cmake you have to target_link_library() to protobuf in add_executable() and add_library(), if Protobuf has been built as STATIC this will lead to an ODR violation. Until v26.1 it was still working (some underlying map was still unique, starting v27.0 we got two internal map thus the error message below)
note: release v27.0 and v27.1 are also broken...
Language: Python (native C++ module) i.e. pybind11 wrappers....
What operating system (Linux, Windows, ...) and version? Linux and MacOS are impacted...
What runtime / compiler are you using (e.g., python version or gcc version) From Python 3.9 to Python 3.12 gcc ~[10,14] see default version from almalinux:latest, rockylinux:9, fedora:latest, debian:latest, opensuse, ubuntu:latest .... ref: https://repology.org/project/gcc/badges. note: we are targeting C++17 (with agregate initialisation) or C++20 (MSVC)
What did you do? Please read this issue for a full detailed bug report:
- https://github.com/google/or-tools/issues/4307
What did you expect to see all or-tools tests pass as before...
What did you see instead? all python tests using pybind11 and protobuf failed with error:
F0000 00:00:1720999376.464679 2304 generated_message_reflection.cc:3620] Check failed: file != nullptr
Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs). Again, Please read the issue for a full detailed bug report:
- https://github.com/google/or-tools/issues/4307
Anything else we should know about your project / environment
using kind of bissect on protobuf tag use in our build, I figure out it is these commits (split in two parts -_-) are at fault:
- protocolbuffers/protobuf@49c83ab contains auto generated code after the commit abc9bae
- protocolbuffers/protobuf@abc9bae so I guess this one is not atomic and can't be tested (compilation error)...
| Protobuf Version | Test | Date |
|---|---|---|
| v27.0 | FAIL | 2024/05/23 |
| ... | ... | ... |
| 49c83ab799977592744f50df93957ddec1e12e70 | FAIL | 2024/01/30 (newer) |
| abc9bae7c19d9fff4b0be0ffd3a01f947d3b487a | NA(0) | 2024/01/30 (need @49c83ab) |
| 58baeb4c3b664f8918d24cef5151083d9da9767c | PASS | 2024/01/30 (older) |
| ... | ... | ... |
| v26.1 | PASS | 2024/03/27 |
- Can't test abc9bae directly since it miss the next commit which generate some stuff...
For Googlers: it is cl/602725967
note: git revert both commit on top of v27.2 tag was not working (too many conflict to make it...)
MRE: bazel-proto
Some update: This weekend I was able to reproduce it with a pet project without involving pybind11 etc... EDIT: same issue with "v28.0-rc1" too
It seems the issue comes from the shared library nature...
here the project layout:
graph TD;
A[bin/App<br>binary]-->FB[lib/libFooBar.so<br>library];
FB-->F[lib/libFoo.so<br>library];
FB-->B[lib/libBar.so<br>library];
F-.->Pc[C.proto];
Pc-- "import" -->Pa[A.proto];
Pc-- "import" -->Pb[B.proto];
ref: https://github.com/mizux/bazel-proto
Trace:
% cmake -S. -Bbuild
% cmake --build build -j
% ./build/bin/App
...
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1721036258.637555 1559662 generated_message_reflection.cc:3619] Check failed: file != nullptr
*** Check failure stack trace: ***
zsh: IOT instruction ./build/bin/App
Workaround
if I apply this patch to make SHARED libraries to be STATIC:
diff --git a/foo/CMakeLists.txt b/foo/CMakeLists.txt
index aee9266..c7476a1 100644
--- a/foo/CMakeLists.txt
+++ b/foo/CMakeLists.txt
@@ -1,4 +1,4 @@
-add_library(Foo) # shared implicit
+add_library(Foo STATIC "")
get_cpp_proto(PROTO_HDRS PROTO_SRCS)
target_sources(Foo
diff --git a/foobar/CMakeLists.txt b/foobar/CMakeLists.txt
index 724f2e1..d4fe85a 100644
--- a/foobar/CMakeLists.txt
+++ b/foobar/CMakeLists.txt
@@ -1,4 +1,4 @@
-add_library(FooBar "") # shared implicit
+add_library(FooBar STATIC "")
target_sources(FooBar PRIVATE foobar.h foobar.cc)
target_include_directories(FooBar
PUBLIC
% cmake --build build -j
% ./build/bin/App
...
(exit 0)
i.e. App embedded the libFoo.a then test pass, aka bin/App run without error...
So I suspect the Protobuf commit to break the shared library mode.
Back to google/or-tools when generating our wheel package we have the following layout:
ortools
├── __init__.py
├── .libs
│ └── libortools.so.10 (48Mo)
├── algorithms
│ ├── __init__.py
│ └── python
│ ├── __init__.py
│ ├── knapsack_solver.cpython-312-x86_64-linux-gnu.so
│ └── knapsack_solver.pyi
├── graph
│ ├── __init__.py
│ └── python
│ ├── __init__.py
│ ├── max_flow.cpython-312-x86_64-linux-gnu.so
│ ├── max_flow.pyi
│ ├── min_cost_flow.cpython-312-x86_64-linux-gnu.so
│ ├── min_cost_flow.pyi
│ ├── linear_sum_assignment.cpython-312-x86_64-linux-gnu.so
│ └── linear_sum_assignment.pyi
├── routing
│ ├── enums_pb2.py
│ ├── enums_pb2.pyi
│ ├── ils_pb2.py
│ ├── ils_pb2.pyi
│ ├── __init__.py
│ └── python
│ ├── __init__.py
│ ├── model.cpython-312-x86_64-linux-gnu.so (2Mo)
│ ├── model.pyi
├── sat
...
graph TD;
D_O[ortools];
D_O-->D_L[.libs];
D_L-->OR[(libortools.so<br>ortools/.lib)];
D_O-->D_A[algorithms];
D_A-->D_AP[python];
D_AP-->A[(algorithms.cpython-312-x86_64-linux-gnu.so<br>ortools/algorithms/python)];
A-. link .->OR;
D_O-->D_G[graph];
D_G-->D_GP[python];
D_GP-->G1[(max_flow.cpython-312-x86_64-linux-gnu.so<br>ortools/graph/python)];
D_GP-->G2[(min_cost_flow.cpython-312-x86_64-linux-gnu.so<br>ortools/graph/python)];
G1-. link .->OR;
G2-. link .->OR;
D_O-->D_R[routing];
D_R-->D_RP[python];
D_RP-->R[(model.cpython-312-x86_64-linux-gnu.so<br>ortools/routing/python)];
R-. link .->OR;
D_O-->D_X[component];
D_X-->D_XP[python];
D_XP-->X[(component.cpython-312-x86_64-linux-gnu.so<br>ortools/<component>/python)];
X-. link .->OR;
note: we are using RUNPATH for each *.cpython.so lib to find the libortools.so.
objdump -p ortools/algorithms/python/knapsack_solver.cpython-312-x86_64-linux-gnu.so | grep RUNPATH
RUNPATH $ORIGIN:$ORIGIN/../../../ortools/.libs
using a shared lib libortools.so avoid to have to duplicate symbols in each python "sub" module. note: libortools.so it's 48Mo, since we have a dozen of python modules so 48Mo * 12 is a no go (pypi limit is 200Mo per wheel packages...)
Dev Note playing with http://github.com/mizux/bazel-proto and using a hackish version of protobuf 58baeb4 vs 49c83ab
Adding patch:
diff --git a/src/google/protobuf/generated_message_reflection.cc b/src/google/protobuf/generated_message_reflection.cc
index d0543d134..450cf589d 100644
--- a/src/google/protobuf/generated_message_reflection.cc
+++ b/src/google/protobuf/generated_message_reflection.cc
@@ -3642,6 +3642,7 @@ void AssignDescriptorsImpl(const DescriptorTable* table, bool eager) {
}
// Fill the arrays with pointers to descriptors and reflection classes.
+ std::cerr << "filename: '" << table->filename << "'\n";
const FileDescriptor* file =
DescriptorPool::internal_generated_pool()->FindFileByName(
table->filename);
Using 58baeb4 :
./build/bin/App
...
"FooBar":{"Bar":{"int":13,"int64":31},"Foo":{"int":17,"int64":42}}
filename: 'google/protobuf/descriptor.proto'
filename: 'google/protobuf/cpp_features.proto'
filename: 'foo/C.proto'
filename: 'foo/A.proto'
filename: 'foo/B.proto'
a {
name: "42"
}
b {
value: 42
}
Using 49c83ab :
./build/bin/App
...
"FooBar":{"Bar":{"int":13,"int64":31},"Foo":{"int":17,"int64":42}}
filename: 'google/protobuf/descriptor.proto'
filename: 'google/protobuf/cpp_features.proto'
filename: 'foo/C.proto'
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1721046248.069280 1828260 generated_message_reflection.cc:3623] Check failed: file != nullptr
*** Check failure stack trace: ***
zsh: IOT instruction ./build/bin/App
Small update, buiding Protobuf in SHARED make the bazel-proto tests (e.g. foo_test) pass...
diff --git a/cmake/dependencies/CMakeLists.txt b/cmake/dependencies/CMakeLists.txt
index 7bf92be..3b712ba 100644
--- a/cmake/dependencies/CMakeLists.txt
+++ b/cmake/dependencies/CMakeLists.txt
@@ -52,13 +52,15 @@ if(BUILD_Protobuf)
message(CHECK_START "Fetching Protobuf")
list(APPEND CMAKE_MESSAGE_INDENT " ")
set(protobuf_BUILD_TESTS OFF)
- set(protobuf_BUILD_SHARED_LIBS OFF)
+ set(protobuf_BUILD_SHARED_LIBS ON)
set(protobuf_BUILD_EXPORT OFF)
set(protobuf_MSVC_STATIC_RUNTIME OFF)
#set(protobuf_BUILD_LIBUPB ON)
i.e.
BAD: bin/foo_test -> lib/libFoo.so --static link--> libprotobuf.a
GOOD: bin/foo_test -> lib/libFoo.so --ldd--> libprotobuf.so
Adding some log (thx to @sbenzaquen for the suggestion). Seems we have an ODR violation when using a static version of Protobuf
The patch in Protobuf to have some insight...
git diff HEAD~1
diff --git a/src/google/protobuf/generated_message_util.cc b/src/google/protobuf/generated_message_util.cc
index e33d22fc4..7b9663d43 100644
--- a/src/google/protobuf/generated_message_util.cc
+++ b/src/google/protobuf/generated_message_util.cc
@@ -88,18 +88,23 @@ void InitWeakDefaults() {}
PROTOBUF_CONSTINIT std::atomic<bool> init_protobuf_defaults_state{false};
static bool InitProtobufDefaultsImpl() {
+ std::cerr << __func__ << ":" << __LINE__ << " call...\n";
fixed_address_empty_string.DefaultConstruct();
OnShutdownDestroyString(fixed_address_empty_string.get_mutable());
InitWeakDefaults();
init_protobuf_defaults_state.store(true, std::memory_order_release);
+ std::cerr << __func__ << ":" << __LINE__ << " done.\n";
return true;
}
void InitProtobufDefaultsSlow() {
+ std::cerr << __func__ << ":" << __LINE__ << " call...\n";
static bool is_inited = InitProtobufDefaultsImpl();
(void)is_inited;
+ std::cerr << "is_inited: " << &is_inited << ".\n";
+ std::cerr << __func__ << ":" << __LINE__ << "done.\n";
}
// Force the initialization of the empty string.
// Normally, registration would do it, but we don't have any guarantee that
The trace of build/bin/foo_test when using libprotobuf.a patched and with the commit @49c83ab
./build/bin/foo_test
InitProtobufDefaultsSlow:103 call...
InitProtobufDefaultsImpl:91 call...
InitProtobufDefaultsImpl:98 done.
is_inited: 0x7ff270270e08.
InitProtobufDefaultsSlow:107done.
InitProtobufDefaultsSlow:103 call...
InitProtobufDefaultsImpl:91 call...
InitProtobufDefaultsImpl:98 done.
is_inited: 0x5641f8051b48.
InitProtobufDefaultsSlow:107done.
Running main() from /usr/local/google/home/corentinl/dev/repo/bazel-proto/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from FooTest
[ RUN ] FooTest.ProtoFunction
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1721369740.493581 753437 generated_message_reflection.cc:3622] Check failed: file != nullptr
*** Check failure stack trace: ***
zsh: IOT instruction ./build/bin/foo_test
note: same ODR violation prior to the patch (i.e. protobuf v26.1) but without the assert fail side effect thus it was unnoticed until now...
@sbenzaquen can you take a look to see if we can provide any more insight?
Update: using BUILD_SHARED_LIBS for abseil-cpp and Protobuf we manage to fix the ODR violation.
So AFAIK Protobuf cmake based build do not support STATIC library build and should not advertise it i.e. cmake should detect if Protobuf will be built in static and issue a warning/error or force SHARED build only IMHO...
ref: https://cmake.org/cmake/help/latest/variable/BUILD_SHARED_LIBS.html
This one really helps me out, I can reproduce this on grpc 1.68.x built from source with "protobuf_BUILD_SHARED_LIBS=OFF" and can set to "ON" to avoid it...
I believe we're seeing this issue in v29.3 also. You can reproduce it by compiling protobuf as a static library using MSVC, then any trivial .proto schema will trigger the error.
It's coming from:
https://github.com/protocolbuffers/protobuf/blob/v29.3/src/google/protobuf/generated_message_reflection.cc#L3668
With a debugger, I can see multiple calls to this check being ok for the built in proto schemas (cpp_features.proto, descriptor.proto) but the first time it's hit with my own proto schema, the check on FileDescriptor* is a nullptr so the code aborts.
As others have suggested, by switching to a version of protobuf compiled as shared libraries, the issue goes away. But I'm not sure that's the ideal solution for our use case.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.
Is this fixed? I have the same issue in wsl2(x86_64 machine, OS: unbutu 24.04). log:
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
F0000 00:00:1758717661.998084 89306 generated_message_reflection.cc:3860] Check failed: file != nullptr
the protobuf lib is using as a dir "protobuf-32.0" from cmake by:
set(protobuf_INSTALL ON CACHE BOOL "" FORCE)
set(protobuf_BUILD_TESTS OFF CACHE BOOL "")
set(protobuf_BUILD_CONFORMANCE OFF CACHE BOOL "")
set(protobuf_BUILD_EXAMPLES OFF CACHE BOOL "")
set(protobuf_BUILD_PROTOBUF_BINARIES ON CACHE BOOL "")
set(protobuf_BUILD_PROTOC_BINARIES OFF CACHE BOOL "")
set(protobuf_BUILD_LIBPROTOC OFF CACHE BOOL "")
set(protobuf_BUILD_SHARED_LIBS OFF CACHE BOOL "" FORCE) # =========== Seems like it's happening here?
set(protobuf_FORCE_FETCH_DEPENDENCIES OFF CACHE BOOL "")
set(protobuf_LOCAL_DEPENDENCIES_ONLY ON CACHE BOOL "")
add_subdirectory(
${CMAKE_CURRENT_SOURCE_DIR}/protobuf-32.0
)
and I change the BUILD_SHARED to ON in wsl2:
set(protobuf_BUILD_SHARED_LIBS ON CACHE BOOL "" FORCE)
its good in wsl2 as it is...
But if it is set(protobuf_BUILD_SHARED_LIBS OFF CACHE BOOL "" FORCE) I tested on OS win, its running good by the way...
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.
Can you provide a minimal repo replicating the issue for us to review?