deepmind-research
BYOL setup error: undefined symbol: _ZNK6google8protobuf7Message11GetTypeNameEv
Help! I followed the setup steps in the BYOL README exactly, but as soon as I run
python -m byol.main_loop \
  --experiment_mode='pretrain' \
  --worker_mode='train' \
  --checkpoint_root='/tmp/byol_checkpoints' \
  --pretrain_epochs=40
or
python -m byol.main_loop_test
I get the following error:
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/content/byol/main_loop_test.py", line 21, in <module>
from byol import byol_experiment
File "/content/byol/byol_experiment.py", line 25, in <module>
from acme.jax import utils as acme_utils
File "/usr/local/lib/python3.7/dist-packages/acme/__init__.py", line 35, in <module>
from acme.environment_loop import EnvironmentLoop
File "/usr/local/lib/python3.7/dist-packages/acme/environment_loop.py", line 26, in <module>
from acme.utils import signals
File "/usr/local/lib/python3.7/dist-packages/acme/utils/signals.py", line 22, in <module>
import launchpad
File "/usr/local/lib/python3.7/dist-packages/launchpad/__init__.py", line 36, in <module>
from launchpad.nodes.courier.node import CourierHandle
File "/usr/local/lib/python3.7/dist-packages/launchpad/nodes/courier/node.py", line 21, in <module>
import courier
File "/usr/local/lib/python3.7/dist-packages/courier/__init__.py", line 26, in <module>
from courier.python.client import Client # pytype: disable=import-error
File "/usr/local/lib/python3.7/dist-packages/courier/python/client.py", line 30, in <module>
from courier.python import py_client
ImportError: /usr/local/lib/python3.7/dist-packages/courier/python/libserialization_cc_proto.so: undefined symbol: _ZNK6google8protobuf7Message11GetTypeNameEv
I have tried many different Python versions and operating systems:
- Python 3.9, 3.8, 3.7, 3.6
- Ubuntu 20.04 LTS, Ubuntu 18.04.6 LTS (Google Colab), Windows 10, Jetson Nano image jp461 (Linux with NVIDIA embedded CUDA)
The installation itself completes without any errors, but running that step (python -m byol.main_loop --...) always fails with the same error (...undefined symbol: _ZNK6google8protobuf7Message11GetTypeNameEv).
I searched the internet and the error seems to be related to protobuf. I tried upgrading protobuf and reinstalling other versions, but because of the packages' dependency constraints, the only versions I could end up installing were protobuf==3.19.6 (the default) and protobuf==3.19.5.
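Since the dependency resolver decides which versions actually end up installed, it helps to print what is in the environment before and after each attempt. The snippet below is a minimal sketch using the standard library's importlib.metadata (Python 3.8+). The package list is an assumption: dm-launchpad and dm-acme are taken to be the PyPI names behind the launchpad and acme imports in the traceback, and the helper name installed_versions is made up for illustration.

```python
from importlib import metadata

# Assumed PyPI distribution names for the modules seen in the traceback.
PACKAGES = ["protobuf", "tensorflow", "dm-launchpad", "dm-acme", "jax", "jaxlib"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output into a report makes it easier to compare a failing environment with a working one.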
I also tried compiling jax, jaxlib, and protobuf myself (from the source code on their GitHub repositories), but it still didn't work.
I have also tried the following GPU hardware:
- Nvidia RTX 3090
- Nvidia GTX 1080Ti
- Nvidia GTX 1070
- Nvidia Jetson Nano 4GB
None of them work. Can anyone please share a software and hardware environment that is known to work?
Were you able to find a solution to this? I am facing the exact same issue.
Is there any update on this?
I was facing a similar issue with Launchpad.
...
File "/home/callum/miniconda3/envs/protobuf_issue_test/lib/python3.9/site-packages/launchpad/nodes/courier/node.py", line 21, in <module>
import courier
File "/home/callum/miniconda3/envs/protobuf_issue_test/lib/python3.9/site-packages/courier/__init__.py", line 26, in <module>
from courier.python.client import Client # pytype: disable=import-error
File "/home/callum/miniconda3/envs/protobuf_issue_test/lib/python3.9/site-packages/courier/python/client.py", line 30, in <module>
from courier.python import py_client
ImportError: /home/callum/miniconda3/envs/protobuf_issue_test/lib/python3.9/site-packages/courier/python/libserialization_cc_proto.so: undefined symbol: _ZNK6google8protobuf7Message11GetTypeNameEv
As you can see, the same symbol is missing: _ZNK6google8protobuf7Message11GetTypeNameEv.
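The mangled name follows the Itanium C++ ABI scheme, so it can be decoded to see which function the loader is looking for. Below is a minimal sketch of a demangler that handles only this simple shape (a nested name, an optional const qualifier, and no parameters); the function name demangle_simple is made up for illustration, and a real tool such as c++filt covers the full grammar.

```python
def demangle_simple(symbol: str) -> str:
    """Demangle names of the form _ZN[K]<len><id>...Ev (no parameters only)."""
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos = 3
    is_const = symbol[pos:pos + 1] == "K"  # 'K' marks a const member function
    if is_const:
        pos += 1
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(symbol) and symbol[pos] != "E":
        start = pos
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("unsupported name component")
        length = int(symbol[start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length
    if symbol[pos + 1:] != "v":  # 'v' after 'E' means an empty parameter list
        raise ValueError("only parameterless functions are supported")
    return "::".join(parts) + "()" + (" const" if is_const else "")


print(demangle_simple("_ZNK6google8protobuf7Message11GetTypeNameEv"))
# google::protobuf::Message::GetTypeName() const
```

So the shared object expects a protobuf build that exports Message::GetTypeName() as a const member function, and the library that ends up loaded does not provide it.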
I ran Unix's library dependency checker, ldd, on the shared object file named in the error (ldd /home/callum/miniconda3/envs/protobuf_issue_test/lib/python3.9/site-packages/courier/python/libserialization_cc_proto.so), as suggested in https://github.com/NVIDIA-AI-IOT/torch2trt/issues/53, which yields the following:
...
libtensorflow_framework.so.2 => not found
...
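To repeat this check on several shared objects, the unresolved entries can be pulled out of the ldd output with a few lines of Python. This is a minimal sketch: the helper names missing_libraries and run_ldd are made up for illustration, the parser assumes the usual "name => not found" line format, and the libc line in the sample is invented to show a resolved entry.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def run_ldd(path):
    """Run ldd on a shared object and return its text output (Linux only)."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return result.stdout


# Illustrative sample; on a real system use run_ldd("<path to the .so>").
sample = (
    "\tlibtensorflow_framework.so.2 => not found\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n"
)
print(missing_libraries(sample))  # ['libtensorflow_framework.so.2']
```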
I first checked that I had tensorflow installed:
(protobuf_issue_test) ➜ callum-tilbury ✗ pip show tensorflow
Name: tensorflow
Version: 2.11.0
Summary: TensorFlow is an open source machine learning framework for everyone.
...
and did the same for the associated TF packages (tensorflow-datasets, etc.), but they were all already installed.
I then progressively downgraded tensorflow, eventually to v2.8 (pip install tensorflow~=2.8.0), and that fixed the issue. Interestingly, ldd still shows libtensorflow_framework.so.2 as not found.
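One possible explanation, which is an assumption and not something confirmed here, is that ldd only searches the system's regular library paths, while libtensorflow_framework.so.2 lives inside the tensorflow package directory and is loaded when tensorflow is imported. The sketch below looks for the file there without importing tensorflow; the helper name find_framework_libs and the assumed file location are illustrative.

```python
import importlib.util
from pathlib import Path


def find_framework_libs(package="tensorflow", pattern="libtensorflow_framework.so*"):
    """List files matching `pattern` in the installed package's directories."""
    spec = importlib.util.find_spec(package)  # does not import the package
    if spec is None or not spec.submodule_search_locations:
        return []
    found = []
    for location in spec.submodule_search_locations:
        found.extend(sorted(Path(location).glob(pattern)))
    return found


for lib in find_framework_libs():
    print(lib)
```

If a file is listed, the library is present on disk even though ldd cannot resolve it, which would point to a version or ABI mismatch between the installed tensorflow and the courier binary instead of a missing file.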
This approach isn't exactly a solution, and feels more like a hack. But perhaps it's a step in understanding why things are breaking. Hopefully it helps you :)