lingvo
Cannot run trainer.py with --model=car.waymo_deepfusion.DeepFusionCenterPointPed, undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringERKNS_15OpKernelContextEb
Hi, thank you very much for your great work.
I am trying to run trainer.py with the DeepFusion model, but it fails with tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python3.9/dist-packages/waymo_open_dataset/metrics/ops/metrics_ops.so: undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringERKNS_15OpKernelContextEb.
I built and started the container from your dev.dockerfile:
docker build --tag tensorflow:lingvo $(test "$LINGVO_DEVICE" = "gpu" && echo "--build-arg base_image=nvidia/cuda:11.6.1-cudnn8-runtime-ubuntu18.04") - < "$LINGVO_DIR/docker/dev.dockerfile"
docker run --rm $(test "$LINGVO_DEVICE" = "gpu" && echo "--gpus all") -it -v ${LINGVO_DIR}:/tmp/lingvo -v ${DATA_DIR}:/tmp/data -v ${HOME}/.gitconfig:/home/${USER}/.gitconfig:ro -p 6006:6006 -p 8888:8888 --name lingvo tensorflow:lingvo bash
In the container, I run:
bazel build -c opt --config=cuda //lingvo:trainer
bazel-bin/lingvo/trainer --logtostderr --model=car.waymo_deepfusion.DeepFusionCenterPointPed --mode=sync --logdir=/tmp/deepfusion --run_locally=gpu
The full output:
2022-08-03 02:36:54.706801: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
model_imports.py: Importing car.waymo_deepfusion
model_imports.py: Importing car.params.waymo_deepfusion
model_imports.py: Importing car.waymo_deepfusion.params
model_imports.py: Importing lingvo.tasks.car.waymo_deepfusion
model_imports.py: Importing lingvo.params.tasks.car.waymo_deepfusion
model_imports.py: Importing lingvo.tasks.params.car.waymo_deepfusion
model_imports.py: Importing lingvo.tasks.car.params.waymo_deepfusion
Traceback (most recent call last):
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 863, in <module>
model_imports.ImportParams(FLAGS.model)
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/model_imports.py", line 101, in ImportParams
success = _Import(module_with_params) or success
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/model_imports.py", line 29, in _Import
importlib.import_module(name)
File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/car/params/waymo_deepfusion.py", line 32, in <module>
from lingvo.tasks.car.params import waymo as waymo_params
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/car/params/waymo.py", line 28, in <module>
from lingvo.tasks.car.waymo import waymo_decoder
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/car/waymo/waymo_decoder.py", line 23, in <module>
from lingvo.tasks.car.waymo import waymo_ap_metric
File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/car/waymo/waymo_ap_metric.py", line 32, in <module>
from waymo_open_dataset.metrics.ops import py_metrics_ops
File "/usr/local/lib/python3.9/dist-packages/waymo_open_dataset/metrics/ops/py_metrics_ops.py", line 23, in <module>
metrics_module = tf.load_op_library(
File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/load_library.py", line 54, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python3.9/dist-packages/waymo_open_dataset/metrics/ops/metrics_ops.so: undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringERKNS_15OpKernelContextEb
I found a similar issue here, but bazel build fails after downgrading tensorflow to tensorflow==2.6.0.
Would you tell me why this happens?
Thank you in advance!
The same error occurs with --model=car.waymo.StarNetVehicle, yet --model=car.kitti.StarNetCarModel0701 does not produce it.
The output of bazel build after downgrading tensorflow to tensorflow==2.6.0:
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
DEBUG: Rule 'subpar' indicated that a canonical reproducible form can be obtained by modifying arguments commit = "35bb9f0092f71ea56b742a520602da9b3638a24f", shallow_since = "1557863961 -0400" and dropping ["tag"]
DEBUG: Repository subpar instantiated at:
/tmp/lingvo/WORKSPACE:12:15: in <toplevel>
Repository rule git_repository defined at:
/root/.cache/bazel/_bazel_root/17eb95f0bc03547f4f1319e61997e114/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>
INFO: Analyzed target //lingvo:trainer (48 packages loaded, 6731 targets configured).
INFO: Found 1 target...
INFO: From Compiling icu4c/source/common/ucptrie.cpp:
external/icu/icu4c/source/common/ucptrie.cpp: In function 'UChar32 {anonymous}::getRange(const void*, UChar32, uint32_t (*)(const void*, uint32_t), const void*, uint32_t*)':
external/icu/icu4c/source/common/ucptrie.cpp:404:5: warning: 'value' may be used uninitialized in this function [-Wmaybe-uninitialized]
if (maybeFilterValue(highValue, trie->nullValue, nullValue,
^~
INFO: From Compiling icu4c/source/common/uidna.cpp:
external/icu/icu4c/source/common/uidna.cpp: In function 'int32_t _internal_toUnicode(const UChar*, int32_t, UChar*, int32_t, int32_t, UStringPrepProfile*, UParseError*, UErrorCode*)':
external/icu/icu4c/source/common/uidna.cpp:515:85: warning: 'int32_t uidna_toASCII_64(const UChar*, int32_t, UChar*, int32_t, int32_t, UParseError*, UErrorCode*)' is deprecated [-Wdeprecated-declarations]
b3Len = uidna_toASCII(b2, b2Len, b3, b3Capacity, options, parseError, status);
^
In file included from external/icu/icu4c/source/common/unicode/platform.h:25:0,
from external/icu/icu4c/source/common/unicode/ptypes.h:52,
from external/icu/icu4c/source/common/unicode/umachine.h:46,
from external/icu/icu4c/source/common/unicode/utypes.h:38,
from external/icu/icu4c/source/common/uidna.cpp:19:
external/icu/icu4c/source/common/unicode/uidna.h:576:1: note: declared here
uidna_toASCII(const UChar* src, int32_t srcLength,
^
external/icu/icu4c/source/common/uidna.cpp:528:80: warning: 'int32_t uidna_toASCII_64(const UChar*, int32_t, UChar*, int32_t, int32_t, UParseError*, UErrorCode*)' is deprecated [-Wdeprecated-declarations]
b3Len = uidna_toASCII(b2,b2Len,b3,b3Len,options,parseError, status);
^
In file included from external/icu/icu4c/source/common/unicode/platform.h:25:0,
from external/icu/icu4c/source/common/unicode/ptypes.h:52,
from external/icu/icu4c/source/common/unicode/umachine.h:46,
from external/icu/icu4c/source/common/unicode/utypes.h:38,
from external/icu/icu4c/source/common/uidna.cpp:19:
external/icu/icu4c/source/common/unicode/uidna.h:576:1: note: declared here
uidna_toASCII(const UChar* src, int32_t srcLength,
^
external/icu/icu4c/source/common/uidna.cpp: In function 'int32_t uidna_compare_64(const UChar*, int32_t, const UChar*, int32_t, int32_t, UErrorCode*)':
external/icu/icu4c/source/common/uidna.cpp:878:87: warning: 'int32_t uidna_IDNToASCII_64(const UChar*, int32_t, UChar*, int32_t, int32_t, UParseError*, UErrorCode*)' is deprecated [-Wdeprecated-declarations]
b1Len = uidna_IDNToASCII(s1, length1, b1, b1Capacity, options, &parseError, status);
^
In file included from external/icu/icu4c/source/common/unicode/platform.h:25:0,
from external/icu/icu4c/source/common/unicode/ptypes.h:52,
from external/icu/icu4c/source/common/unicode/umachine.h:46,
from external/icu/icu4c/source/common/unicode/utypes.h:38,
from external/icu/icu4c/source/common/uidna.cpp:19:
external/icu/icu4c/source/common/uidna.cpp:670:1: note: declared here
uidna_IDNToASCII( const UChar *src, int32_t srcLength,
^
external/icu/icu4c/source/common/uidna.cpp:889:83: warning: 'int32_t uidna_IDNToASCII_64(const UChar*, int32_t, UChar*, int32_t, int32_t, UParseError*, UErrorCode*)' is deprecated [-Wdeprecated-declarations]
b1Len = uidna_IDNToASCII(s1,length1,b1,b1Len, options, &parseError, status);
^
In file included from external/icu/icu4c/source/common/unicode/platform.h:25:0,
from external/icu/icu4c/source/common/unicode/ptypes.h:52,
from external/icu/icu4c/source/common/unicode/umachine.h:46,
from external/icu/icu4c/source/common/unicode/utypes.h:38,
from external/icu/icu4c/source/common/uidna.cpp:19:
external/icu/icu4c/source/common/uidna.cpp:670:1: note: declared here
uidna_IDNToASCII( const UChar *src, int32_t srcLength,
^
external/icu/icu4c/source/common/uidna.cpp:893:85: warning: 'int32_t uidna_IDNToASCII_64(const UChar*, int32_t, UChar*, int32_t, int32_t, UParseError*, UErrorCode*)' is deprecated [-Wdeprecated-declarations]
b2Len = uidna_IDNToASCII(s2,length2, b2,b2Capacity, options, &parseError, status);
^
In file included from external/icu/icu4c/source/common/unicode/platform.h:25:0,
from external/icu/icu4c/source/common/unicode/ptypes.h:52,
from external/icu/icu4c/source/common/unicode/umachine.h:46,
from external/icu/icu4c/source/common/unicode/utypes.h:38,
from external/icu/icu4c/source/common/uidna.cpp:19:
external/icu/icu4c/source/common/uidna.cpp:670:1: note: declared here
uidna_IDNToASCII( const UChar *src, int32_t srcLength,
^
external/icu/icu4c/source/common/uidna.cpp:904:86: warning: 'int32_t uidna_IDNToASCII_64(const UChar*, int32_t, UChar*, int32_t, int32_t, UParseError*, UErrorCode*)' is deprecated [-Wdeprecated-declarations]
b2Len = uidna_IDNToASCII(s2, length2, b2, b2Len, options, &parseError, status);
^
In file included from external/icu/icu4c/source/common/unicode/platform.h:25:0,
from external/icu/icu4c/source/common/unicode/ptypes.h:52,
from external/icu/icu4c/source/common/unicode/umachine.h:46,
from external/icu/icu4c/source/common/unicode/utypes.h:38,
from external/icu/icu4c/source/common/uidna.cpp:19:
external/icu/icu4c/source/common/uidna.cpp:670:1: note: declared here
uidna_IDNToASCII( const UChar *src, int32_t srcLength,
^
INFO: From Compiling icu4c/source/common/unistr.cpp:
external/icu/icu4c/source/common/unistr.cpp:1975:13: warning: 'void uprv_UnicodeStringDummy()' defined but not used [-Wunused-function]
static void uprv_UnicodeStringDummy(void) {
^~~~~~~~~~~~~~~~~~~~~~~
ERROR: /tmp/lingvo/lingvo/tools/BUILD:185:17: Linking lingvo/tools/generate_proto_def [for host] failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc @bazel-out/host/bin/lingvo/tools/generate_proto_def-2.params
Use --sandbox_debug to see verbose messages from the sandbox gcc failed: error executing command /usr/bin/gcc @bazel-out/host/bin/lingvo/tools/generate_proto_def-2.params
Use --sandbox_debug to see verbose messages from the sandbox
bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o:generate_proto_def.cc:function (anonymous namespace)::WriteDotProto(google::protobuf::FileDescriptor const*, char const*): error: undefined reference to 'google::protobuf::FileDescriptor::DebugString[abi:cxx11]() const'
collect2: error: ld returned 1 exit status
Target //lingvo:trainer failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 82.266s, Critical Path: 20.57s
INFO: 226 processes: 19 internal, 207 processwrapper-sandbox.
FAILED: Build did NOT complete successfully
Starting with 2.9.0, tensorflow is compiled with _GLIBCXX_USE_CXX11_ABI=1, and I believe this causes the problem:
https://github.com/tensorflow/tensorflow/releases/tag/v2.9.0
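One way to see which ABI a symbol belongs to is to look for the [abi:cxx11] tag, which appears as B5cxx11 in the mangled name. The following is only a rough sketch: the helper name is made up for illustration, and the mangled spelling of the protobuf symbol is my own reconstruction from the demangled name in the linker error, not something copied from the log.

```python
# Rough sketch: detect the GCC [abi:cxx11] tag in an Itanium-mangled symbol.
# The helper name is hypothetical and not part of any library.
CXX11_ABI_TAG = "B5cxx11"


def uses_cxx11_abi_tag(mangled: str) -> bool:
    """Return True if the mangled symbol carries the [abi:cxx11] tag."""
    return CXX11_ABI_TAG in mangled


# Symbol that metrics_ops.so asks for in the traceback (no tag: old ABI).
requested = "_ZNK10tensorflow8OpKernel11TraceStringERKNS_15OpKernelContextEb"

# Assumed mangled form of
# google::protobuf::FileDescriptor::DebugString[abi:cxx11]() const
# from the linker error (tagged: new ABI).
tagged = "_ZNK6google8protobuf14FileDescriptor11DebugStringB5cxx11Ev"

if __name__ == "__main__":
    for symbol in (requested, tagged):
        print(symbol, "->", "new ABI" if uses_cxx11_abi_tag(symbol) else "no cxx11 tag")
```

A missing tag does not prove a symbol is old-ABI (only functions whose return type involves std::string and similar types get the tag), so this is a hint rather than a definitive test.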
I tried the command below with tensorflow==2.7.3, but it did not work (the build failed with the same error as in the previous comment).
bazel build -c opt --copt=-D_GLIBCXX_USE_CXX11_ABI=0 --config=cuda //lingvo:trainer
I have finally fixed this by setting _GLIBCXX_USE_CXX11_ABI back to 0 in several files and compiling with tensorflow==2.7.3!
Let me summarize this issue for my own notes and for people who want to use waymo-open-dataset.
Issue summary
Currently, the master branch of this repo expects tensorflow==2.9.*, which can be inferred from the commit here. Note that tensorflow==2.9.* is compiled with _GLIBCXX_USE_CXX11_ABI=1 (reference).
On the other hand, there is no waymo-open-dataset package built for tensorflow==2.9.*; the latest one targets tensorflow==2.6.0. That latest version of waymo-open-dataset is compiled with the old ABI (_GLIBCXX_USE_CXX11_ABI=0, reference), so tensorflow==2.9.* cannot load waymo-open-dataset properly.
This blog helped me very much to understand the ABI problem.
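To check which ABI an installed tensorflow was built with, its compile flags can be inspected. Below is a minimal sketch, assuming tensorflow is importable in the current environment; cxx11_abi_from_flags is a helper name I made up, while tf.sysconfig.get_compile_flags() is the existing TensorFlow call that returns the flags.

```python
import re
from typing import Iterable, Optional


def cxx11_abi_from_flags(flags: Iterable[str]) -> Optional[int]:
    """Extract the _GLIBCXX_USE_CXX11_ABI value from a list of compile flags.

    Returns 0 or 1 if the define is present, otherwise None.
    (Hypothetical helper, written for illustration only.)
    """
    for flag in flags:
        match = re.fullmatch(r"-D_GLIBCXX_USE_CXX11_ABI=(\d)", flag)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("tensorflow is not installed in this environment")
    else:
        abi = cxx11_abi_from_flags(tf.sysconfig.get_compile_flags())
        print(f"tensorflow {tf.__version__}: _GLIBCXX_USE_CXX11_ABI={abi}")
```

If this reports 1 while a custom op library such as metrics_ops.so was built with 0, a load failure like the one in this issue is to be expected.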
How I solved it
Set tensorflow<2.9 and fix the files that declare _GLIBCXX_USE_CXX11_ABI=1.
In the Dockerfile, set tensorflow to 2.7.*. I confirmed that the build succeeds with the following versions:
tensorflow==2.7.3
tensorflow-datasets==4.6.0
tensorflow-estimator==2.7.0
tensorflow-hub==0.12.0
tensorflow-io-gcs-filesystem==0.26.0
tensorflow-metadata==1.9.0
tensorflow-probability==0.15.0
tensorflow-text==2.7.3
Then, following the commit, set _GLIBCXX_USE_CXX11_ABI back to 0 in all the files.
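To locate the files that still declare the new ABI, something like the following could be used. It is only a sketch: find_abi_declarations is a hypothetical helper, and which files actually need changing should be taken from the commit mentioned above, not from this script.

```python
import os
from typing import List, Tuple

NEEDLE = "_GLIBCXX_USE_CXX11_ABI=1"


def find_abi_declarations(root: str) -> List[Tuple[str, int]]:
    """Return sorted (path, line number) pairs for lines containing NEEDLE.

    Hypothetical helper for illustration; it skips .git and bazel-* directories.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata and Bazel output directories in place.
        dirnames[:] = [
            d for d in dirnames if d != ".git" and not d.startswith("bazel-")
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if NEEDLE in line:
                            hits.append((path, lineno))
            except OSError:
                # Unreadable files (broken symlinks, permissions) are skipped.
                continue
    return sorted(hits)
```

Calling find_abi_declarations("/tmp/lingvo") inside the container would list every file and line where the flag is still set to 1, which can then be compared against the commit.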
Other possible solutions
Build tensorflow==2.9.* with _GLIBCXX_USE_CXX11_ABI=0.
See here to learn how to build tensorflow. When you build it, pass the flag --copt=-D_GLIBCXX_USE_CXX11_ABI=0.
In this solution too, you need to set _GLIBCXX_USE_CXX11_ABI back to 0 following the commit.
Create waymo-open-dataset for tensorflow==2.9.*
I believe this is the only way to use waymo-open-dataset with tensorflow==2.9.*, but I could not find out how to do it. You would need to prepare tf/workspace_tf2_9_*.bzl with the proper library versions.
For those who want to use waymo-open-dataset in the current master branch
I would like to close this issue, but please be aware that trainer.py in the master branch cannot be built with the given dev.dockerfile for now.