
Cannot run trainer.py with --model=car.waymo_deepfusion.DeepFusionCenterPointPed, undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringERKNS_15OpKernelContextEb

Open Rtakaha opened this issue 2 years ago • 3 comments

Hi, thank you very much for your great work.

I am trying to run trainer.py with the DeepFusion model, but it fails with the error tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python3.9/dist-packages/waymo_open_dataset/metrics/ops/metrics_ops.so: undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringERKNS_15OpKernelContextEb.

I use your dev.dockerfile.

docker build --tag tensorflow:lingvo $(test "$LINGVO_DEVICE" = "gpu" && echo "--build-arg base_image=nvidia/cuda:11.6.1-cudnn8-runtime-ubuntu18.04") - < "$LINGVO_DIR/docker/dev.dockerfile"
docker run --rm $(test "$LINGVO_DEVICE" = "gpu" && echo "--gpus all") -it -v ${LINGVO_DIR}:/tmp/lingvo -v ${DATA_DIR}:/tmp/data -v ${HOME}/.gitconfig:/home/${USER}/.gitconfig:ro -p 6006:6006 -p 8888:8888 --name lingvo tensorflow:lingvo bash

In the container, I run:

bazel build -c opt --config=cuda //lingvo:trainer
bazel-bin/lingvo/trainer --logtostderr --model=car.waymo_deepfusion.DeepFusionCenterPointPed --mode=sync --logdir=/tmp/deepfusion --run_locally=gpu

The entire output messages:

2022-08-03 02:36:54.706801: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
model_imports.py: Importing car.waymo_deepfusion                                                                                                                            
model_imports.py: Importing car.params.waymo_deepfusion                                                                                                                     
model_imports.py: Importing car.waymo_deepfusion.params                                                                                                                     
model_imports.py: Importing lingvo.tasks.car.waymo_deepfusion                                                                                                               
model_imports.py: Importing lingvo.params.tasks.car.waymo_deepfusion                                                                                                        
model_imports.py: Importing lingvo.tasks.params.car.waymo_deepfusion
model_imports.py: Importing lingvo.tasks.car.params.waymo_deepfusion
Traceback (most recent call last):
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 863, in <module>
    model_imports.ImportParams(FLAGS.model) 
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/model_imports.py", line 101, in ImportParams
    success = _Import(module_with_params) or success
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/model_imports.py", line 29, in _Import
    importlib.import_module(name)
  File "/usr/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/car/params/waymo_deepfusion.py", line 32, in <module>
    from lingvo.tasks.car.params import waymo as waymo_params
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/car/params/waymo.py", line 28, in <module>
    from lingvo.tasks.car.waymo import waymo_decoder
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/car/waymo/waymo_decoder.py", line 23, in <module>
    from lingvo.tasks.car.waymo import waymo_ap_metric
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/car/waymo/waymo_ap_metric.py", line 32, in <module>
    from waymo_open_dataset.metrics.ops import py_metrics_ops
  File "/usr/local/lib/python3.9/dist-packages/waymo_open_dataset/metrics/ops/py_metrics_ops.py", line 23, in <module>
    metrics_module = tf.load_op_library(
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/framework/load_library.py", line 54, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python3.9/dist-packages/waymo_open_dataset/metrics/ops/metrics_ops.so: undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringERKNS_15OpKernelContextEb

I found a similar issue here, but bazel build fails after downgrading tensorflow to tensorflow==2.6.0.

Would you tell me why this happens?

Thank you in advance!

Rtakaha, Aug 03 '22 02:08

The same error occurs when I try --model=car.waymo.StarNetVehicle, yet --model=car.kitti.StarNetCarModel0701 does not produce it.

Rtakaha, Aug 03 '22 02:08

Here is the output of bazel build after downgrading tensorflow to tensorflow==2.6.0:

Extracting Bazel installation...                                                                                                                                            
Starting local Bazel server and connecting to it...                                                                                                                         
DEBUG: Rule 'subpar' indicated that a canonical reproducible form can be obtained by modifying arguments commit = "35bb9f0092f71ea56b742a520602da9b3638a24f", shallow_since = "1557863961 -0400" and dropping ["tag"]
DEBUG: Repository subpar instantiated at:                                                                                                                                   
  /tmp/lingvo/WORKSPACE:12:15: in <toplevel>                                                                                                                                
Repository rule git_repository defined at:                                                                                                                                  
  /root/.cache/bazel/_bazel_root/17eb95f0bc03547f4f1319e61997e114/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>                                  
INFO: Analyzed target //lingvo:trainer (48 packages loaded, 6731 targets configured).                                                                                       
INFO: Found 1 target...                                                                                                                                                     
INFO: From Compiling icu4c/source/common/ucptrie.cpp:                                                                                                                       
external/icu/icu4c/source/common/ucptrie.cpp: In function 'UChar32 {anonymous}::getRange(const void*, UChar32, uint32_t (*)(const void*, uint32_t), const void*, uint32_t*)':
external/icu/icu4c/source/common/ucptrie.cpp:404:5: warning: 'value' may be used uninitialized in this function [-Wmaybe-uninitialized]                                     
     if (maybeFilterValue(highValue, trie->nullValue, nullValue,                                                                                                            
     ^~                                                                                                                                                                     
INFO: From Compiling icu4c/source/common/uidna.cpp:                                                                                                                         
external/icu/icu4c/source/common/uidna.cpp: In function 'int32_t _internal_toUnicode(const UChar*, int32_t, UChar*, int32_t, int32_t, UStringPrepProfile*, UParseError*, UErrorCode*)':
external/icu/icu4c/source/common/uidna.cpp:515:85: warning: 'int32_t uidna_toASCII_64(const UChar*, int32_t, UChar*, int32_t, int32_t, UParseError*, UErrorCode*)' is deprecated [-Wdeprecated-declarations]
         b3Len = uidna_toASCII(b2, b2Len, b3, b3Capacity, options, parseError, status);                                                                                     
                                                                                     ^                                                                                      
In file included from external/icu/icu4c/source/common/unicode/platform.h:25:0,                                                                                             
                 from external/icu/icu4c/source/common/unicode/ptypes.h:52,                                                                                                 
                 from external/icu/icu4c/source/common/unicode/umachine.h:46,                                                                                               
                 from external/icu/icu4c/source/common/unicode/utypes.h:38,                                                                                                 
                 from external/icu/icu4c/source/common/uidna.cpp:19:                                                                                                        
external/icu/icu4c/source/common/unicode/uidna.h:576:1: note: declared here                                                                                                 
 uidna_toASCII(const UChar* src, int32_t srcLength,                                                                                                                         
 ^                                                                                                                                                                          
external/icu/icu4c/source/common/uidna.cpp:528:80: warning: 'int32_t uidna_toASCII_64(const UChar*, int32_t, UChar*, int32_t, int32_t, UParseError*, UErrorCode*)' is deprecated [-Wdeprecated-declarations]
             b3Len =  uidna_toASCII(b2,b2Len,b3,b3Len,options,parseError, status);
                                                                                ^
In file included from external/icu/icu4c/source/common/unicode/platform.h:25:0,
                 from external/icu/icu4c/source/common/unicode/ptypes.h:52,
                 from external/icu/icu4c/source/common/unicode/umachine.h:46,
                 from external/icu/icu4c/source/common/unicode/utypes.h:38,
                 from external/icu/icu4c/source/common/uidna.cpp:19:
external/icu/icu4c/source/common/unicode/uidna.h:576:1: note: declared here
 uidna_toASCII(const UChar* src, int32_t srcLength,
 ^
external/icu/icu4c/source/common/uidna.cpp: In function 'int32_t uidna_compare_64(const UChar*, int32_t, const UChar*, int32_t, int32_t, UErrorCode*)':
external/icu/icu4c/source/common/uidna.cpp:878:87: warning: 'int32_t uidna_IDNToASCII_64(const UChar*, int32_t, UChar*, int32_t, int32_t, UParseError*, UErrorCode*)' is deprecated [-Wdeprecated-declarations]
     b1Len = uidna_IDNToASCII(s1, length1, b1, b1Capacity, options, &parseError, status);
                                                                                       ^
In file included from external/icu/icu4c/source/common/unicode/platform.h:25:0,
                 from external/icu/icu4c/source/common/unicode/ptypes.h:52,
                 from external/icu/icu4c/source/common/unicode/umachine.h:46,
                 from external/icu/icu4c/source/common/unicode/utypes.h:38,
                 from external/icu/icu4c/source/common/uidna.cpp:19:
external/icu/icu4c/source/common/uidna.cpp:670:1: note: declared here
 uidna_IDNToASCII(  const UChar *src, int32_t srcLength,
 ^
external/icu/icu4c/source/common/uidna.cpp:889:83: warning: 'int32_t uidna_IDNToASCII_64(const UChar*, int32_t, UChar*, int32_t, int32_t, UParseError*, UErrorCode*)' is deprecated [-Wdeprecated-declarations]
         b1Len = uidna_IDNToASCII(s1,length1,b1,b1Len, options, &parseError, status);
                                                                                   ^
In file included from external/icu/icu4c/source/common/unicode/platform.h:25:0,
                 from external/icu/icu4c/source/common/unicode/ptypes.h:52,
                 from external/icu/icu4c/source/common/unicode/umachine.h:46,
                 from external/icu/icu4c/source/common/unicode/utypes.h:38,
                 from external/icu/icu4c/source/common/uidna.cpp:19:
external/icu/icu4c/source/common/uidna.cpp:670:1: note: declared here
 uidna_IDNToASCII(  const UChar *src, int32_t srcLength,
 ^
external/icu/icu4c/source/common/uidna.cpp:893:85: warning: 'int32_t uidna_IDNToASCII_64(const UChar*, int32_t, UChar*, int32_t, int32_t, UParseError*, UErrorCode*)' is deprecated [-Wdeprecated-declarations]
     b2Len = uidna_IDNToASCII(s2,length2, b2,b2Capacity, options, &parseError, status);
                                                                                     ^
In file included from external/icu/icu4c/source/common/unicode/platform.h:25:0,
                 from external/icu/icu4c/source/common/unicode/ptypes.h:52,
                 from external/icu/icu4c/source/common/unicode/umachine.h:46,
                 from external/icu/icu4c/source/common/unicode/utypes.h:38,
                 from external/icu/icu4c/source/common/uidna.cpp:19:
external/icu/icu4c/source/common/uidna.cpp:670:1: note: declared here
 uidna_IDNToASCII(  const UChar *src, int32_t srcLength,
 ^
external/icu/icu4c/source/common/uidna.cpp:904:86: warning: 'int32_t uidna_IDNToASCII_64(const UChar*, int32_t, UChar*, int32_t, int32_t, UParseError*, UErrorCode*)' is deprecated [-Wdeprecated-declarations]
         b2Len = uidna_IDNToASCII(s2, length2, b2, b2Len, options, &parseError, status);
                                                                                      ^
In file included from external/icu/icu4c/source/common/unicode/platform.h:25:0,
                 from external/icu/icu4c/source/common/unicode/ptypes.h:52,
                 from external/icu/icu4c/source/common/unicode/umachine.h:46,
                 from external/icu/icu4c/source/common/unicode/utypes.h:38,
                 from external/icu/icu4c/source/common/uidna.cpp:19:
external/icu/icu4c/source/common/uidna.cpp:670:1: note: declared here
 uidna_IDNToASCII(  const UChar *src, int32_t srcLength,
 ^
INFO: From Compiling icu4c/source/common/unistr.cpp:
external/icu/icu4c/source/common/unistr.cpp:1975:13: warning: 'void uprv_UnicodeStringDummy()' defined but not used [-Wunused-function]
 static void uprv_UnicodeStringDummy(void) {
             ^~~~~~~~~~~~~~~~~~~~~~~
ERROR: /tmp/lingvo/lingvo/tools/BUILD:185:17: Linking lingvo/tools/generate_proto_def [for host] failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc @bazel-out/host/bin/lingvo/tools/generate_proto_def-2.params

Use --sandbox_debug to see verbose messages from the sandbox gcc failed: error executing command /usr/bin/gcc @bazel-out/host/bin/lingvo/tools/generate_proto_def-2.params

Use --sandbox_debug to see verbose messages from the sandbox
bazel-out/host/bin/lingvo/tools/_objs/generate_proto_def/generate_proto_def.o:generate_proto_def.cc:function (anonymous namespace)::WriteDotProto(google::protobuf::FileDescriptor const*, char const*): error: undefined reference to 'google::protobuf::FileDescriptor::DebugString[abi:cxx11]() const'
collect2: error: ld returned 1 exit status
Target //lingvo:trainer failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 82.266s, Critical Path: 20.57s
INFO: 226 processes: 19 internal, 207 processwrapper-sandbox.
FAILED: Build did NOT complete successfully

Rtakaha, Aug 03 '22 03:08

TensorFlow has been compiled with _GLIBCXX_USE_CXX11_ABI=1 since 2.9.0, and I believe this causes the problem. https://github.com/tensorflow/tensorflow/releases/tag/v2.9.0

I tried the command below with tensorflow==2.7.3, but it did not work (the build failed with the same error as in the previous comment).

bazel build -c opt --copt=-D_GLIBCXX_USE_CXX11_ABI=0 --config=cuda //lingvo:trainer
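
To see which ABI an installed TensorFlow was built with, one option is to look at its compile flags. The sketch below only parses a list of flags; the function name cxx11_abi_from_flags and the sample flag lists are made up for illustration and were not captured from a real install.

```python
import re
from typing import Iterable, Optional

# Matches the macro definition exactly as it appears in a compiler flag list.
_ABI_FLAG = re.compile(r"^-D_GLIBCXX_USE_CXX11_ABI=([01])$")


def cxx11_abi_from_flags(flags: Iterable[str]) -> Optional[int]:
    """Return 0 or 1 if the flags set _GLIBCXX_USE_CXX11_ABI, else None."""
    for flag in flags:
        match = _ABI_FLAG.match(flag)
        if match:
            return int(match.group(1))
    return None


# Hypothetical flag lists, for illustration only.
old_abi_flags = ["-I/some/include", "-D_GLIBCXX_USE_CXX11_ABI=0"]
new_abi_flags = ["-I/some/include", "-D_GLIBCXX_USE_CXX11_ABI=1", "--std=c++17"]

print("old-style build:", cxx11_abi_from_flags(old_abi_flags))
print("new-style build:", cxx11_abi_from_flags(new_abi_flags))
```

Inside the container, the real list can be obtained with tf.sysconfig.get_compile_flags() and passed to the function. A custom op library such as metrics_ops.so needs to have been built with the same value.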

Rtakaha, Aug 10 '22 05:08

I have finally fixed this by setting _GLIBCXX_USE_CXX11_ABI back to 0 in several files and compiling with tensorflow==2.7.3!

Let me summarize this issue for my own notes and for people who want to use waymo-open-dataset.

Issue summary

Currently, the master branch of this repo expects tensorflow==2.9.*, which can be inferred from the commit here. Note that tensorflow==2.9.* is compiled with _GLIBCXX_USE_CXX11_ABI=1 (reference).

On the other hand, no waymo-open-dataset package has been released for tensorflow==2.9.*; the latest one targets tensorflow==2.6.0. That package is compiled with the old ABI (_GLIBCXX_USE_CXX11_ABI=0, reference), so tensorflow==2.9.* cannot load its metrics_ops.so properly.

This blog helped me very much to understand the ABI problem.
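
The mismatch is visible in the symbol names themselves. With GCC's libstdc++, symbols affected by the new ABI carry a cxx11 tag: "B5cxx11" in the mangled name (shown as [abi:cxx11] when demangled, as in the DebugString linker error above), or the std::__cxx11 namespace. The sketch below is a rough heuristic, not a demangler. The name has_cxx11_abi_tag is made up, and the "tagged" symbol is my assumption of what the new-ABI variant would look like; it was not read from an actual TensorFlow library.

```python
def has_cxx11_abi_tag(mangled: str) -> bool:
    """Heuristic: does this mangled C++ symbol carry a cxx11 ABI marker?

    A missing marker does not prove the old ABI in general, because symbols
    that never involve std::string or std::list carry no tag under either ABI.
    """
    return "B5cxx11" in mangled or "__cxx11" in mangled


# The symbol metrics_ops.so asked for, copied from the error message.
missing = "_ZNK10tensorflow8OpKernel11TraceStringERKNS_15OpKernelContextEb"

# Assumed shape of the same function when built with the new ABI
# (hypothetical, for illustration only).
tagged = "_ZNK10tensorflow8OpKernel11TraceStringB5cxx11ERKNS_15OpKernelContextEb"

for name in (missing, tagged):
    print(has_cxx11_abi_tag(name), name)
```

Since TraceString returns a std::string, an untagged request for it suggests the requesting library was built with the old ABI, which matches the explanation above.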

How I solved it

Set tensorflow<2.9 and fix the files that declare _GLIBCXX_USE_CXX11_ABI=1.

In the Dockerfile, pin tensorflow to 2.7.*. I confirmed that the build succeeds with the following versions:

tensorflow==2.7.3
tensorflow-datasets==4.6.0
tensorflow-estimator==2.7.0
tensorflow-hub==0.12.0
tensorflow-io-gcs-filesystem==0.26.0
tensorflow-metadata==1.9.0
tensorflow-probability==0.15.0
tensorflow-text==2.7.3
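
A quick way to confirm that the container really has these versions is to compare the pins against the installed package metadata. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); the names PINS, installed_version and find_mismatches are made up for illustration.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions copied from the list above.
PINS = {
    "tensorflow": "2.7.3",
    "tensorflow-datasets": "4.6.0",
    "tensorflow-estimator": "2.7.0",
    "tensorflow-hub": "0.12.0",
    "tensorflow-io-gcs-filesystem": "0.26.0",
    "tensorflow-metadata": "1.9.0",
    "tensorflow-probability": "0.15.0",
    "tensorflow-text": "2.7.3",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from its pin
    to the version actually found (None means not installed)."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = found
    return mismatches


for name, found in find_mismatches(PINS).items():
    print(f"{name}: wanted {PINS[name]}, found {found}")
```

The lookup parameter exists only so the comparison can be exercised without the packages being installed.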

Then, using the commit as a guide, set _GLIBCXX_USE_CXX11_ABI back to 0 in all of the files it changed.

Other possible solutions
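
To locate the files that still declare the new ABI, a plain text search over the checkout is enough. The sketch below assumes the declarations contain the literal string _GLIBCXX_USE_CXX11_ABI=1; the function name find_abi_declarations is made up, and the demo runs on a throwaway directory rather than on the real repo.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Tuple

NEEDLE = "_GLIBCXX_USE_CXX11_ABI=1"


def find_abi_declarations(root: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (path, line number, stripped line) for each line containing NEEDLE."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if NEEDLE in line:
                yield path, lineno, line.strip()


# Demo on a temporary directory with one made-up file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.bzl"
    sample.write_text('copts = ["-D_GLIBCXX_USE_CXX11_ABI=1"]\n', encoding="utf-8")
    for path, lineno, line in find_abi_declarations(tmp):
        print(f"{path.name}:{lineno}: {line}")
```

On the actual checkout, point it at the source directories rather than the repository root, since the bazel-* symlinks can lead into very large output trees.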

Build tensorflow==2.9.* with _GLIBCXX_USE_CXX11_ABI=0.

See here to learn how to build tensorflow. When you build it, pass the flag --copt=-D_GLIBCXX_USE_CXX11_ABI=0.

With this approach you also need to set _GLIBCXX_USE_CXX11_ABI back to 0 in the files changed by the commit.

Create waymo-open-dataset for tensorflow==2.9.*

I believe this is the only way to use waymo-open-dataset with the officially released tensorflow==2.9.* packages, but I could not work out how to do it. You would need to prepare tf/workspace_tf2_9_*.bzl with the proper library versions.

For those who want to use waymo-open-dataset in the current master branch

I would like to close this issue, but please be aware that trainer.py in the master branch cannot currently be used with the Waymo models when built with the given dev.dockerfile.

Rtakaha, Aug 12 '22 04:08