onnx-caffe2

undefined symbol: _ZNK6google8protobuf7Message13SpaceUsedLongEv

Open lcskrishna opened this issue 7 years ago • 36 comments

I am trying to use this tool to convert a Caffe2 model to an ONNX model, following the example given in #3.

I am trying to convert a ResNet-101 model.

Below is my error log:

Traceback (most recent call last):
  File "conversion.py", line 1, in <module>
    import onnx_caffe2.frontend as c2_onnx
  File "/home/chaitanya/.local/lib/python2.7/site-packages/onnx_caffe2/frontend.py", line 8, in <module>
    from onnx import onnx_pb2, checker
  File "/home/chaitanya/.local/lib/python2.7/site-packages/onnx/__init__.py", line 7, in <module>
    from . import checker, helper
  File "/home/chaitanya/.local/lib/python2.7/site-packages/onnx/checker.py", line 14, in <module>
    from onnx import defs
  File "/home/chaitanya/.local/lib/python2.7/site-packages/onnx/defs/__init__.py", line 6, in <module>
    import onnx.onnx_cpp2py_export as C
ImportError: /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so: undefined symbol: _ZNK6google8protobuf7Message13SpaceUsedLongEv

Can someone help me out with the above issue?

lcskrishna avatar Sep 16 '17 16:09 lcskrishna
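
For readers unfamiliar with mangled C++ names, the symbol in the ImportError can be decoded by hand. The following is a minimal sketch, not a general demangler: the helper name demangle_simple is hypothetical, and it only handles the simple shape seen here (a nested name, optionally const, with no parameters). The c++filt tool does the same job properly.

```python
def demangle_simple(mangled):
    """Decode a simple Itanium C++ ABI nested name with no parameters.

    Handles only the shape _ZN[K]<len><name>...Ev, which is enough for
    the symbol in the traceback; it is not a general demangler.
    """
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested mangled name: %r" % mangled)
    pos = 3
    # A 'K' right after '_ZN' marks a const member function.
    is_const = mangled[pos:pos + 1] == "K"
    if is_const:
        pos += 1
    parts = []
    # Each component is a decimal length followed by that many characters.
    while mangled[pos:pos + 1].isdigit():
        end = pos
        while mangled[end:end + 1].isdigit():
            end += 1
        length = int(mangled[pos:end])
        parts.append(mangled[end:end + length])
        pos = end + length
    # 'E' closes the nested name, 'v' means an empty (void) parameter list.
    if mangled[pos:] != "Ev":
        raise ValueError("unsupported suffix: %r" % mangled[pos:])
    return "::".join(parts) + "()" + (" const" if is_const else "")


print(demangle_simple("_ZNK6google8protobuf7Message13SpaceUsedLongEv"))
# google::protobuf::Message::SpaceUsedLong() const
```

So the extension module expects libprotobuf to provide Message::SpaceUsedLong, and the library that gets loaded at import time does not export it.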

Hi @lcskrishna. Did you install using the binaries (conda install -c ezyang onnx) or from source?

ezyang avatar Sep 16 '17 16:09 ezyang

I installed from source (pip install onnx).

lcskrishna avatar Sep 16 '17 16:09 lcskrishna

Please pip uninstall onnx and then try the binary install.

ezyang avatar Sep 16 '17 16:09 ezyang

@lcskrishna Did you do pip inside a conda virtual environment?

bddppq avatar Sep 16 '17 16:09 bddppq

@bddppq No, I don't have a conda virtual environment set up; I am using pip install directly. Also, I installed protobuf and protoc from the GitHub source, and I installed numpy using pip.

lcskrishna avatar Sep 16 '17 17:09 lcskrishna

@lcskrishna I see. Which OS are you using? Could you do ldd /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so and readelf -d /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so and paste the output here? Also echo $LD_LIBRARY_PATH.

bddppq avatar Sep 16 '17 17:09 bddppq

@bddppq

I am using Ubuntu 16.04

Here are the outputs you asked for:

%ldd /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so

        linux-vdso.so.1 =>  (0x00007ffd2ad53000)
        libprotobuf.so.9 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.9 (0x00007f0f3edc6000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0f3ebb0000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0f3e992000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0f3e5c8000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f0f3e3ae000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0f3e02b000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0f3dd22000)
        /lib64/ld-linux-x86-64.so.2 (0x00005563d2b0c000)

%readelf -d /home/chaitanya/.local/lib/python2.7/site-packages/onnx/onnx_cpp2py_export.so

Dynamic section at offset 0x67bd0 contains 27 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libprotobuf.so.9]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x13a48
 0x000000000000000d (FINI)               0x4feb4
 0x0000000000000019 (INIT_ARRAY)         0x2671f0
 0x000000000000001b (INIT_ARRAYSZ)       96 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x267250
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x1f0
 0x0000000000000005 (STRTAB)             0x5be0
 0x0000000000000006 (SYMTAB)             0x18d8
 0x000000000000000a (STRSZ)              41221 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x268000
 0x0000000000000002 (PLTRELSZ)           4896 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x12728
 0x0000000000000007 (RELA)               0x102e0
 0x0000000000000008 (RELASZ)             9288 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x10280
 0x000000006fffffff (VERNEEDNUM)         2
 0x000000006ffffff0 (VERSYM)             0xfce6
 0x000000006ffffff9 (RELACOUNT)          186
 0x0000000000000000 (NULL)               0x0

% echo $LD_LIBRARY_PATH

/usr/local/lib

lcskrishna avatar Sep 16 '17 17:09 lcskrishna
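
The detail worth noticing in the ldd output above is which libprotobuf the extension resolves to: the system copy under /usr/lib, not a copy under /usr/local/lib where a source build would typically be installed. A small sketch for pulling that line out of ldd output follows; the function name resolved_libs is hypothetical, and the sample text is abbreviated from the output posted above.

```python
def resolved_libs(ldd_output, name_prefix):
    """Return {soname: path} for libraries whose soname starts with name_prefix."""
    found = {}
    for line in ldd_output.splitlines():
        left, arrow, right = line.strip().partition(" => ")
        if not arrow or not left.startswith(name_prefix):
            continue
        # Drop the trailing load address, e.g. " (0x00007f0f3edc6000)".
        found[left] = right.split(" (")[0].strip()
    return found


# Abbreviated from the ldd output posted in this thread.
sample = """\
        linux-vdso.so.1 =>  (0x00007ffd2ad53000)
        libprotobuf.so.9 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.9 (0x00007f0f3edc6000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0f3e5c8000)
"""

print(resolved_libs(sample, "libprotobuf"))
```

If the path printed here belongs to a different protobuf release than the headers onnx was compiled against, an undefined-symbol error at import time is a plausible result.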

@lcskrishna Hmm...everything looks normal to me. Could you also do nm -C /usr/lib/x86_64-linux-gnu/libprotobuf.so.9 | grep SpaceUsedLong?

bddppq avatar Sep 16 '17 17:09 bddppq

I am getting the following output :

nm: /usr/lib/x86_64-linux-gnu/libprotobuf.so.9: no symbols

lcskrishna avatar Sep 16 '17 18:09 lcskrishna

@lcskrishna I'm not sure whether your protobuf installation is broken. Adding the "-D" flag to the nm command might help with debugging. In the meantime, since you are using Ubuntu, could you use "sudo apt-get install libprotobuf-dev protobuf-compiler" to install protobuf?

bddppq avatar Sep 16 '17 19:09 bddppq
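
On the "no symbols" reply: plain nm reads the regular symbol table, which stripped shared libraries no longer have, while nm -D reads the dynamic symbol table that the loader actually uses. The sketch below checks nm -D output for a defined symbol. The function names are hypothetical, the sample output is made up for illustration, and only the nm flag -D is taken from the thread.

```python
import subprocess


def nm_dynamic(path):
    """Run `nm -D` on a shared library and return its text output."""
    return subprocess.check_output(["nm", "-D", path], universal_newlines=True)


def defines_symbol(nm_output, symbol):
    """Return True if nm output lists `symbol` as defined by the library."""
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # Newer binutils may append a version suffix such as "@@GLIBC_2.14".
        name = fields[-1].split("@")[0]
        kind = fields[-2]
        # 'U' (and lowercase 'w'/'v' for weak references) mean undefined.
        if name == symbol and kind not in ("U", "w", "v"):
            return True
    return False


# Made-up sample in the format `nm -D` prints (address, type, name).
nm_sample = (
    "0000000000123456 T _ZNK6google8protobuf7Message9SpaceUsedEv\n"
    "                 U malloc\n"
)

print(defines_symbol(nm_sample, "_ZNK6google8protobuf7Message13SpaceUsedLongEv"))
```

On a real system one would pass nm_dynamic("/usr/lib/x86_64-linux-gnu/libprotobuf.so.9") as the first argument instead of the sample text.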

@lcskrishna Have you been able to resolve the issue?

bddppq avatar Sep 18 '17 18:09 bddppq

I tried a fresh installation of caffe2, protobuf, onnx and onnx-caffe2. I also used the conda installation for onnx. Now the above error doesn't show up; however, I am getting the following error while running the conversion:

WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: No module named caffe2_pybind11_state_gpu
Traceback (most recent call last):
  File "../../caffe2-conv/conversion.py", line 9, in <module>
    c2_net.ParseFromString(f.read())
google.protobuf.message.DecodeError: Error parsing message

Here is my script:

import onnx_caffe2.frontend as c2_onnx
from caffe2.proto import caffe2_pb2
import os

c2_net = caffe2_pb2.NetDef()
model_path = '/home/chaitu/work/caffe2_models/model/'
c2_model_file = os.path.join(model_path, 'resnet101_init_net.pb')
with open(c2_model_file, 'rb') as f:
    c2_net.ParseFromString(f.read())
onnx_graph = c2_onnx.caffe2_net_to_onnx_graph(c2_net)

lcskrishna avatar Sep 18 '17 20:09 lcskrishna

@lcskrishna What's the size of your pb file? I suspect it's hitting the 64 MB limit. Could you try 'export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python' in your terminal and then run your code snippet again?

bddppq avatar Sep 19 '17 04:09 bddppq
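
The suggestion above can also be applied from inside the script. This is a sketch under two assumptions: the 64 MB figure is the one quoted in this thread for the C++-backed parser, and the helper name exceeds_limit is hypothetical. The environment variable has to be set before google.protobuf is imported for the first time, otherwise it has no effect.

```python
import os

# Must run before the first `import google.protobuf` (or anything that
# imports it, such as caffe2 or onnx). setdefault keeps a value that was
# already exported in the shell.
os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")

# Limit as quoted in the thread; treat it as an assumption.
LIMIT_BYTES = 64 * 1024 * 1024


def exceeds_limit(path, limit=LIMIT_BYTES):
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit
```

Calling exceeds_limit on the .pb file before ParseFromString shows quickly whether the size limit can be the cause at all; for a file of a few megabytes it cannot.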

@bddppq The size of the file is around 4 MB, and I forgot to do the export. However, I have since tried with a simple CIFAR-10 network.

I took a CIFAR-10 caffemodel, translated it into a Caffe2 model using the conversion tool in caffe2, and tried to perform the conversion as described above. I still get the following error, and I'm not sure what the issue is:

WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: No module named caffe2_pybind11_state_gpu
Unrecognized attribute: legacy_pad
Traceback (most recent call last):
  File "conversion.py", line 10, in <module>
    onnx_graph = c2_onnx.caffe2_net_to_onnx_graph(c2_net)
  File "/home/chaitu/.local/lib/python2.7/site-packages/onnx_caffe2/frontend.py", line 254, in caffe2_net_to_onnx_graph
    caffe2_op_to_node_def(op, name_map) for op in net_def.op)
  File "/home/chaitu/.local/lib/python2.7/site-packages/onnx_caffe2/frontend.py", line 254, in <genexpr>
    caffe2_op_to_node_def(op, name_map) for op in net_def.op)
  File "/home/chaitu/.local/lib/python2.7/site-packages/onnx_caffe2/frontend.py", line 205, in caffe2_op_to_node_def
    checker.check_node(node_def)
  File "/home/chaitu/.local/lib/python2.7/site-packages/onnx/checker.py", line 38, in check_node
    'NodeProto of type {} did not pass defs schema check.'.format(str(node.op_type)))
ValueError: NodeProto of type MaxPool did not pass defs schema check.

lcskrishna avatar Sep 25 '17 18:09 lcskrishna

@lcskrishna Please add the option --remove_legacy_pad when you translate the Caffe model to a Caffe2 model.

jerryzh168 avatar Sep 25 '17 18:09 jerryzh168

@jerryzh168 I get the following error when I try to translate with --remove_legacy_pad:

Traceback (most recent call last):
  File "caffe_translator.py", line 853, in <module>
    input_dims=args.input_dims
  File "caffe_translator.py", line 259, in TranslateModel
    return TranslatorRegistry.TranslateModel(*args, **kwargs)
  File "caffe_translator.py", line 254, in TranslateModel
    net = _RemoveLegacyPad(net, net_params, input_dims)
  File "caffe_translator.py", line 124, in _RemoveLegacyPad
    dim_map = _GetLegacyDims(net, net_params, dummy_input, legacy_pad_ops)
  File "caffe_translator.py", line 55, in _GetLegacyDims
    ws.create_blob(param.name)
AttributeError: 'caffe2.python.caffe2_pybind11_state.Blob' object has no attribute 'feed_blob'

lcskrishna avatar Sep 25 '17 19:09 lcskrishna

@lcskrishna Could you post your Caffe1 model? I'll try to modify caffe_translator to make sure it works with your model.

jerryzh168 avatar Sep 25 '17 20:09 jerryzh168

@jerryzh168 Please find the trained caffemodel

lcskrishna avatar Sep 25 '17 20:09 lcskrishna

Can you post the deploy.prototxt as well? Thanks

jerryzh168 avatar Sep 25 '17 20:09 jerryzh168

Here is the prototxt file I used.

lcskrishna avatar Sep 25 '17 21:09 lcskrishna

@lcskrishna Did you update your build to the most recent caffe2? I can actually translate your model, since a more recent update doesn't remove legacy pad by default.

As a side note, there was a problem in _GetLegacyDims (it should use feed rather than feed_blob), and it will be fixed after my new diff lands.

jerryzh168 avatar Oct 03 '17 19:10 jerryzh168

@jerryzh168 I tried it again and I'm still getting the same issue. Can you post the command you used to run the translator?

Thanks.

lcskrishna avatar Oct 03 '17 21:10 lcskrishna

I see. Since you need to use remove_legacy_pad, that code will be called. Please wait until my diff has landed. Once it has, you should probably also provide the input dimensions by adding the "--input_dims" option.

jerryzh168 avatar Oct 03 '17 21:10 jerryzh168

@lcskrishna The diff has landed; please update your caffe2 and try again.

jerryzh168 avatar Oct 04 '17 00:10 jerryzh168

@jerryzh168 I tried the following after updating caffe2 and I get the following error.

Command:

python -m caffe2.python.caffe_translator ../caffe_models/cifar.prototxt ../caffe_models/cifar10_quick_iter_4000.caffemodel --remove_legacy_pad --input_dims 1 3 32 32

Error:

W1003 22:11:36.411902  3167 workspace.cc:157] Blob label not in the workspace.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/caffe2/python/caffe_translator.py", line 928, in <module>
    input_dims=args.input_dims
  File "/usr/local/caffe2/python/caffe_translator.py", line 299, in TranslateModel
    return TranslatorRegistry.TranslateModel(*args, **kwargs)
  File "/usr/local/caffe2/python/caffe_translator.py", line 294, in TranslateModel
    net = _RemoveLegacyPad(net, net_params, input_dims)
  File "/usr/local/caffe2/python/caffe_translator.py", line 139, in _RemoveLegacyPad
    dim_map = _GetLegacyDims(net, net_params, dummy_input, legacy_pad_ops)
  File "/usr/local/caffe2/python/caffe_translator.py", line 77, in _GetLegacyDims
    ws._run_operator(op_def.SerializeToString())
RuntimeError: [enforce fail at operator.cc:52] blob != nullptr. op Accuracy: Encountered a non-existing input blob: label 

lcskrishna avatar Oct 04 '17 02:10 lcskrishna

@lcskrishna Is this a train net? You should probably use "deploy_net" instead.

jerryzh168 avatar Oct 04 '17 02:10 jerryzh168
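
The RuntimeError above says the Accuracy op needs an input blob named label. A training/test definition gets that blob from its data layer, while a deploy definition normally has no such input. A quick way to tell the two apart is sketched below, assuming the prototxt uses the usual bottom: "label" syntax. The function name consumes_blob is hypothetical and the two fragments are made up for illustration.

```python
import re


def consumes_blob(prototxt_text, blob_name):
    """Return True if any layer in the prototxt lists blob_name as a bottom."""
    pattern = r'bottom:\s*"%s"' % re.escape(blob_name)
    return re.search(pattern, prototxt_text) is not None


# Made-up fragment in the style of a train/test definition.
train_like = '''
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
}
'''

# Made-up fragment in the style of a deploy definition.
deploy_like = '''
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip2"
  top: "prob"
}
'''

print(consumes_blob(train_like, "label"), consumes_blob(deploy_like, "label"))
```

If the check returns True for the prototxt passed to caffe_translator, the file is most likely a train/test definition and the translator will look for a label blob that nothing provides.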

Hi, I encountered a similar problem when I tried to import onnx in Python on Ubuntu 14.04.5 LTS (GNU/Linux 3.13.0-135-generic x86_64). I installed Protocol Buffers from https://github.com/google/protobuf/releases/download/v3.5.1/protobuf-all-3.5.1.zip (version 3.5.1) and installed onnx using pip (not conda). Below are the error message and some output from my debugging.

$ python -c 'import onnx'

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/onnx/__init__.py", line 10, in <module>
    import onnx.helper  # noqa
  File "/usr/local/lib/python2.7/dist-packages/onnx/helper.py", line 15, in <module>
    import onnx.defs as defs
  File "/usr/local/lib/python2.7/dist-packages/onnx/defs/__init__.py", line 6, in <module>
    import onnx.onnx_cpp2py_export.defs as C
ImportError: /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so: undefined symbol: _ZNK6google8protobuf7Message13SpaceUsedLongEv

$ ldd /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so

	linux-vdso.so.1 =>  (0x00007ffc60dc2000)
	libprotobuf.so.8 => /usr/lib/x86_64-linux-gnu/libprotobuf.so.8 (0x00007fc1fba23000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc1fb71f000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc1fb419000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc1fb203000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc1fae3a000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc1fac1c000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc1faa03000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fc1fbff8000)

$ readelf -d /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so

Dynamic section at offset 0xd0a30 contains 28 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libprotobuf.so.8]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x25948
 0x000000000000000d (FINI)               0xa2538
 0x0000000000000019 (INIT_ARRAY)         0x2cf500
 0x000000000000001b (INIT_ARRAYSZ)       144 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x2cf590
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x1f0
 0x0000000000000005 (STRTAB)             0xaf80
 0x0000000000000006 (SYMTAB)             0x27a8
 0x000000000000000a (STRSZ)              82207 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x2d1000
 0x0000000000000002 (PLTRELSZ)           5904 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x24238
 0x0000000000000007 (RELA)               0x1fd08
 0x0000000000000008 (RELASZ)             17712 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x1fbf8
 0x000000006fffffff (VERNEEDNUM)         4
 0x000000006ffffff0 (VERSYM)             0x1f0a0
 0x000000006ffffff9 (RELACOUNT)          370
 0x0000000000000000 (NULL)               0x0

$ echo $LD_LIBRARY_PATH

/usr/local/lib

$ nm -C /usr/lib/x86_64-linux-gnu/libprotobuf.so.8 | grep SpaceUsedLong

nm: /usr/lib/x86_64-linux-gnu/libprotobuf.so.8: no symbols

$ nm -C -D /usr/lib/x86_64-linux-gnu/libprotobuf.so.8 | grep SpaceUsedLong

I ran "sudo apt-get install libprotobuf-dev protobuf-compiler" but found that the protoc version it installs (2.5.0-9ubuntu1) is too low for onnx (as suggested by this issue), so I manually installed a newer version of protoc (version 3.5.1). I would appreciate any hints on what is wrong.

hongkedavid avatar Dec 26 '17 19:12 hongkedavid

This is still an issue. I just built Caffe2 (which is now a part of PyTorch) and I am getting the same error: /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so: undefined symbol: _ZNK6google8protobuf7Message13SpaceUsedLongEv

One thing I found is that before (when everything was still working), protoc --version would give me libprotoc 2.6.1. Now, after compiling PyTorch, protoc --version gives me libprotoc 3.5.0. Could it be that there is an issue with conflicting protobuf libraries?

michaelschwier avatar Apr 24 '18 23:04 michaelschwier

Yeah, the protobuf versions need to match. I think we are using 2.6. cc @bddppq

jerryzh168 avatar Apr 24 '18 23:04 jerryzh168
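
The version check described above can be scripted. Below is a small sketch with hypothetical function names; the two version strings are the ones reported in this thread. It parses the output of protoc --version and compares major versions, which is the coarsest mismatch worth flagging.

```python
import re


def parse_protoc_version(text):
    """Parse output such as 'libprotoc 3.5.0' into a tuple of ints."""
    match = re.search(r"libprotoc\s+(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError("unrecognised protoc version output: %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def same_major(version_a, version_b):
    """Return True if both version tuples share the same major number."""
    return version_a[0] == version_b[0]


before = parse_protoc_version("libprotoc 2.6.1")
after = parse_protoc_version("libprotoc 3.5.0")
print(before, after, same_major(before, after))
# (2, 6, 1) (3, 5, 0) False
```

The same comparison can be made against the library that ldd reports for onnx_cpp2py_export.so, since it is the runtime library, not the protoc binary, that has to export the missing symbol.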

How can this be solved, then, if Caffe2 is using a different version? I tried building it with BUILD_CUSTOM_PROTOBUF=OFF, forcing it to use the 2.6.1 protobuf that was already installed, but that causes it to fail at runtime when running inference with an ONNX model.

michaelschwier avatar Apr 25 '18 00:04 michaelschwier