protocol-buffer collides with other libraries

Open hellysmile opened this issue 5 years ago • 9 comments

Hey @noirello, thanks for the awesome lib

I have found an issue, which is really hard to debug:

import aioprometheus  # imports protobuf under the hood
import pyorc

with open("./new_data.orc", "wb") as data:
    with pyorc.Writer(data, "struct<col0:int,col1:string>") as writer:
        writer.write((1, "ORC from Python"))

This fails randomly in one of two different ways:

[libprotobuf FATAL /Users/runner/runners/2.163.1/work/1/s/deps/orc-1.6.2/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/message_lite.cc:71] CHECK failed: (bytes_produced_by_serialization) == (byte_size_before_serialization): Byte size calculation and serialization were inconsistent.  This may indicate a bug in protocol buffers or it may be caused by concurrent modification of orc.proto.Footer.
Traceback (most recent call last):
  File "crash.py", line 6, in <module>
    writer.write((1, "ORC from Python"))
  File "XXXX/env/lib/python3.7/site-packages/pyorc/writer.py", line 66, in __exit__
    super().close()
RuntimeError: CHECK failed: (bytes_produced_by_serialization) == (byte_size_before_serialization): Byte size calculation and serialization were inconsistent.  This may indicate a bug in protocol buffers or it may be caused by concurrent modification of orc.proto.Footer.
Segmentation fault: 11
python(50048,0x1094b35c0) malloc: *** error for object 0x7f834747a0f8: pointer being freed was not allocated
python(50048,0x1094b35c0) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

Is there any chance you could take a look at it?

hellysmile avatar May 20 '20 00:05 hellysmile

I looked around a little, and it seems to be a common problem with Python binary distributions (and libprotobuf in general): the process crashes at random when the same library is loaded more than once.

The best advice that I've found in these cases is to compile both modules from scratch making sure that they use the same version of the shared protobuf library.

That'd be the first thing I would try, but it might be unfeasible in this situation.
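Before rebuilding anything, it can help to see which loaded modules might each carry their own protobuf. The following is a minimal diagnostic sketch, not part of pyorc: it assumes Linux (it reads `/proc/self/maps`, so it will not work on the macOS machine from the report), and the function names and the `"protobuf"` / `"pyorc"` search strings are illustrative assumptions. A statically linked copy would not appear as its own file, only as the extension module that embeds it.

```python
import sys
from pathlib import Path


def mapped_files(maps_text):
    """Return the distinct file paths listed in /proc/<pid>/maps-style text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [path]
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/"):
            paths.add(fields[5].strip())
    return sorted(paths)


def protobuf_suspects(paths, needles=("protobuf", "pyorc")):
    """Keep only the paths that mention one of the (assumed) search strings."""
    return [p for p in paths if any(n in p.lower() for n in needles)]


def report_current_process():
    """Print the suspect libraries mapped into the running interpreter."""
    maps = Path("/proc/self/maps")
    if not maps.exists():
        print("no /proc/self/maps on this platform", file=sys.stderr)
        return []
    suspects = protobuf_suspects(mapped_files(maps.read_text()))
    for path in suspects:
        print(path)
    return suspects
```

Calling `report_current_process()` after both imports lists the candidates; more than one entry suggests that two independently built copies of protobuf may be present in the same process.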

noirello avatar May 22 '20 22:05 noirello

Thanks a lot for the quick answer. For now I'm going to make libprotobuf optional in https://github.com/claws/aioprometheus/pull/42, which solves my specific case but not the problem in general.

hellysmile avatar May 22 '20 23:05 hellysmile

I ran into the same problem. Did you solve it?

shaoshuaig avatar Apr 22 '21 14:04 shaoshuaig

@noirello could you list the versions of libprotobuf used to build the pypi wheels? Following the chain in setup.py suggests pyorc 0.4.0 -> ORC 1.6.6 -> protobuf 3.5.1, but protobuf 3.5.1 is four years old at this point and seems unlikely to have been used to build the wheels.

skearnes avatar Oct 20 '21 15:10 skearnes

@skearnes that looks correct. I use the same version that the ORC lib uses. They're still using 3.5.1 for 1.7.0 as well.

noirello avatar Oct 20 '21 18:10 noirello

I just discovered that pyarrow has an (undocumented) ORC writer that uses Cython and doesn't seem to have the same conflicts: https://github.com/apache/arrow/blob/master/python/pyarrow/orc.py
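For comparison, here is a rough sketch of writing the same one-row file from the original report through pyarrow instead of pyorc. It assumes a pyarrow release in which `pyarrow.orc.write_table` takes `(table, where)` in that order; the function name `write_orc_with_pyarrow` is made up for illustration, and whether this avoids the protobuf conflict in a given environment is not something the sketch can guarantee.

```python
def write_orc_with_pyarrow(path):
    """Write a one-row ORC file with pyarrow and read it back as a Table."""
    # Imported lazily so that defining this function does not require pyarrow.
    import pyarrow as pa
    from pyarrow import orc

    # Same columns as struct<col0:int,col1:string> in the original report.
    table = pa.table({
        "col0": pa.array([1], type=pa.int32()),
        "col1": pa.array(["ORC from Python"], type=pa.string()),
    })
    # Assumption: argument order is (table, where), as in recent releases.
    orc.write_table(table, path)
    return orc.ORCFile(path).read()
```

Calling `write_orc_with_pyarrow("./new_data.orc")` in a process that has also imported the other protobuf-using library is a quick way to check whether the crash is specific to pyorc's build.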

skearnes avatar Oct 22 '21 15:10 skearnes

Hi @noirello, I have raised https://github.com/apache/orc/issues/1425 to see if we can get some consistency between the version of protobuf used in Apache ORC and the version used in Apache Arrow. However, would you be open to patching deps/orc-1.8.1/cmake_modules/ThirdpartyToolchain.cmake so that the version is set to 21.3 as part of the build process?

--- deps/orc-1.8.1/cmake_modules/ThirdpartyToolchain.cmake.orig	2023-03-01 01:54:15
+++ deps/orc-1.8.1/cmake_modules/ThirdpartyToolchain.cmake	2023-03-01 01:53:50
@@ -14,7 +14,7 @@
 set(SNAPPY_VERSION "1.1.7")
 set(ZLIB_VERSION "1.2.11")
 set(GTEST_VERSION "1.8.0")
-set(PROTOBUF_VERSION "3.5.1")
+set(PROTOBUF_VERSION "21.3")
 set(ZSTD_VERSION "1.5.2")
 
 option(ORC_PREFER_STATIC_PROTOBUF "Prefer static protobuf library, if available" ON)

dbaxa avatar Mar 01 '23 01:03 dbaxa

TBH, I'm kind of lost on this issue. I'm not sure how Apache Arrow avoids colliding with other protobuf-using modules. I'll try to look into it a little bit more.

noirello avatar Mar 06 '23 18:03 noirello