msgpack-python
The Unpacker fails to retrieve and unpack all the data while streaming with big data.
The Unpacker fails to retrieve and unpack all the data while streaming big data (e.g. 10GiB).
td-client-python uses msgpack-python internally to unpack the data it receives while streaming:
https://github.com/treasure-data/td-client-python/blob/1.2.1/tdclient/job_api.py#L220-L244
When the size of the received data is 10GiB or above, I occasionally find that the Unpacker fails to retrieve and unpack all of it while streaming, which results in premature termination without raising an error.
As a workaround, I rewrote the code as follows to first receive all the data, save it to a file, and then unpack it from there. This seems to have solved the problem, so I suspect this is a bug in Unpacker's handling of streaming input.
with open("temp.mpack", "wb") as output_file:
for chunk in res.stream(1024*1024*1024):
if chunk:
output_file.write(chunk)
with open("temp.mpack", "rb") as input_file:
unpacker = msgpack.Unpacker(input_file, raw=False)
for row in unpacker:
yield row
The fact that Unpacker can handle the file means Unpacker can handle more than 10GiB of data. Without a reproducer, I cannot fix your issue.
Maybe the res object in your code has some behavior that is not file-like. (I don't know what self.get() is or what res is in your code.)
I recommend using the Unpacker.feed() method instead; it frees you from "file-like" edge cases.
https://github.com/msgpack/msgpack-python/blob/140864249fd0f67dffaeceeb168ffe9cdf6f1964/msgpack/_unpacker.pyx#L291-L300
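For reference, a minimal sketch of that feed-based approach, assuming res.stream() yields byte chunks as in the snippet above (res and the chunk size come from the issue's code, not from the msgpack-python API):

import msgpack

def iter_rows(res, chunk_size=1024 * 1024):
    # Feed each received chunk into the Unpacker instead of handing it a
    # file-like object; feed() buffers any partial object internally.
    unpacker = msgpack.Unpacker(raw=False)
    for chunk in res.stream(chunk_size):
        if chunk:
            unpacker.feed(chunk)
            # Yield every object that is complete so far; incomplete data
            # stays buffered until the next feed() call.
            for row in unpacker:
                yield row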
Thank you for your quick response! I'll try Unpacker.feed() and see whether it fixes the problem.