Fix unhandled type '<class 'pyarrow.lib.Buffer'>' in send method
Summary
pyarrow.RecordBatchFileWriter.getvalue() returns a pyarrow.Buffer, but the downstream call expects bytes. This commit fixes that for pyarrow==19.0.1.
Is the error you're seeing just a typing error? The actual code works fine with PyArrow 19.0.1 (at least on my Mac). The urllib3 library successfully POSTs the pyarrow.Buffer data without copying it into a separate bytes object, so I believe your change could cause higher memory usage in some cases.
If it is a typing error, I'd rather fix it by changing the type hints on the affected methods.
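To make the memory concern concrete, here is a minimal sketch using only the standard library. It assumes a `bytearray` as a stand-in for `pyarrow.Buffer` (both expose the Python buffer protocol), and the payload contents are made up for illustration. A `memoryview` shares the underlying storage, while `bytes(...)` allocates a second full copy:

```python
# Stand-in for the Arrow payload; in the real code this would be a pyarrow.Buffer.
payload = bytearray(b"arrow-ipc-payload")

# Zero-copy: the view references the same memory as the payload.
view = memoryview(payload)

# Full copy: a new bytes object of the same size is allocated.
copied = bytes(view)

# Mutating the original shows which object shares storage with it.
payload[0] = ord("A")

view_sees_change = view[0] == ord("A")    # the view shares memory
copy_sees_change = copied[0] == ord("A")  # the copy is independent
```

For a large insert, the copy roughly doubles the peak memory held for the request body, which is the cost described above.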
Thanks for your prompt reply over the weekend, @genzgd!
We're seeing a runtime error with urllib3==1.26.20 in which the pyarrow.Buffer object is passed all the way down to socket.send(), resulting in:
'Connection aborted.', OSError("unhandled type '<class 'pyarrow.lib.Buffer'>' in send method")
This error disappears when upgrading to urllib3==2.3.0.
I agree with you regarding the increased memory usage risk introduced by my proposed change. A better long-term solution would indeed be to bump the minimum supported version of urllib3.
We'll temporarily rely on this workaround until we update our dependencies to use the newer urllib3.
Thanks for the in-depth analysis! I'm hesitant to force an upgrade to urllib3 2.x, so I'm looking at some kind of check in the arrow_buffer method for backward compatibility.
Yes, that sounds great. Just need to determine the exact version to check for. :-)
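A rough sketch of what such a backward-compatibility check might look like. The helper names `needs_bytes_fallback` and `prepare_body` are hypothetical, and the 2.0 cutoff is only an assumption based on the error disappearing with urllib3==2.3.0; the exact version to check for is still undetermined:

```python
from typing import Any


def needs_bytes_fallback(urllib3_version: str) -> bool:
    """Hypothetical check: treat every urllib3 release before 2.0 as affected.

    The 2.0 threshold is an assumption, not a confirmed boundary.
    """
    major = int(urllib3_version.split(".", 1)[0])
    return major < 2


def prepare_body(body: Any, urllib3_version: str) -> Any:
    """Return a bytes copy for older urllib3, otherwise pass the buffer through."""
    if needs_bytes_fallback(urllib3_version):
        # Copies the data; works for any object exposing the buffer protocol.
        return bytes(memoryview(body))
    # Newer urllib3: keep the original object and avoid the extra copy.
    return body
```

In the actual client, the version string could come from `urllib3.__version__`, and the copy could use `pyarrow.Buffer.to_pybytes()` instead of `bytes(memoryview(...))`. This way only users on the older urllib3 pay the extra memory cost.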
So I've just had a chance to look at this and I can't reproduce the error (again on my Mac, with Python 3.13 and urllib3 1.26.20). Do you have a simple reproducible example so I can track down exactly what I need to check for?