Fix unhandled type '<class 'pyarrow.lib.Buffer'>' in send method

Open astaff opened this issue 10 months ago • 6 comments

Summary

The pyarrow.BufferOutputStream sink written by pyarrow.RecordBatchFileWriter returns a pyarrow.Buffer from getvalue(), but the downstream call expects bytes. This commit fixes that for pyarrow==19.0.1.
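For illustration only, the kind of conversion described above could look roughly like the sketch below. This is not the actual diff from this PR: `to_plain_bytes` is a hypothetical helper name, and the only pyarrow call it relies on is `Buffer.to_pybytes()`, which copies the buffer contents into a Python `bytes` object.

```python
def to_plain_bytes(payload) -> bytes:
    """Return the payload as a plain bytes object (hypothetical helper)."""
    # Already bytes: nothing to do, and no copy is made.
    if isinstance(payload, bytes):
        return payload
    # pyarrow.Buffer provides to_pybytes(), which copies the data into bytes.
    to_pybytes = getattr(payload, "to_pybytes", None)
    if callable(to_pybytes):
        return to_pybytes()
    # Fallback for other objects that support the buffer protocol
    # (bytearray, memoryview, ...). This also copies the data.
    return bytes(memoryview(payload))
```

Note that every branch except the first copies the data, so the whole payload is held in memory twice for the duration of the request.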

astaff avatar Mar 08 '25 19:03 astaff

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Mar 08 '25 19:03 CLAassistant

Is the error you're seeing just a typing error? The actual code works fine with PyArrow 19.0.1 (at least on my Mac). The urllib3 library successfully POSTs the pyarrow.Buffer data without reading the stream into memory, so I believe your change could cause higher memory usage in some cases.

If it is a typing error, I'd rather fix it by changing the type hints on the affected methods.

genzgd avatar Mar 08 '25 22:03 genzgd

Thanks for your prompt reply over the weekend, @genzgd!

We're seeing a runtime error with urllib3==1.26.20 in which the pyarrow.Buffer object is passed all the way down to socket.send(), resulting in:

'Connection aborted.', OSError("unhandled type '<class 'pyarrow.lib.Buffer'>' in send method")

This error disappears when upgrading to urllib3==2.3.0.

I agree with you regarding the increased memory usage risk introduced by my proposed change. A better long-term solution would indeed be to bump the minimum supported version of urllib3.

We'll rely temporarily on this workaround until we update our dependencies to use the newer urllib3.

astaff avatar Mar 09 '25 21:03 astaff

Thanks for the in-depth analysis! I'm hesitant to force an upgrade to urllib3 2.x, so I'm looking at some kind of check in the arrow_buffer method for backward compatibility.
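A minimal sketch of what such a check might look like, under stated assumptions: the thread only establishes that urllib3 1.26.20 fails and 2.3.0 works, so the major-version cutoff of 2 used here is a guess, and `needs_bytes_copy` and `installed_urllib3_needs_copy` are hypothetical names rather than anything in clickhouse-connect.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: the thread reports failure on 1.26.20 and success on 2.3.0;
# the exact boundary version has not been determined.
ASSUMED_FIXED_MAJOR = 2


def needs_bytes_copy(urllib3_version: str) -> bool:
    """Decide whether a pyarrow.Buffer should be copied to bytes before sending."""
    try:
        major = int(urllib3_version.split(".")[0])
    except ValueError:
        # Unrecognized version string: take the safe (copying) path.
        return True
    return major < ASSUMED_FIXED_MAJOR


def installed_urllib3_needs_copy() -> bool:
    """Apply the check to whichever urllib3 is installed, if any."""
    try:
        return needs_bytes_copy(version("urllib3"))
    except PackageNotFoundError:
        return True
```

With a check like this, the extra copy (and its memory cost) would only be paid on older urllib3 installs, while newer ones could keep passing the buffer through unchanged.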

genzgd avatar Mar 09 '25 21:03 genzgd

Yes, that sounds great. Just need to determine the exact version to check for. :-)

astaff avatar Mar 09 '25 22:03 astaff

So I've just had a chance to look at this and I can't reproduce the error (again on my Mac, with Python 3.13 and urllib3 1.26.20). Do you have a simple reproducible example so I can track down exactly what I need to check for?

genzgd avatar Mar 28 '25 22:03 genzgd