Provide an interface to stream out Arrow RecordBatches

Open wangxiaoying opened this issue 3 years ago • 4 comments

wangxiaoying avatar May 06 '22 17:05 wangxiaoying

It would be very helpful. Any chance of seeing this in the future?

fnicastri avatar Aug 06 '24 16:08 fnicastri

We have initialized the Arrow batch iterator for the Rust and C++ libraries. It needs more work in terms of testing and exposing it to the Python library.

wangxiaoying avatar Aug 14 '24 05:08 wangxiaoying

I think it would be an awesome addition to be able to get a RecordBatchReader directly from read_sql, one which only materializes the record batches (sends queries to the DB) when the user requests read_next_batch.

@wangxiaoying do you think the testing is done, so that the Arrow record batch iterator can be exposed to the Python side? Is there any way I can help with this?

chitralverma avatar May 11 '25 09:05 chitralverma

Yes, I think we can definitely enable the record batch reader. Please feel free to submit a PR!

wangxiaoying avatar Jun 11 '25 03:06 wangxiaoying

@wangxiaoying I checked this out today and here are my findings:

  1. arrow_rb can easily be added on the Python side as a return_type and can return a generator of RecordBatches. I did this and it works as expected.
  2. After doing this, there was no performance benefit because the dispatcher is eager.
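
To illustrate point 2, here is a minimal pure-Python sketch, not connector-x code, of why a generator return type alone brings no benefit when the producer is eager. The names eager_dispatch and fetch_partition are hypothetical stand-ins for the dispatcher and for one partition query, and plain lists stand in for RecordBatches.

```python
from typing import Iterator, List

calls: List[int] = []  # records which partitions have been fetched so far


def fetch_partition(i: int) -> List[int]:
    """Hypothetical stand-in for running one partition query."""
    calls.append(i)
    return [i * 10, i * 10 + 1]


def eager_dispatch(n_partitions: int) -> Iterator[List[int]]:
    """Fetch everything up front, then hand the results out one by one."""
    batches = [fetch_partition(i) for i in range(n_partitions)]  # all work happens here
    return iter(batches)


batches = eager_dispatch(3)
fetched_before_first_read = list(calls)  # every partition was already fetched
first = next(batches)                    # this only hands out a stored result
```

Even though the caller receives an iterator, all three partitions are fetched before the first batch is read, so neither latency nor peak memory improves.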

In order to make the record batch path truly lazy, I'm thinking:

  • ~The dispatcher can have an alternate implementation of run where the operations don't happen eagerly but are backed by an iterator.~ [already available]
  • This iterator is also exposed on the Python side and passed to the RecordBatchReader.
  • When this RecordBatchReader is consumed, the operations happen at that time, by calling next() on the iterator.
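
The idea in the bullets above can be sketched in plain Python as follows. This is an illustration under assumptions, not the connector-x implementation: LazyBatchReader and query_batches are hypothetical names, and lists of ints stand in for Arrow RecordBatches. The method name read_next_batch mirrors the one mentioned earlier in this thread.

```python
from typing import Callable, Iterator, List, Optional

Batch = List[int]  # stand-in for an Arrow RecordBatch


class LazyBatchReader:
    """Hypothetical reader that pulls from an iterator only when asked."""

    def __init__(self, make_iter: Callable[[], Iterator[Batch]]) -> None:
        self._make_iter = make_iter
        self._iter: Optional[Iterator[Batch]] = None  # not started yet

    def read_next_batch(self) -> Batch:
        if self._iter is None:
            self._iter = self._make_iter()  # the iterator is created on first request
        return next(self._iter)  # raises StopIteration when exhausted


executed: List[int] = []  # records which partition queries have run


def query_batches() -> Iterator[Batch]:
    """Hypothetical dispatcher: runs one partition query per next() call."""
    for part in range(3):
        executed.append(part)
        yield [part * 10, part * 10 + 1]


reader = LazyBatchReader(query_batches)
ran_before_read = list(executed)  # nothing has run yet
first = reader.read_next_batch()
ran_after_one = list(executed)    # only the first partition has run
```

Here no query runs when the reader is constructed, and each call to read_next_batch triggers exactly one step of the underlying iterator.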

~This seems quite complicated to me considering my limited understanding of this code base. I'll still try to give this a shot, but if you have any suggestion please let me know.~

chitralverma avatar Jun 19 '25 17:06 chitralverma

Actually, scratch the above. I managed to get this working exactly as expected. I will raise a PR today for your review. :D

chitralverma avatar Jun 20 '25 09:06 chitralverma