databricks-sql-python

Idea: arrow_record_batches cursor method

Open unj1m opened this issue 1 year ago • 0 comments

When you call fetchmany_arrow(batchsize) and specify a batch size, you get a table that has multiple record batches.

In my experience, the record batches are much smaller than the batch size I specify. I think the SQL connector has to do some bookkeeping to reconcile the batch size I request with the record batches it receives from the server (IIUC).

When I call fetchmany_arrow, I end up with nested loops. The outer loop iterates over fetchmany_arrow calls and the inner loop over the record batches in each returned table.

I suspect it would be less bother for everyone if there were an API (e.g. arrow_record_batches()) that returned a record-batch iterator.
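
To make the idea concrete, here is a rough sketch of what such an iterator could look like if written as a user-side helper today. The function name `arrow_record_batches` and the `fetch_size` parameter are hypothetical and not part of the connector. The sketch assumes that `fetchmany_arrow` returns a `pyarrow.Table` and that an empty table means the result set is exhausted.

```python
from typing import Any, Iterator


def arrow_record_batches(cursor: Any, fetch_size: int = 10_000) -> Iterator[Any]:
    """Yield Arrow record batches one at a time (hypothetical helper).

    Assumptions: cursor.fetchmany_arrow(n) returns a pyarrow.Table, and a
    table with num_rows == 0 signals the end of the result set.
    """
    while True:
        table = cursor.fetchmany_arrow(fetch_size)
        if table.num_rows == 0:
            break
        # One table may contain several (often small) record batches;
        # flatten them so the caller sees a single stream.
        yield from table.to_batches()
```

With this, the nested loops collapse into a single `for batch in arrow_record_batches(cursor): ...` on the caller's side. A built-in version could presumably skip the intermediate table and hand out the batches as they arrive from the server.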

unj1m, Mar 20 '24 15:03