databricks-sql-python
Idea: arrow_record_batches cursor method
When you call fetchmany_arrow(batchsize) with a batch size, you get back a pyarrow Table that may contain multiple record batches.
In my experience, those record batches are much smaller than the batch size I specify. I assume the SQL connector has to do bookkeeping to align the batch size I request with the record batches it receives from the server (IIUC).
When I call fetchmany_arrow, I end up with nested loops: the outer loop iterates over fetchmany_arrow calls, and the inner loop iterates over the record batches in each returned table.
I suspect it would be less bother for everyone if there were an API (e.g. arrow_record_batches()) that returned a record-batch iterator directly.