datafusion-python
Should PyDataFrame.collect() return a Table?
Right now it returns List[pa.RecordBatch], but it might be more natural to return a pa.Table. For one thing, PyArrow gives a pa.Table a better repr than a list of record batches.
Aside from repr, do you see any other advantages?
The current return type keeps the signature in sync with what we have in the Rust core. Perhaps it would be better to add a new method to return a pa.Table instead.
Aside from repr, do you see any other advantages?
Mostly I was just surprised coming from PyArrow, but it sounds like Rust usually just represents results as a sequence of record batches.
Perhaps it would be better to add a new method to return a pa.Table instead.
Yeah, perhaps that's a better path. A to_table() method is common in PyArrow. If we eventually get the Arrow C stream interface implemented in arrow-rs, we could also provide a to_reader().