datafusion-python

Python binding for DataFusion

Results 14 datafusion-python issues

This implements a PyArrow Dataset TableProvider that allows PyArrow Datasets to be used as tables in DataFusion. Fixes #10

`datafusion-python` was donated to the Apache Arrow project in April 2021 and was added to the `arrow-datafusion` repository [1]. `datafusion-python` was removed from the repository in January 2022 [2] and...

I'm trying to get started with DataFusion and would like to run some basic operations to try out the library. I'd like to read a CSV file into a DataFrame...

_Moved from https://github.com/apache/arrow-datafusion/issues/1136_ Another good set might be timestamps (datetime.date, etc) but perhaps we can add those as a separate PR _Originally posted by @alamb in https://github.com/apache/arrow-datafusion/pull/1130#pullrequestreview-781435561_

## Question Who are the maintainers of this repository and who should be the owners of the associated pypi accounts? As of now, the list of people who own the...

question

Moving from main datafusion repo. https://github.com/apache/arrow-datafusion/issues/1496

Rust by default compiles for a very old architecture, which limits performance. We should probably update this to target a newer one. An example of Polars usage: https://github.com/pola-rs/polars/blob/master/.github/deploy_manylinux.sh#L11 There...

I would like to be able to run information schema queries from python, such as `SHOW COLUMNS from table`.

Right now it returns `List[pa.RecordBatch]`, but it might be more natural to return a `pa.Table`. For one thing, they have a better repr provided by PyArrow.

I feel odd even asking this - but is it possible to make enhancements so that `datafusion-python` can be used without `pyarrow`? `pyarrow` is fantastic and I already use it,...