pyarrow topic

List pyarrow repositories

ibis

Stars: 4.4k | Forks: 539

The portable Python dataframe library

vaex

Stars: 8.2k | Forks: 590

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

petastorm

Stars: 1.8k | Forks: 281

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, a...

pdf2dataset

Stars: 17 | Forks: 3

Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features

chicago-crimes

Stars: 37 | Forks: 4

Exploring the Chicago crimes dataset with Jupyter notebooks, DuckDB, Malloy, and the new Panel/PyScript data and dashboard tools.

faker-cli

Stars: 68 | Forks: 4

Command-line interface to quickly generate fake CSV and JSON data

biobear

Stars: 134 | Forks: 7

Work with bioinformatic files using Arrow, Polars, and/or DuckDB