data-processing topic

List of data-processing repositories

texar

2.4k
Stars
371
Forks

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

miller

8.6k
Stars
202
Forks

Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

cotk

128
Stars
27
Forks

Conversational Toolkit. An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation

pandera

3.1k
Stars
282
Forks

A lightweight, flexible, and expressive statistical data testing library

DALI

4.9k
Stars
606
Forks

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

Skytrax-Data-Warehouse

132
Stars
26
Forks

A full data warehouse infrastructure with ETL pipelines running inside docker on Apache Airflow for data orchestration, AWS Redshift for cloud data warehouse and Metabase to serve the needs of data vi...

deeplake

8.1k
Stars
615
Forks

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop....

Bash-Oneliner

9.9k
Stars
604
Forks
107
Watchers

A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.

amadeus

470
Stars
26
Forks

Harmonious distributed data analysis in Rust.

broadway

2.3k
Stars
153
Forks

Concurrent and multi-stage data ingestion and data processing with Elixir