data-centric topic

Repositories tagged with the data-centric topic

ludwig

10.9k Stars · 1.2k Forks

Low-code framework for building custom LLMs, neural networks, and other AI models

deeplake

7.8k Stars · 599 Forks

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop....

DataCLUE

145 Stars · 17 Forks

DataCLUE: A Data-Centric NLP Benchmark and Toolkit

mr-Observer

23 Stars · 1 Fork

An observer is a wrapper over JSON data that provides an interface for detecting when the data changes, with a focus on performance and memory efficiency.
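To make the observer idea concrete, here is a minimal sketch of a change-notifying wrapper over nested JSON-like data (dicts and lists). It is not mr-Observer's API: the class `Observed`, the callback signature `(path, old, new)`, and the lazy per-access wrapping are all hypothetical choices made for illustration.

```python
from typing import Any, Callable, List, Optional

# Hypothetical callback signature: (path to the changed key, old value, new value)
ChangeCallback = Callable[[List[Any], Any, Any], None]


class Observed:
    """Wraps a JSON-like dict or list and reports every item assignment.

    Illustrative sketch only; this is not mr-Observer's actual API.
    """

    def __init__(self, data: Any, on_change: ChangeCallback,
                 path: Optional[List[Any]] = None) -> None:
        self._data = data            # underlying container, mutated in place
        self._on_change = on_change  # called after every write
        self._path = path or []      # keys from the root to this container

    def __getitem__(self, key: Any) -> Any:
        value = self._data[key]
        # Wrap nested containers only when accessed, so untouched
        # branches never get wrapper objects.
        if isinstance(value, (dict, list)):
            return Observed(value, self._on_change, self._path + [key])
        return value

    def __setitem__(self, key: Any, value: Any) -> None:
        old = self._data[key] if self._contains(key) else None
        self._data[key] = value
        self._on_change(self._path + [key], old, value)

    def _contains(self, key: Any) -> bool:
        if isinstance(self._data, dict):
            return key in self._data
        return isinstance(key, int) and -len(self._data) <= key < len(self._data)


# Usage: record every change made through the wrapper.
changes = []
raw = {"user": {"name": "Ada", "tags": ["a"]}}
doc = Observed(raw, lambda path, old, new: changes.append((path, old, new)))
doc["user"]["name"] = "Grace"
doc["user"]["tags"][0] = "b"
print(changes)
# [(['user', 'name'], 'Ada', 'Grace'), (['user', 'tags', 0], 'a', 'b')]
```

Creating a fresh wrapper on each access keeps memory proportional to what is actually touched; a performance-focused library might instead cache wrappers or use proxies, which this sketch does not attempt.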

lance

3.4k Stars · 175 Forks · 30 Watchers

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckD...

pypely

16 Stars · 0 Forks

From local functions to cloud deployed pipelines

Data-Centric-AI-Competition

20 Stars · 3 Forks

Code for a top-5% finish in the Data-Centric AI Competition organized by Andrew Ng and DeepLearning.AI

data-centric-AI

995 Stars · 66 Forks

A curated, but incomplete, list of data-centric AI resources.

encord-active

423 Stars · 23 Forks

The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
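One common way to "prioritize the most valuable data for labeling" is uncertainty sampling: rank unlabeled samples by the entropy of the model's predicted class distribution and label the most uncertain first. The sketch below illustrates that generic heuristic only; it is not taken from encord-active, and the function names and sample IDs are hypothetical.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def prioritize_for_labeling(predictions: Dict[str, List[float]], budget: int) -> List[str]:
    """Return up to `budget` sample IDs, most uncertain (highest entropy) first."""
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs: sample ID -> predicted class probabilities.
preds = {
    "img_001": [0.98, 0.01, 0.01],  # confident prediction
    "img_002": [0.34, 0.33, 0.33],  # near uniform, most uncertain
    "img_003": [0.60, 0.30, 0.10],
}
print(prioritize_for_labeling(preds, budget=2))
# ['img_002', 'img_003']
```

Entropy is only one scoring choice; margin between the top two classes or disagreement across an ensemble are common alternatives that slot into the same ranking function.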

infoVerse

16 Stars · 1 Fork

Jaehyung Kim et al.'s ACL 2023 paper "infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information"