data-curation topic

List data-curation repositories

cleanlab

9.3k stars, 722 forks

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

fastdup

1.4k stars, 74 forks

fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. It helps you improve the quality of your dataset images & labels and reduce your data oper...

metamapper

76 stars, 6 forks

Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data.

data-as-a-science

42 stars, 9 forks

Lesson guide and textbook for the "Data as a Science" course.

xtreme1

848 stars, 139 forks

Xtreme1 is an all-in-one data labeling and annotation platform for multimodal training data, with support for 3D LiDAR point clouds, images, and LLMs.

spotlight

1.0k stars, 83 forks

Interactively explore unstructured datasets from your dataframe.

awesome-open-data-centric-ai

680 stars, 36 forks

Curated list of open source tooling for data-centric AI on unstructured data.

data-centric-AI

995 stars, 66 forks

A curated, but incomplete, list of data-centric AI resources.

sliceguard

57 stars, 1 fork

A library for detecting problematic data segments in structured and unstructured data with just a few lines of code.