data-centric topic
ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop....
DataCLUE
DataCLUE: 数据为中心的NLP基准和工具包
mr-Observer
An observer is a wrapper over JSON data, that provides an interface to know when data is changed, with a focus on performance and memory efficiency.
lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckD...
pypely
From local functions to cloud deployed pipelines
Data-Centric-AI-Competition
Codes for a Top 5% finish in the Data-Centric AI Competition organized by Andrew Ng and DeepLearning.AI
data-centric-AI
A curated, but incomplete, list of data-centric AI resources.
encord-active
The toolkit to test, validate, and evaluate your models and surface, curate, and prioritize the most valuable data for labeling.
infoVerse
Jaehyung Kim et al's ACL 2023 paper on "infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information"