datacleaning topic

List of datacleaning repositories

kaggle-with-R

6 stars, 7 forks

All Kaggle datasets and the accompanying R code

HyperGBM

323 stars, 45 forks

A full pipeline AutoML tool for tabular data

Twitter-Sentiment-Analysis

203 stars, 122 forks

A natural language processing problem in which sentiment analysis is performed by separating positive tweets from negative tweets using machine learning models for classification, text mining, text a...

great_expectations

9.5k stars, 1.5k forks, 69 watchers

Always know what to expect from your data.

OpenRefine

10.5k stars, 1.9k forks

OpenRefine is a free, open-source power tool for working with messy data and improving it.

dataprep

1.9k stars, 200 forks

Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.

This repository contains data and code used to get and clean data from https://github.com/CSSEGISandData/COVID-19 and https://www.worldometers.info/coronavirus/

covid-19-india-data

38 stars, 81 forks

Data and code for scraping and cleaning data on COVID-19 in India from https://www.mohfw.gov.in/ and https://www.covid19india.org/

amora-data-build-tool

46 stars, 4 forks

Amora Data Build Tool enables analysts and engineers to transform data on the data warehouse (BigQuery) by writing Amora Models that describe the data schema using Python's "PEP484 - Type Hints" and s...

validatedb

32 stars, 4 forks

Validate data in a database table, using dbplyr