tabml
Data loader replacement
Howdy!
I'm a maintainer of the DataProfiler library, and I think it could be useful for your project. We'd love to collaborate and see how we can help tabml.
We wrote DataProfiler to build on the goals of pandas-profiling, with some neat added functionality:
- Auto-detect & load (CSV, AVRO, Parquet, JSON, text, URL): `data = Data("your_filepath_or_url.csv")`
- Profile data (calculate statistics and detect entities such as PII): `profile = Profiler(data)`
- Merge profiles, enabling distributed profile generation: `profile3 = profile1 + profile2`
- Compare profiles: `profile_diff = profile1.diff(profile2)`
- Generate reports: `readable_report = profile.report(report_options={"output_format": "compact"})`
```python
import json

from dataprofiler import Data, Profiler

data = Data("your_file.csv")  # Auto-detect & load: CSV, AVRO, Parquet, JSON, text, URL
print(data.data.head(5))  # Access the data directly via a compatible pandas DataFrame
profile = Profiler(data)  # Calculate statistics, entity recognition, etc.
readable_report = profile.report(report_options={"output_format": "compact"})
print(json.dumps(readable_report, indent=4))
```
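The merge and compare features rest on one idea: a profile keeps summary statistics that can be combined without re-reading the raw rows, so each worker can profile its own shard and the results can be added together. Below is a minimal, hypothetical sketch of that idea in plain Python for a single numeric column. `ColumnProfile`, its fields, and the shape of its `diff` output are illustrative assumptions, not DataProfiler's classes or API. The merge uses the standard pairwise update for count, mean, and sum of squared deviations (Chan et al.).

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnProfile:
    """Hypothetical mergeable profile of one numeric column (not DataProfiler's class)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    @classmethod
    def from_values(cls, values: list[float]) -> ColumnProfile:
        """Profile one shard of data in a single pass over its values."""
        n = len(values)
        if n == 0:
            return cls()
        mean = sum(values) / n
        m2 = sum((v - mean) ** 2 for v in values)
        return cls(n, mean, m2, min(values), max(values))

    @property
    def variance(self) -> float:
        """Population variance; NaN for an empty profile."""
        return self.m2 / self.count if self.count else math.nan

    def __add__(self, other: ColumnProfile) -> ColumnProfile:
        """Merge two profiles without touching the original rows."""
        if not isinstance(other, ColumnProfile):
            return NotImplemented
        if self.count == 0:
            return other
        if other.count == 0:
            return self
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta**2 * self.count * other.count / n
        return ColumnProfile(
            n,
            mean,
            m2,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    def diff(self, other: ColumnProfile) -> dict[str, float]:
        """Hypothetical comparison: each statistic of self minus that of other."""
        return {
            "count": self.count - other.count,
            "mean": self.mean - other.mean,
            "variance": self.variance - other.variance,
            "minimum": self.minimum - other.minimum,
            "maximum": self.maximum - other.maximum,
        }


shard_a = ColumnProfile.from_values([1.0, 2.0, 3.0])
shard_b = ColumnProfile.from_values([4.0, 5.0])
merged = shard_a + shard_b  # same statistics as profiling all five values at once
print(merged.count, merged.mean, merged.variance)
print(shard_a.diff(shard_b))
```

Because `__add__` handles an empty profile, a coordinator could also fold many worker results with `sum(profiles, ColumnProfile())`.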