feature-engineering-tutorials
An improved library similar to Pandas-Profiling
Howdy!
I'm reaching out as a maintainer of the DataProfiler library.
I think it might be useful to your project.
We effectively wrote a library to improve upon the objectives of pandas-profiling
with some neat added functionality:
- Auto-Detect & Load: CSV, AVRO, Parquet, JSON, Text, URL
data = Data("your_filepath_or_url.csv")
- Profile data: calculating statistics and doing entity detection (for PII)
profile = Profiler(data)
- Merge profiles (enabling distributed profile generation):
profile3 = profile1 + profile2
- Compare profiles:
profile_diff = profile1.diff(profile2)
- Generate reports:
readable_report = profile.report(report_options={"output_format": "compact"})
Putting it together:
import json
from dataprofiler import Data, Profiler
data = Data("your_file.csv") # Auto-Detect & Load: CSV, AVRO, Parquet, JSON, Text, URL
print(data.data.head(5)) # Access data directly via a compatible Pandas DataFrame
profile = Profiler(data) # Calculate Statistics, Entity Recognition, etc
readable_report = profile.report(report_options={"output_format": "compact"})
print(json.dumps(readable_report, indent=4))
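For anyone wondering why merging with `+` helps with distributed profiling, here is a rough, library-free sketch of the idea. `ToyProfile` and its fields are hypothetical names made up for this illustration and are not DataProfiler's classes or internals. The sketch only shows that when a profile stores mergeable statistics (count, sum, min, max), each chunk of data can be profiled separately and the results combined afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyProfile:
    """Hypothetical mergeable summary of one numeric column (illustration only)."""

    count: int
    total: float
    minimum: float
    maximum: float

    @classmethod
    def from_values(cls, values):
        """Build a profile from a non-empty iterable of numbers."""
        values = list(values)
        if not values:
            raise ValueError("cannot profile an empty chunk")
        return cls(len(values), float(sum(values)), float(min(values)), float(max(values)))

    @property
    def mean(self):
        return self.total / self.count

    def __add__(self, other):
        """Merge two profiles without revisiting the underlying data."""
        return ToyProfile(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    def diff(self, other):
        """Report per-statistic differences (self minus other)."""
        return {
            "count": self.count - other.count,
            "mean": self.mean - other.mean,
            "min": self.minimum - other.minimum,
            "max": self.maximum - other.maximum,
        }


# Two chunks that could have been profiled on different machines.
chunk_a = [1, 2, 3, 4]
chunk_b = [10, 20]

profile1 = ToyProfile.from_values(chunk_a)
profile2 = ToyProfile.from_values(chunk_b)

# Merging the chunk profiles matches profiling all the data at once.
profile3 = profile1 + profile2
assert profile3 == ToyProfile.from_values(chunk_a + chunk_b)

profile_diff = profile1.diff(profile2)
print(profile3)
print(profile_diff)
```

The real library tracks far more than these four numbers, so treat this only as intuition for the `profile1 + profile2` and `profile1.diff(profile2)` calls above.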