
An improved library similar to Pandas-Profiling

lettergram opened this issue 2 years ago (open, 0 comments)

Howdy!

I'm reaching out as a maintainer of the DataProfiler library.

I think it might be useful for your project!

We wrote a library that builds on the goals of pandas-profiling and adds some neat functionality:

  • Auto-Detect & Load (CSV, AVRO, Parquet, JSON, Text, URL): `data = Data("your_filepath_or_url.csv")`
  • Profile data (calculate statistics and run entity detection, e.g. for PII): `profile = Profiler(data)`
  • Merge profiles (enables distributed profile generation): `profile3 = profile1 + profile2`
  • Compare profiles: `profile_diff = profile1.diff(profile2)`
  • Generate reports: `readable_report = profile.report(report_options={"output_format": "compact"})`
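Why can profiles be merged with `+` and still be exact? This is not DataProfiler's code; it's a minimal sketch, assuming a single numeric column, of the general idea: summary statistics such as count, min, max, mean, and variance can be combined from two partial summaries without revisiting the raw rows (the mean/variance merge uses Chan et al.'s parallel update). `ColumnStats` and its methods are hypothetical names for illustration, and `diff` here is just a field-wise subtraction, not DataProfiler's diff format.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical mergeable summary of one numeric column (illustration only)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    @classmethod
    def from_values(cls, values: list[float]) -> ColumnStats:
        stats = cls()
        for x in values:
            stats.update(x)
        return stats

    def update(self, x: float) -> None:
        # Welford's online update for a single new value.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def __add__(self, other: ColumnStats) -> ColumnStats:
        # Combine two partial summaries exactly (Chan et al. parallel formula).
        n = self.count + other.count
        if n == 0:
            return ColumnStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return ColumnStats(n, mean, m2,
                           min(self.minimum, other.minimum),
                           max(self.maximum, other.maximum))

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator); 0.0 for fewer than two values.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0

    def diff(self, other: ColumnStats) -> dict[str, float]:
        # Field-wise difference, self minus other.
        return {
            "count": self.count - other.count,
            "mean": self.mean - other.mean,
            "variance": self.variance - other.variance,
            "min": self.minimum - other.minimum,
            "max": self.maximum - other.maximum,
        }


# Two "workers" each summarize part of the data; the merged result
# matches a summary computed over all rows at once.
part1 = ColumnStats.from_values([1.0, 2.0, 3.0])
part2 = ColumnStats.from_values([4.0, 5.0])
merged = part1 + part2
whole = ColumnStats.from_values([1.0, 2.0, 3.0, 4.0, 5.0])
print(merged.count, merged.mean, merged.variance)  # 5 3.0 2.5
print(merged.diff(whole))
```

Because the merge only needs each part's summary, the parts can be profiled on different machines and combined afterwards, which is the property the "distributed profile generation" bullet relies on.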

```python
import json
from dataprofiler import Data, Profiler

data = Data("your_file.csv")  # Auto-Detect & Load: CSV, AVRO, Parquet, JSON, Text, URL

print(data.data.head(5))  # Access the loaded data directly as a pandas DataFrame

profile = Profiler(data)  # Calculate statistics, entity recognition, etc.

readable_report = profile.report(report_options={"output_format": "compact"})

print(json.dumps(readable_report, indent=4))
```

lettergram, Dec 07 '21 16:12