Loading json-saved profiles back

Open Anselmoo opened this issue 4 years ago • 10 comments

Proposed feature

  • profile.load_json_as_report(fname):
    • Use a report previously saved as JSON via json_data = profile.to_json() to instantly generate reports interactively in a Jupyter notebook.
    • This feature should avoid recalculating the report again and again from the original data stream.
    • JSON export in pandas-profiling is a great feature for dumping into various data blobs and NoSQL stores, so a quick visualization via pandas-profiling would be nice, too.

Sorry, I went carefully through the docs and the repository and couldn't find a JSON-load option that includes loading tables, figures, reports, and stats. If I missed it, please disregard this, but point me to the right keywords.

Anselmoo avatar Jun 29 '20 07:06 Anselmoo

Thanks for taking the time to open an issue. For the saving and loading of reports you can use the report.dump("file.pp") and report.load("file.pp") methods. The serialization uses pickle instead of JSON.

We're currently working on improving the JSON export as well for extended interoperability with other packages, such as https://github.com/great-expectations/great_expectations (see report.to_json() and report.to_file("file.json")).

sbrugman avatar Jun 30 '20 16:06 sbrugman

@sbrugman thx, that was pretty useful!

Anselmoo avatar Jun 30 '20 20:06 Anselmoo

@sbrugman Can you please provide a full example of dumping and loading the report? I am not able to generate a report from the binary file that is created by the loads method.

Tuxedo94 avatar Aug 31 '22 10:08 Tuxedo94

@Anselmoo I am unable to load a historical JSON profile into a report. Could you please help me with sample code?

capnomad avatar May 30 '23 19:05 capnomad

@sbrugman can you take a look, please? And maybe provide some example code?

Anselmoo avatar Jun 02 '23 05:06 Anselmoo

My use case involves profiling database tables on a daily basis after the load has completed, and comparing today vs. yesterday to generate a difference report. The problem I see is that the dataframe is required when reading back the profile using df.ProfileReport.load('report.pp') or report.load('report.pp'). However, this is not possible here, as the data in the tables will have changed after the load. Is there an alternative approach where loading the report does not depend on the dataframe?

capnomad avatar Jun 05 '23 06:06 capnomad

I have exactly the same use case. Was there any solution for this? Thanks.

Ananthbabu86 avatar Sep 14 '23 07:09 Ananthbabu86

Actually, it is possible to save and load with the existing dump and load methods. But the loaded reports can't be used for comparison, because compare fails with:

ydata_profiling/compare_reports.py in validate_reports(reports, configs)
    185     is_df_available = [r.df is not None for r in reports]  # type: ignore
    186     if not all(is_df_available):
--> 187         raise ValueError("Reports where not initialized with a DataFrame.")

Ananthbabu86 avatar Sep 21 '23 03:09 Ananthbabu86
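
For the daily today-vs-yesterday use case described above, one possible workaround that does not need the original DataFrame is to keep each day's report.to_json() output and diff the stored statistics directly. This is not the library's compare feature, only a minimal standard-library sketch. It assumes the exported JSON has a top-level "variables" key mapping column names to statistics, and that metric names such as "n", "n_missing", "n_distinct" and "mean" are present. The helper names load_variable_stats and diff_profiles are hypothetical, and the two profiles below are toy stand-ins rather than real exports.

```python
import json


def load_variable_stats(json_text):
    """Return the per-column statistics from a profile JSON string."""
    profile = json.loads(json_text)
    # Assumption: per-column statistics live under the top-level "variables" key.
    return profile.get("variables", {})


def diff_profiles(old_json, new_json,
                  metrics=("n", "n_missing", "n_distinct", "mean")):
    """Map each column to the metrics whose values differ between two profiles.

    The metric names are assumptions about the exported JSON; adjust them to
    whatever your exported profiles actually contain.
    """
    old_vars = load_variable_stats(old_json)
    new_vars = load_variable_stats(new_json)
    changes = {}
    # Walk the union of columns so added and dropped columns are reported too.
    for column in sorted(set(old_vars) | set(new_vars)):
        old_stats = old_vars.get(column, {})
        new_stats = new_vars.get(column, {})
        changed = {}
        for metric in metrics:
            before = old_stats.get(metric)
            after = new_stats.get(metric)
            if before != after:
                changed[metric] = (before, after)
        if changed:
            changes[column] = changed
    return changes


# Toy stand-ins for two days of saved report.to_json() output.
yesterday = json.dumps(
    {"variables": {"age": {"n": 100, "n_missing": 2, "mean": 41.5}}}
)
today = json.dumps(
    {"variables": {"age": {"n": 120, "n_missing": 2, "mean": 40.0},
                   "city": {"n": 120, "n_missing": 0}}}
)

changes = diff_profiles(yesterday, today)
for column, changed in changes.items():
    print(column, changed)
```

A metric that is missing on one side shows up as None in the (before, after) pair, which also flags columns that were added or dropped between runs. The result is a plain dictionary of differences, not a rendered comparison report.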

https://github.com/ydataai/ydata-profiling/blob/fdc034603d5b5ee385471b12a5504fd59b9e8858/src/ydata_profiling/compare_reports.py#L184-L187

@Ananthbabu86, this might be interesting: https://docs.github.com/en/repositories/working-with-files/using-files/getting-permanent-links-to-files#

Anselmoo avatar Sep 21 '23 04:09 Anselmoo

Hello, I faced the same problem as @Ananthbabu86. Any progress here?

bvolodarskiy avatar Oct 16 '23 07:10 bvolodarskiy