ydata-profiling
Loading json-saved profiles back
Proposed feature
- profile.load_json_as_report(fname): use a previously saved JSON report (created via json_data = profile.to_json()) to instantly generate reports interactively in a Jupyter notebook.
- This feature should avoid recalculating the report again and again from the original data stream.
- The JSON output of pandas-profiling is a great feature for dumping into various data blobs and NoSQL stores, so a quick visualization via pandas-profiling would be nice, too.
Sorry, I went carefully through the docs and the repository and couldn't find a JSON-load option, including loading tables, figures, reports, and stats. If I missed it, please disregard this, but point me to the right keywords.
Thanks for taking the time to open an issue. For saving and loading reports, you can use the report.dump("file.pp") and report.load("file.pp") methods. The serialization uses pickle instead of JSON.
We're currently working on improving the JSON export as well, for extended interoperability with other packages such as https://github.com/great-expectations/great_expectations (see report.to_json() and report.to_file("file.json")).
@sbrugman thx, that was pretty useful!
@sbrugman Can you please provide a full example of how to dump and load the report? I am not able to generate a report from the binary file that is created by the loads method.
@Anselmoo I am unable to load a historical JSON profile into a report. Could you please help me with sample code?
@sbrugman can you take a look, please? And maybe provide some example code?
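A minimal sketch of the dump/load round trip mentioned above, offered as an illustration rather than an official recipe. The helper names save_profile and load_profile are hypothetical; only report.dump(), report.load() and the ProfileReport class come from the library. The sketch assumes the report is reloaded together with the same DataFrame it was built from, and whether everything works with df=None is not confirmed here.

```python
from pathlib import Path


def save_profile(report, path):
    """Pickle an already-built report to `path` using report.dump()."""
    # This thread uses the ".pp" extension for pickled profiles.
    target = Path(path).with_suffix(".pp")
    report.dump(target)
    return target


def load_profile(path, df=None):
    """Rebuild a report from a file written by save_profile()."""
    # Imported lazily so the helpers can be defined without the package.
    from ydata_profiling import ProfileReport

    # Assumption: passing the same DataFrame the report was built from is
    # the safe case; behaviour with df=None is not verified in this sketch.
    return ProfileReport(df).load(path)
```

Typical use would be save_profile(ProfileReport(df), "report.pp") after profiling, followed later by load_profile("report.pp", df) in the notebook where the report should be displayed.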
My use case involves profiling database tables on a daily basis after the load has completed and comparing today vs. yesterday to generate a difference report. The problem I see is that the DataFrame is required when reading back the profile using df.ProfileReport.load('report.pp') or report.load('report.pp').
However, this would not be possible, as the data in the tables will change after the load. Is there an alternate approach where the report load is not dependent on the DataFrame?
I have exactly the same use case. Was there any solution for this? Thanks
Actually, it is possible to save and load with the existing dump and load methods. But they can't be used for compare because:
ydata_profiling/compare_reports.py in validate_reports(reports, configs)
    185     is_df_available = [r.df is not None for r in reports]  # type: ignore
    186     if not all(is_df_available):
--> 187         raise ValueError("Reports where not initialized with a DataFrame.")
https://github.com/ydataai/ydata-profiling/blob/fdc034603d5b5ee385471b12a5504fd59b9e8858/src/ydata_profiling/compare_reports.py#L184-L187
@Ananthbabu86, might be interesting. https://docs.github.com/en/repositories/working-with-files/using-files/getting-permanent-links-to-files#
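One possible workaround for the day-over-day use case, as a sketch only: because the comparison check quoted above requires reports that still hold a DataFrame, the string returned by report.to_json() could be stored each day and the numeric statistics diffed directly. This assumes the JSON has a top-level "variables" object mapping column names to statistics; the functions numeric_stats and diff_profiles are hypothetical helpers, not part of ydata-profiling, and the payloads below are stand-ins for real exports.

```python
import json
from numbers import Number


def numeric_stats(profile_json: str) -> dict:
    """Return {column: {stat: value}} keeping only numeric statistics."""
    # Assumption: per-column statistics live under the "variables" key.
    variables = json.loads(profile_json).get("variables", {})
    return {
        column: {
            name: value
            for name, value in stats.items()
            if isinstance(value, Number) and not isinstance(value, bool)
        }
        for column, stats in variables.items()
    }


def diff_profiles(old_json: str, new_json: str) -> dict:
    """Return {column: {stat: (old, new)}} for statistics that changed."""
    old, new = numeric_stats(old_json), numeric_stats(new_json)
    changes = {}
    # Only columns and statistics present in both profiles are compared.
    for column in sorted(old.keys() & new.keys()):
        changed = {
            name: (old[column][name], new[column][name])
            for name in sorted(old[column].keys() & new[column].keys())
            if old[column][name] != new[column][name]
        }
        if changed:
            changes[column] = changed
    return changes


# Stand-in payloads; in practice these would be the strings saved from
# report.to_json() on two different days.
yesterday = json.dumps(
    {"variables": {"price": {"n_missing": 0, "mean": 10.0, "type": "Numeric"}}}
)
today = json.dumps(
    {"variables": {"price": {"n_missing": 3, "mean": 10.5, "type": "Numeric"}}}
)

print(diff_profiles(yesterday, today))
```

This gives a plain dictionary of changed statistics rather than the rendered comparison report, so it is a stopgap for tracking drift, not a replacement for the built-in compare.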
Hello, I faced the same problem as @Ananthbabu86. Any progress here?