ydata-profiling: Initialize the profiling UI with an "external" JSON

Missing functionality

I'd like to be able to use the pandas-profiling UI on pre-existing profiling output that I have in JSON format, which avoids having to re-run the profiling. The report can already be rendered as HTML and saved. However, if profile generation and profile visualization were further decoupled, I could pass in a JSON that I generated (or refreshed) by other means, possibly outside pandas_profiling. In that case I would clearly be responsible for providing a well-formed JSON for the visualization to work properly.

Proposed feature

Allow the report's "cache" to be set from an existing JSON that follows a predefined schema.

Alternatives considered

No response

Additional context

No response

Jan 25 '23 19:01 alexlang74
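Because the reporter would be responsible for supplying a correct JSON, a renderer that accepts external input would likely validate it first. Below is a minimal sketch of such a check. The required top-level keys (`table`, `variables`) are an assumption loosely modeled on the report's JSON output, not a documented schema, and `validate_profile_json` is a hypothetical helper:

```python
import json

# Hypothetical minimal schema: top-level keys a renderer would need.
# These names are assumptions for illustration, not ydata-profiling's contract.
REQUIRED_TOP_LEVEL = {"table", "variables"}


def validate_profile_json(raw: str) -> dict:
    """Parse an externally produced profile and check the assumed minimal layout."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("profile JSON must be an object at the top level")
    missing = REQUIRED_TOP_LEVEL - set(data)
    if missing:
        raise ValueError(f"profile JSON is missing keys: {sorted(missing)}")
    if not isinstance(data["variables"], dict):
        raise ValueError("'variables' must map column names to statistics")
    return data


external = '{"table": {"n": 3}, "variables": {"age": {"n_missing": 0}}}'
profile = validate_profile_json(external)
print(sorted(profile["variables"]))  # ['age']
```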

Can I work on this one? I am pretty new to this

Jan 29 '23 20:01 jalajk24

@alexlang74, could you describe your scenario in a bit more detail, perhaps with a minimal example of the interface you have in mind?

Do you want to run the profiling without generating any output, then serialize the ProfileReport object to JSON so that you can later deserialize it back into a ProfileReport?

(btw, we might have worked together briefly when I was at IBM Krakow :D)

Jan 30 '23 14:01 aquemy

Hi @aquemy, nice to hear from a former colleague :-)

I have the following in mind:

from ydata_profiling import ProfileReport

profile = ProfileReport(myDf)
jsonRes = profile.to_json()
...
newProfile = ProfileReport.from_json(jsonRes)  # proposed API, not available yet
newProfile.to_notebook_iframe()

Jan 30 '23 20:01 alexlang74
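To make the requested round-trip concrete, here is a minimal sketch on a toy class. `ToyReport` and its fields are hypothetical stand-ins: ydata-profiling's ProfileReport does not provide a `from_json` in this thread, and its real JSON is far richer. The point is only that rendering can work from stored results without recomputation:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToyReport:
    """Hypothetical stand-in: holds precomputed statistics, renders on demand."""
    title: str
    variables: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize only the computed results, never the source DataFrame.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "ToyReport":
        # Rebuild the report from JSON produced earlier (or by another tool).
        return cls(**json.loads(raw))

    def to_html(self) -> str:
        # Rendering uses only the stored statistics; nothing is recomputed.
        rows = "".join(
            f"<tr><td>{name}</td><td>{stats.get('n_missing', '')}</td></tr>"
            for name, stats in self.variables.items()
        )
        return f"<h1>{self.title}</h1><table>{rows}</table>"


report = ToyReport("demo", {"age": {"n_missing": 1}})
json_res = report.to_json()
new_report = ToyReport.from_json(json_res)
print(new_report == report)  # True
```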

You wouldn't really do this within the same Python file or notebook. It's as you said: one could serialize the JSON output and load it into a new ProfileReport, which then gives the flexibility to render it differently (as an iframe, as HTML, ...). One could even keep the JSON around and compare different versions of the dataset over time, using the recently introduced comparison capabilities.

Jan 30 '23 20:01 alexlang74
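Keeping the JSON around per dataset version could be as simple as writing one file per snapshot and reading the two most recent back for comparison. Below is a minimal standard-library sketch. The directory layout, the ISO-date version labels, and the `save_snapshot`/`latest_pair` helpers are all hypothetical:

```python
import json
import tempfile
from pathlib import Path
from typing import Tuple


def save_snapshot(store: Path, version: str, profile_json: str) -> Path:
    """Write one profile JSON per dataset version, e.g. store/2023-01-30.json."""
    store.mkdir(parents=True, exist_ok=True)
    path = store / f"{version}.json"
    path.write_text(profile_json, encoding="utf-8")
    return path


def latest_pair(store: Path) -> Tuple[dict, dict]:
    """Load the two most recent snapshots (ISO-date names sort chronologically)."""
    paths = sorted(store.glob("*.json"))
    if len(paths) < 2:
        raise ValueError("need at least two snapshots to compare")
    older, newer = paths[-2], paths[-1]
    return (json.loads(older.read_text(encoding="utf-8")),
            json.loads(newer.read_text(encoding="utf-8")))


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    save_snapshot(store, "2023-01-30", '{"variables": {"age": {"n_missing": 0}}}')
    save_snapshot(store, "2023-01-31", '{"variables": {"age": {"n_missing": 2}}}')
    before, after = latest_pair(store)
    print(before["variables"]["age"]["n_missing"], after["variables"]["age"]["n_missing"])  # 0 2
```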

@alexlang74 Thank you for the example.

We have a workaround for now if pickle is acceptable.

Serialization:

from ydata_profiling import ProfileReport

profile = ProfileReport(df)
profile.to_file('report.html')  # triggers the computation; alternatively, profile.to_json() computes it without writing a file
profile.dump('my_report')  # serializes with pickle to my_report.pp

Deserialization:

loaded_profile = ProfileReport().load('my_report.pp')  # notice that you have to instantiate an empty instance of ProfileReport

loaded_profile will contain exactly the same information as the original object.

If you try to compare against the deserialized version, it will raise "ValueError: Reports where not initialized with a DataFrame." This is because comparing requires at least the schema: we compare only the columns that are present in both datasets.

A workaround for that is to at least provide the columns:

loaded_profile.df = df.head(1)  # or an empty DataFrame with the same columns and dtypes, e.g. df.head(0)

I hope it helps!

We surely should decouple the report computation from the report generation and allow for proper serialization.

Jan 31 '23 08:01 aquemy
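One way to picture that decoupling: a compute step that emits a plain, JSON-serializable dict, and a render step that accepts any dict of that shape, whether it came from the library or from another tool. Below is a minimal sketch. `compute_stats` and `render_text` are hypothetical, and the statistics are reduced to a row count and a missing-value count per column:

```python
import json
from typing import Any, Dict, List, Optional


def compute_stats(columns: Dict[str, List[Optional[Any]]]) -> Dict[str, Any]:
    """Computation step: derive statistics without producing any output."""
    return {
        "table": {"n": max((len(v) for v in columns.values()), default=0)},
        "variables": {
            name: {"n_missing": sum(value is None for value in values)}
            for name, values in columns.items()
        },
    }


def render_text(profile: Dict[str, Any]) -> str:
    """Rendering step: consumes only the stats dict, never the raw data."""
    lines = [f"rows: {profile['table']['n']}"]
    for name, stats in profile["variables"].items():
        lines.append(f"{name}: {stats['n_missing']} missing")
    return "\n".join(lines)


stats = compute_stats({"age": [31, None, 45], "city": ["a", "b", None]})
raw = json.dumps(stats)               # persist or ship the computed results
print(render_text(json.loads(raw)))   # the renderer works from the JSON alone
```

Since the renderer only sees the dict, a JSON produced "by other means" would render the same way, provided it matches the expected shape.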

Thank you for the prompt response, I'll try that out!

We surely should decouple the report computation from the report generation and allow for proper serialization.

I'm glad you agree with the goal. This could also let other tools "contribute" to the computation, with ydata-profiling providing the UI experience.

Jan 31 '23 09:01 alexlang74

Can I work on this one? I am pretty new to this

Hi @jalajk24 ,

it is great that you want to contribute to the package. We have the roadmap open here; feel free to pick an item that is not already taken :)

Let me know if you need any support!

Feb 01 '23 05:02 fabclmnt

This worked for me.

Can we compare two loaded profiles? I am getting the following error: ValueError: Reports where not initialized with a DataFrame.

Jun 07 '23 19:06 capnomad
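Comparison needs column information because only columns present in both datasets can be lined up. The same idea applies to two stored JSON descriptions. Below is a minimal sketch, assuming each description has a `variables` mapping from column name to statistics (an assumption about the layout, not a documented contract). `compare_common_columns` is hypothetical and does not reflect how ydata-profiling implements comparison internally:

```python
import json


def compare_common_columns(old: dict, new: dict, stat: str) -> dict:
    """Return {column: (old_value, new_value)} for columns present in both profiles."""
    common = sorted(set(old["variables"]) & set(new["variables"]))
    return {
        col: (old["variables"][col].get(stat), new["variables"][col].get(stat))
        for col in common
    }


old = json.loads('{"variables": {"age": {"n_missing": 0}, "city": {"n_missing": 4}}}')
new = json.loads('{"variables": {"age": {"n_missing": 3}, "zip": {"n_missing": 1}}}')
print(compare_common_columns(old, new, "n_missing"))  # {'age': (0, 3)}
```

Columns that exist in only one snapshot ("city", "zip") are skipped, which is why a report without any schema has nothing to align against.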
