[Feat] Workflow for programmatically exporting a plot

Open Julius-Plehn opened this issue 1 year ago • 1 comments

A typical workflow is probably to explore the dataset using PyGWalker, create some plots and finally export the plot as an image. What would be the best way forward to recreate the plot using an exported visualization configuration and programmatically export the plot?

In my opinion, such an interaction with PyGWalker could look like the following, where I would prefer the export as a Vega configuration option.

gwalker = pyg.walk(dataframe, spec=vis_spec)

# Export as Vega-Lite configuration:
vega_config = gwalker.export_vega()

# Export as JPEG:
gwalker.export(path="plot.jpeg")

Do you have a feature like that in mind, and what are your thoughts about such functionality?

Julius-Plehn avatar Jun 02 '23 07:06 Julius-Plehn

Thank you for your suggestion, we will consider this export approach. As it involves communication from the frontend to the kernel, we are currently developing a stable communication method that can be adapted for Jupyter Lab, Jupyter Notebook, and Jupyter Lab on various websites.

Currently, you can export your configuration options from the frontend.

longxiaofei avatar Jun 06 '23 01:06 longxiaofei

Hi, are you trying to use this feature in Jupyter? If so, similar features have been added.

You can try the pre-release version first.

pre-release 0.2.0: here

We also look forward to your feedback. Thanks♪(・ω・)ノ

longxiaofei avatar Jul 12 '23 13:07 longxiaofei

Thank you very much for this feature! Looks really good to me.

A few observations:

1.) walker.export_chart("Chart 1") returns a Dict with various attributes. Among them is the data URI, which I find really useful for automatically exporting images of the chart. To actually export an image, I needed to use something like this:

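import urllib.request  # "import urllib" alone does not reliably load the request submodule
from io import BytesIO
from PIL import Image
exported = walker.export_chart("Chart 1")
# The chart image is stored as a data URI, which urlopen can decode directly
response = urllib.request.urlopen(exported["charts"][0]["data"])
img = Image.open(BytesIO(response.read()))
display(img)  # display() is available by default in Jupyter/IPython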

Suggestion: Could PyGWalker expose a function like walker.export("Chart 1", path="plot.png"), so that the user can rely on PyGWalker to handle the image data and the export?
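Until such a function exists, the workaround above can be wrapped in a small helper. This is only a minimal sketch, not part of PyGWalker: the name save_data_uri is hypothetical, and it assumes the data field holds a data: URI as observed above. It uses only the standard library, relying on urllib.request's support for data: URLs.

```python
import urllib.request
from pathlib import Path


def save_data_uri(data_uri: str, path: str) -> int:
    """Decode a data: URI and write its payload to `path`.

    Hypothetical helper (not a PyGWalker API).
    Returns the number of bytes written.
    """
    if not data_uri.startswith("data:"):
        raise ValueError("expected a data: URI")
    # urllib.request handles the data: scheme, including base64 payloads.
    with urllib.request.urlopen(data_uri) as response:
        payload = response.read()
    return Path(path).write_bytes(payload)


# Usage in a notebook, assuming the Dict layout described above:
# exported = walker.export_chart("Chart 1")
# save_data_uri(exported["charts"][0]["data"], "plot.png")
```

Keeping the helper independent of PIL means the image is written exactly as exported, without being re-encoded.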

It would also be really useful if the Dict included an additional attribute containing the Vega specification.

2.) Even if it is not as straightforward as using a JSON file, could PyGWalker also support the previous style of providing a JSON string to the spec attribute?

3.) I noticed that even if the JSON file already exists and I execute the notebook again, this cell fails:

walker.display_chart("Chart 1")

Error: ValueError: chart_name: Chart 1 not found, please confirm whether to save

The cell only works as expected after I click save again within PyGWalker, even though the JSON file should not change.

Thank you very much for your great effort!

Julius-Plehn avatar Jul 12 '23 18:07 Julius-Plehn

Thanks for the feedback, good suggestions!

  1. Since a "chart" may contain multiple images, I need HTML and CSS to restore its appearance in Vega, but I will try to export a single, complete chart.

  2. spec can still be passed a JSON string, but the "save" feature won't be available in that case.

  3. Yes, the walker didn't load the image data when it was initialized. I will improve this.

longxiaofei avatar Jul 13 '23 01:07 longxiaofei

Install the new pre-release version: pip install pygwalker --upgrade --pre

Use walker.save_chart_to_file("chart name", "xxx.png") to save the chart to the local file system. walker.export_chart_png("chart name") returns the chart as bytes.
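Since export_chart_png is described as returning the raw chart bytes, those bytes can be checked and written out with the standard library alone. A minimal sketch follows: the helper name write_png is hypothetical, and the commented usage assumes the pre-release method names quoted above.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def write_png(png_bytes: bytes, path: str) -> Path:
    """Write PNG bytes to `path` after a basic signature check.

    Hypothetical helper (not a PyGWalker API).
    """
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("data does not look like a PNG image")
    target = Path(path)
    target.write_bytes(png_bytes)
    return target


# Usage in a notebook, assuming the pre-release API named above:
# write_png(walker.export_chart_png("chart name"), "xxx.png")
```

The signature check is optional, but it catches the case where the call returned something other than image data before a broken file is written.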

Looking forward to hearing how it works for you.

longxiaofei avatar Jul 15 '23 01:07 longxiaofei

Thanks for the update! The save_chart_to_file function looks good to me. As for export_chart_png, I actually prefer the previous export_chart function, as the Dict might provide valuable metadata in the future. Since this Dict also contains the image data, can we maybe keep it?

I also stumbled across one small issue: when a notebook cell displays the interactive plot via pyg.walk with a provided JSON file, and a cell below it uses either save_chart_to_file or display_chart on that plot, I run into the same error again:

ValueError: chart_name: Chart 1 not found, please confirm whether to save

This only happens if I execute the whole notebook using the "run all" functionality provided by Jupyter. Unlike before, the failed cell runs fine when I re-execute it manually.

To me it seems like some parts of pyg.walk on the frontend side finish only after the Python part is already done, so Jupyter executes the next cell too early.
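One way to make this failure easier to spot during "run all", without fixing the underlying timing, is to catch the error and print a hint instead of aborting the run. A minimal sketch: try_save_chart is a hypothetical helper, and it assumes save_chart_to_file raises the ValueError quoted above when the chart is not yet available.

```python
def try_save_chart(walker, chart_name: str, path: str) -> bool:
    """Save a chart if it is available; return False instead of raising.

    Hypothetical helper. `walker` is any object that has a
    save_chart_to_file(chart_name, path) method.
    """
    try:
        walker.save_chart_to_file(chart_name, path)
    except ValueError as err:
        # Seen with "run all": the chart has not been registered yet.
        print(f"Could not save {chart_name!r}: {err}. Re-run this cell manually.")
        return False
    return True


# Usage in a notebook:
# try_save_chart(walker, "Chart 1", "plot.png")
```

This does not wait for the frontend; it only keeps the rest of the notebook running and marks which cell needs to be re-executed.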

Thanks again for your amazing work 👍

Julius-Plehn avatar Jul 17 '23 13:07 Julius-Plehn

A method similar to export_chart that returns the chart's metadata is something we will consider adding in the future.

"Run all" does cause this problem, since the next cell starts running before the UI has finished initializing.

In Jupyter, communication (including the initialization work) only happens when no cell is executing code. With "run all", cells are executing continuously, so the initialization cannot complete before the next cell runs.

We are considering adding a configuration item store_chart_metadata, e.g. pyg.walk(df, spec="xxx", store_chart_metadata=True), which would save the chart metadata to disk and ensure that the next initialization loads the chart from disk.

longxiaofei avatar Jul 18 '23 07:07 longxiaofei

By the way, on which platform are you using pygwalker? Local Jupyter? Kaggle?

longxiaofei avatar Jul 18 '23 10:07 longxiaofei

Yes, I can see how it can be quite challenging to have the Python part block execution until the frontend/JS part is done as well. It would still be great to get this working, since in my work at least I tend to re-execute the whole notebook regularly.

I am mostly working in a Jupyter notebook within VS Code.

Julius-Plehn avatar Jul 18 '23 15:07 Julius-Plehn

Saving the chart data to disk solves the "run all" problem, like this:

walker = pyg.walk(df, spec="xxx.json", store_chart_data=True)

pygwalker 0.2.0 has been released.

Thanks again for your advice.

longxiaofei avatar Jul 25 '23 06:07 longxiaofei