pygwalker
[Feat] Workflow for programmatically exporting a plot
A typical workflow is probably to explore the dataset using PyGWalker, create some plots and finally export the plot as an image. What would be the best way forward to recreate the plot using an exported visualization configuration and programmatically export the plot?
In my opinion, such an interaction with PyGWalker could look like the following, where I would prefer the export as a Vega configuration option.
gwalker = pyg.walk(dataframe, spec=vis_spec)
# Export as Vega lite configuration:
vega_config = gwalker.export_vega()
# Export as JPEG:
gwalker.export(path="plot.jpeg")
Do you have a feature like that in mind, and what are your thoughts about such functionality?
Thank you for your suggestion, we will consider this export way. As this involves communication from the frontend to the kernel, we are currently working on developing a stable communication method that can be adapted for Jupyter Lab, Jupyter Notebook, and Jupyter Lab on various websites.
Currently, you can export your configuration options from the frontend.
Hi, are you trying to use this feature in Jupyter? If so, features similar to this have been added.
You can try the pre-release version first.
pre-release 0.2.0: here
Also look forward to your feedback, Thanks♪(・ω・)ノ
Thank you very much for this feature! Looks really good to me.
A few observations:
1.) walker.export_chart("Chart 1") returns a Dict with various attributes. Among them is the data URI, which I find really useful for automatically exporting images of the chart.
To actually export an image, I needed to use something like this:
import urllib.request
from io import BytesIO
from PIL import Image

exported = walker.export_chart("Chart 1")
# The "data" field holds the chart image as a data URI, which urlopen can read directly.
response = urllib.request.urlopen(exported["charts"][0]["data"])
img = Image.open(BytesIO(response.read()))
display(img)
Suggestion:
Can we expose a function similar to walker.export("Chart 1", path="plot.png"), where the user can rely on PyGWalker to handle the image data and the export?
Additionally, I would find it really useful if the Dict provided an additional attribute containing the Vega specification.
2.) Even if it is not as straightforward as using a JSON file, could PyGWalker also support the previous style of passing a JSON string to the spec attribute?
3.) I noticed that even if the JSON file already exists and I execute the notebook again, this cell fails:
walker.display_chart("Chart 1")
Error: ValueError: chart_name: Chart 1 not found, please confirm whether to save
The cell only works as expected after I click on save again within PyGWalker, even though the JSON file should not change.
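For the image export in observation 1, the data URI can also be decoded and written to disk without PIL. This is only a minimal sketch using the standard library: decode_data_uri and save_data_uri are hypothetical helper names, not PyGWalker functions, and the exported["charts"][0]["data"] layout is assumed from the snippet above.

```python
import base64
from pathlib import Path
from urllib.parse import unquote_to_bytes


def decode_data_uri(data_uri: str) -> bytes:
    """Return the raw payload of a data: URI (base64 or percent-encoded)."""
    if not data_uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the media type and encoding flag.
    header, _, payload = data_uri[len("data:"):].partition(",")
    if header.endswith(";base64"):
        return base64.b64decode(payload)
    return unquote_to_bytes(payload)


def save_data_uri(data_uri: str, path: str) -> int:
    """Write the decoded payload to path and return the number of bytes written."""
    return Path(path).write_bytes(decode_data_uri(data_uri))


# Assumed usage with the Dict returned by walker.export_chart("Chart 1"):
# save_data_uri(exported["charts"][0]["data"], "plot.png")
```

The file extension should match the media type in the URI header, since the bytes are written unchanged.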
Thank you very much for your great effort!
Thanks for the feedback, good suggestions!
- Since a "chart" may contain multiple images, I need HTML and CSS to restore its appearance in Vega, but I will try to export a single, complete chart.
- spec can still accept a JSON string, but the "save" feature won't be available in that case.
- Yes, walker didn't load the image data when it was initialized; I will improve it.
Install the new pre-release version: pip install pygwalker --upgrade --pre
Use walker.save_chart_to_file("chart name", "xxx.png") to save the chart to the local file system.
walker.export_chart_png("chart name") returns the chart as bytes.
Looking forward to you trying it out.
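Since export_chart_png is described as returning bytes, a caller may want to check them before writing them anywhere. The following is a rough sketch under that assumption; is_png and write_png are hypothetical helpers, not part of PyGWalker.

```python
from pathlib import Path

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data: bytes) -> bool:
    """Check whether the bytes begin with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


def write_png(data: bytes, path: str) -> Path:
    """Write PNG bytes to path, refusing data that does not look like a PNG."""
    if not is_png(data):
        raise ValueError("data does not start with the PNG signature")
    target = Path(path)
    target.write_bytes(data)
    return target


# Assumed usage with the pre-release API mentioned above:
# write_png(walker.export_chart_png("chart name"), "xxx.png")
```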
Thanks for the update!
The save_chart_to_file function looks good to me.
For the export_chart_png function, I actually prefer the previous export_chart function, as the Dict might provide valuable metadata in the future. As this Dict also contains the image data, can we maybe keep that?
I also stumbled across one small issue: when a notebook cell displays the interactive plot using the pyg.walk function with a provided JSON file, and a cell below it uses either save_chart_to_file or display_chart on that plot, I run into the same error again:
ValueError: chart_name: Chart 1 not found, please confirm whether to save
This only happens if I execute the whole notebook using the "run all" functionality provided by Jupyter.
Unlike before, the failed cell runs fine when I execute it again manually.
To me it seems like some parts of pyg.walk on the frontend side finish only after the Python part is already done, so Jupyter executes the next cell too early.
Thanks again for your amazing work 👍
As for a method similar to export_chart that returns chart metadata, we will consider adding it in the future.
"Run all" does cause this problem, since the next cell starts running before the UI has finished initializing.
In Jupyter, the communication (and therefore the initialization) only happens when no cell is executing code. "Run all" keeps cells executing continuously.
I am considering adding a configuration item store_chart_metadata, as in pyg.walk(df, spec="xxx", store_chart_metadata=True), which saves the chart metadata to disk and ensures that the next initialization loads the chart from disk.
By the way, on which platform are you using pygwalker? Local Jupyter? Kaggle?
Yes, I can see how it can be quite challenging to have the Python part block execution till the frontend/JS part is done as well. Would still be great to get this working as at least in my work I tend to re-execute the whole notebook regularly.
I am mostly working on a jupyter notebook within VS Code.
If you save the chart data to disk, it solves the "run all" problem.
Like this:
walker = pyg.walk(df, spec="xxx.json", store_chart_data=True)
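A later cell can then save the chart and still fail gracefully if the chart data has not been stored yet. This is only a sketch: save_chart_or_explain is a hypothetical wrapper, and it assumes that save_chart_to_file raises the ValueError quoted earlier in this thread when the chart is missing.

```python
def save_chart_or_explain(walker, chart_name: str, out_path: str) -> bool:
    """Try to save a chart; return False and print a hint if it is not found."""
    try:
        walker.save_chart_to_file(chart_name, out_path)
    except ValueError as err:
        # Assumed failure mode: "chart_name: ... not found, please confirm whether to save"
        print(f"{err}\nHint: save the chart in the UI once, or pass store_chart_data=True.")
        return False
    return True


# Assumed usage in the cell below pyg.walk(...):
# save_chart_or_explain(walker, "Chart 1", "plot.png")
```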
pygwalker 0.2.0 has been released.
Thanks again for your advice.