VegaLite.jl
Using indent in printrepr leads to unnecessary large plots
For plots that have a lot of data, using indents in printrepr leads to an increase in plot size of up to 4 times in my tests. One example:
using VegaLite, DataFrames
t_plt = DataFrame(round.(rand(50000, 2), sigdigits=3)) |> @vlplot(x=:x1, y=:x2, mark=:point);
VegaLite.savespec("large_plot.json", t_plt; include_data=true)
stat("large_plot.json").size / 1024 / 1024
1.1441984176635742
While
VegaLite.savespec("large_plot.json", t_plt; include_data=true, indent=4)
stat("large_plot.json").size / 1024 / 1024
4.101337432861328
It's not that crucial in this case, but with my real data a 22 megabyte plot really slows down Jupyter, and removing indents reduces its size to 8 megabytes.
Just for clarity: in the example I round the data because unnecessarily high precision is another source of large memory consumption, and without rounding the same plot takes 2.2 MB.
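For reference, here is a minimal sketch of the kind of rounding I apply before plotting; the helper name and the column handling are mine, not anything provided by VegaLite.jl:

using DataFrames

# Round every floating point column to a few significant digits so the
# generated JSON spec does not carry full-precision floats.
function round_float_columns(df::DataFrame; sigdigits = 3)
    out = copy(df)
    for name in names(out)
        if eltype(out[!, name]) <: AbstractFloat
            out[!, name] = round.(out[!, name]; sigdigits = sigdigits)
        end
    end
    return out
end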
I think we don't print any indents in the vega and vega-lite MIME show methods, right? I might misremember, though... Those would be the most relevant places, right?
Also, this might have changed on master, where I rewrote the whole JSON generation part. Could you try your Jupyter notebook example again? I think the embedded JSON should not use indents in that case, but it would be great if you could double check!
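As a rough sketch of how to check this outside the notebook (the exact MIME type string depends on the VegaLite.jl version you have installed, so adjust that part):

using VegaLite, DataFrames
plt = DataFrame(x1 = rand(10), x2 = rand(10)) |> @vlplot(x=:x1, y=:x2, mark=:point)
spec = sprint(show, MIME("application/vnd.vegalite.v3+json"), plt)
length(spec)              # size of the emitted spec in characters
occursin("\n    ", spec)  # true if the spec is printed with indentation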
Thanks for the update! In the previous version I explicitly checked that spaces are printed in the notebook; indeed, the huge notebook size was exactly the reason I started looking into such things. I will try to test the new version later this week. Thanks!
Hi @davidanthoff, sorry for the delay. The situation hasn't changed.
Package version: "[112f6efa] VegaLite v1.0.1-DEV #master (https://github.com/queryverse/VegaLite.jl.git)". Part of the output:
{
"d": 1.4531611032316108e-05,
"distribution": "LogNormal",
"scale": 3,
"x": 1179.449821080336
},
{
"d": 1.3942453353616595e-05,
"distribution": "LogNormal",
"scale": 3,
"x": 1208.9360602096724
},
Could you provide a bit more information about the context? I assume you have a Jupyter notebook and run some code in there? What exactly are you calling? Is the problem the size of the JSON embedded as a MIME bundle in the notebook file, or something else?
There are so many different places where the JSON can show up that right now it is not clear to me where exactly the problem is appearing. I think just a bit more context would help.