VegaLite.jl

Using indent in printrepr leads to unnecessarily large plots

Open · VPetukhov opened this issue 5 years ago · 4 comments

For plots with a lot of data, the indentation used in printrepr increases the plot size by up to 4 times in my tests. One example:

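using VegaLite, DataFrames

# DataFrames >= 0.22 needs :auto to name the matrix columns x1, x2
t_plt = DataFrame(round.(rand(50000, 2), sigdigits=3), :auto) |> @vlplot(x=:x1, y=:x2, mark=:point);
VegaLite.savespec("large_plot.json", t_plt; include_data=true)
stat("large_plot.json").size / 1024 / 1024   # file size in MiB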

1.1441984176635742

Whereas with indent=4:

VegaLite.savespec("large_plot.json", t_plt; include_data=true, indent=4)
stat("large_plot.json").size / 1024 / 1024

4.101337432861328

It's not that crucial in this case, but with my real data a 22 MB plot really slows down Jupyter, and removing the indents reduces its size to 8 MB.

For clarity: I round the values in the example because unnecessarily high precision is another source of large memory consumption; without rounding, the same plot takes 2.2 MB.
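A minimal sketch of both effects, assuming the JSON.jl package and its `JSON.json(x)` / `JSON.json(x, indent)` methods (written against the 0.21 series; neither is part of the original report). The rows are synthetic and only mirror the `x1`/`x2` shape of the example; the sketch serializes them compactly, with 4-space indentation, and without rounding, then prints each size in MiB.

```julia
using JSON

# Synthetic rows shaped like the example: two columns rounded to 3 significant digits.
rounded = [Dict("x1" => round(rand(), sigdigits=3), "x2" => round(rand(), sigdigits=3))
           for _ in 1:50_000]
# Same shape at full Float64 precision, to illustrate the rounding remark.
unrounded = [Dict("x1" => rand(), "x2" => rand()) for _ in 1:50_000]

compact   = JSON.json(rounded)      # no whitespace between tokens
indented  = JSON.json(rounded, 4)   # pretty-printed with 4-space indentation
full_prec = JSON.json(unrounded)    # compact, but numbers carry up to ~17 digits

mib(s) = sizeof(s) / 1024 / 1024    # bytes -> MiB, as in the report

for (name, s) in [("compact", compact), ("indented", indented), ("unrounded", full_prec)]
    println(rpad(name, 10), round(mib(s), digits=2), " MiB")
end
```

The exact ratios depend on the indent width and on the length of the values: the shorter each number is, the larger the share of the file taken up by whitespace.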

VPetukhov, Dec 29 '19 21:12

I think we don't print any indents in the Vega and Vega-Lite MIME show methods, right? I might be misremembering, though. Those would be the most relevant places, right?

Also, this might have changed on master, where I rewrote the whole JSON generation part. Could you try your Jupyter notebook example again? I think the embedded JSON should not use indents in that case, but it would be great if you could double-check!
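The idea of a MIME show method that embeds compact JSON can be sketched as follows. This is not VegaLite.jl's implementation: `SpecSketch` is a hypothetical wrapper type, the MIME string `application/vnd.vegalite.v4+json` is an assumption (check which Vega-Lite MIME type your VegaLite.jl version registers), and JSON.jl's `JSON.print(io, x)` is called without an indent argument so the payload stored in the notebook carries no extra whitespace.

```julia
using JSON

# Hypothetical wrapper standing in for a plot spec; not VegaLite.jl's own type.
struct SpecSketch
    spec::Dict{String,Any}
end

# Assumed MIME type; the real one depends on the VegaLite.jl version.
const VL_MIME = MIME"application/vnd.vegalite.v4+json"

# Compact output: JSON.print without an indent argument emits no newlines or padding.
Base.show(io::IO, ::VL_MIME, s::SpecSketch) = JSON.print(io, s.spec)

s = SpecSketch(Dict("mark" => "point",
                    "data" => Dict("values" => [Dict("x1" => 0.1, "x2" => 0.2)])))
payload = sprint(show, VL_MIME(), s)   # the string a frontend would receive
println(payload)
```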

davidanthoff, Jan 14 '20 03:01

Thanks for the update! With the previous version I explicitly checked that the spaces were printed in the notebook; in fact, the huge notebook size was exactly why I started looking into this. I'll try to test the new version later this week. Thanks!

VPetukhov, Jan 14 '20 21:01

Hi @davidanthoff, sorry for the delay. The situation hasn't changed.

Package version: "[112f6efa] VegaLite v1.0.1-DEV #master (https://github.com/queryverse/VegaLite.jl.git)". Part of the output:

          {
           "d": 1.4531611032316108e-05,
           "distribution": "LogNormal",
           "scale": 3,
           "x": 1179.449821080336
          },
          {
           "d": 1.3942453353616595e-05,
           "distribution": "LogNormal",
           "scale": 3,
           "x": 1208.9360602096724
          },

VPetukhov, Jan 26 '20 16:01

Could you provide a bit more information about the context, etc.? I assume you have a Jupyter notebook and run some code in there? What exactly are you calling? Is the problem the size of the JSON embedded as a MIME bundle in the notebook file? Something else?

There are so many different places where the JSON can show up that right now it is not clear to me where exactly this problem is showing up. I think just more context would help.

davidanthoff, Jan 28 '20 04:01