svglite
Performance improvements?
Just a reminder to investigate whether there are any avenues open for improving performance.
With httpgd I had massive performance improvements after switching from the stringstream formatting to fmt. I think most of the performance difference in httpgd comes from that. It's currently more than twice as fast.
Thanks for the pointer. I tried playing with fmt but did not see any immediate performance improvements. This is potentially because a lot of the writing is broken up into smaller pieces due to the architecture of svglite...
So, it is possible that a speed gain could be had if each tag were written in one go using fmt, but as the number of style elements and attributes is variable the logic for creating the formatting string will end up being quite horrible
Interesting, yes I think instruction cache misses could be a part of it. I am collecting all the draw call data in a vector and render them in one go.
do you have a pointer to where you are doing this?
Sure, HttpgdDev.cpp contains all the graphics device callbacks and creates DrawCall objects. These are defined in DrawData.h (rendering logic is in DrawData.cpp).
I save them on the heap in a thread-safe data structure (HttpgdDataStore.h), so that my webserver threads can access them without synchronizing with R.
They are organized in "pages", with each page being one plot. Rendering is initiated in Page::build_svg.
Edit: If you were wondering: devGeneric.h is my attempt at creating an abstraction layer for graphics devices and handles registering all the callbacks and device creation.
I wonder if the main speed-up comes from the fact that you are formatting directly into the ostream object... Due to the structure of svglite I have to format into a string buffer first and then write the buffer to the final stream...
So, I figured out the main difference between the performance of svglite and httpgd.
It has nothing to do with string formatting or allocations or anything like that. svglite maintains a valid SVG file at all times, which means that after every write it closes the open tags and then rewinds the stream position. The rewinding makes up half of the plotting time, which matches the performance difference exactly.
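To make the close-and-rewind mechanism concrete, here is a small sketch. It assumes an in-memory `std::stringstream` instead of the file stream svglite writes to, so it shows only the mechanism and not its cost. `SvgWriter` and its `always_valid` constructor flag are hypothetical names for this illustration.

```cpp
#include <sstream>
#include <string>

class SvgWriter {
 public:
  explicit SvgWriter(bool always_valid) : always_valid_(always_valid) {
    out_ << "<svg xmlns='http://www.w3.org/2000/svg'>";
    keep_valid();
  }

  // Append one element. In always-valid mode the previously written closing
  // tag is overwritten, because the put position was rewound to sit before it.
  void element(const std::string& markup) {
    out_ << markup;
    keep_valid();
  }

  // Current buffer contents; a complete document only in always-valid mode.
  std::string snapshot() const { return out_.str(); }

  // Write the closing tag and return the document. Call once, at the end.
  std::string finish() {
    out_ << "</svg>";
    return out_.str();
  }

 private:
  // Write the closing tag, then seek back so the next write replaces it.
  void keep_valid() {
    if (!always_valid_) return;
    const std::stringstream::pos_type pos = out_.tellp();
    out_ << "</svg>";
    out_.seekp(pos);
  }

  std::stringstream out_;
  bool always_valid_;
};
```

With the flag off, the extra write and the seek per element disappear, and the closing tag is written once in `finish()`. Both modes produce the same final document.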
That does make a lot of sense.
I just re-ran the benchmark with the current development version of svglite.
httpgd actually is now only slightly faster, as it seems to have gotten a lot slower in a recent update.
So it seems like I have got work to do... I will let you know if I find any optimizations that svglite could benefit from in the future.
Thanks - I ran the httpgd benchmarks as part of all this and httpgd was consistently twice as fast as svglite, but that disappeared with the removal of the stream seek call. I'm going to make this optional as that feature in itself is quite niche and can't justify the performance toll...
You should look into the new text rendering setup in svglite and update httpgd to match it, as it opens up a lot of new text features.
Must be caused by my setup then. I will look into integrating the benchmark in the CI for it to be more consistent.
Thanks for the tip! I will do that.
I have been doing some more optimization and benchmarking (now only calling in-memory functions to not be bottlenecked by disk writes) and noticed that httpgd is still exponentially faster than svglite:
I think this is most likely caused by svgstring fixing always_valid to true:
https://github.com/r-lib/svglite/blob/654ab4a24f5c5d70caf156641390d5f92a706874/src/devSVG.cpp#L1064-L1072
Is there a reason for this?
Benchmark code
library(svglite)
library(httpgd)
library(ggplot2) # ggplot() and the scale/geom functions below are called unqualified
# Benchmark: Time to plot
results <- bench::press(
  pts = 2^(0:18),
  {
    set.seed(1234)
    x <- runif(pts)
    y <- runif(pts)
    svglite_test <- function() {
      stringSVG({
        plot(x, y)
      })
    }
    httpgd_test <- function() {
      hgd_inline({
        plot(x, y)
      })
    }
    bench::mark(httpgd_test(), svglite_test(), iterations = 128, check = FALSE)
  }
)
# Benchmark: SVG size
df <- data.frame(pts = 2^(0:18))
df$pts
df["svglite_test()"] <- vapply(df$pts, function(i) {
  set.seed(1234)
  x <- runif(i)
  y <- runif(i)
  nchar(stringSVG({
    plot(x, y)
  }))
}, numeric(1))
df["httpgd_test()"] <- vapply(df$pts, function(i) {
  set.seed(1234)
  x <- runif(i)
  y <- runif(i)
  nchar(hgd_inline({
    plot(x, y)
  }))
}, numeric(1))
# Merge data
df <- tidyr::pivot_longer(df,c("svglite_test()", "httpgd_test()"), names_to = "expression", values_to = "chars")
results$expression <- as.character(results$expression)
df <- dplyr::inner_join(df, results)
df$mem_alloc <- as.numeric(df$mem_alloc)
dfmem <- tidyr::pivot_longer(df,c("mem_alloc", "chars"), names_to = "mem_type", values_to = "mem_val")
# Plot results
g1 <- ggplot(df, aes(x = pts, y = as.numeric(median), colour = expression)) +
  scale_x_log10(name = 'number of plot points',
                breaks = 10^(0:5),
                labels = function(x) format(x, scientific = FALSE)) +
  scale_y_log10(name = 'time to plot (sec)') +
  scale_colour_discrete(name = '', labels = list(`svglite_test()` = "svglite", `httpgd_test()` = "httpgd")) +
  geom_point() +
  geom_line() +
  theme_bw() +
  annotation_logticks() +
  theme(legend.position = "bottom")
g2 <- ggplot(dfmem, aes(x = pts, y = mem_val / 1024, colour = expression, shape = mem_type)) +
  scale_x_log10(name = 'number of plot points',
                breaks = 10^(0:5),
                labels = function(x) format(x, scientific = FALSE)) +
  scale_y_log10(name = 'size (KB)') +
  scale_shape_discrete(name = '', labels = list(chars = "SVG size", mem_alloc = "allocated memory")) +
  scale_colour_discrete(name = '', labels = list(`svglite_test()` = "svglite", `httpgd_test()` = "httpgd")) +
  geom_point() +
  geom_line() +
  theme_bw() +
  annotation_logticks() +
  theme(legend.position = "bottom", legend.box = "vertical", legend.margin = margin())
gridExtra::grid.arrange(g1, g2, ncol = 2)
The two devices svglite and svgstring work in fundamentally different ways, and setting always_valid has no effect in svgstring. The validity of svgstring output is only ensured when the actual string is requested, not during writing... My guess is that the file stream is simply more performant than the string stream.
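As an illustration of "validity only when the string is requested", here is a minimal sketch. It is not svglite's svgstring implementation; `SvgStringDevice` is a hypothetical class that appends elements to an unterminated buffer and adds the closing tag only to the copy it hands out.

```cpp
#include <string>

class SvgStringDevice {
 public:
  SvgStringDevice() : body_("<svg xmlns='http://www.w3.org/2000/svg'>") {}

  // Writing just appends; the internal buffer is never a complete document.
  void element(const std::string& markup) { body_ += markup; }

  // The closing tag is added to the returned copy only, so the device can
  // keep appending afterwards without any rewinding.
  std::string svg() const { return body_ + "</svg>"; }

 private:
  std::string body_;
};
```

In this shape there is no per-write cost for validity at all, which is consistent with always_valid making no difference for a string-backed device.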
While I have changed away from stringstream in a recent commit, this is what it looked like when httpgd still used stringstream:
(There were other optimizations in between; that's why the SVG size is larger in the right-hand plot.)
Then it is something else :-) maybe the formatting you mentioned earlier... but it is not related to always_valid
Yes good to know, thanks for the responses. I primarily wanted to share the new benchmarks.
much appreciated - I may look at it closer next time I'm working on svglite
Would you mind sharing your performance comparison setup? I'm a bit unsure how to do a fair comparison with httpgd for pure svg performance
No problem at all, httpgd has a helper function that plots to SVG by default and returns an R string:
library(httpgd)
mysvg <- hgd_inline({
hist(rnorm(100))
})
this basically starts and closes an offline device and is equivalent to:
library(httpgd)
hgd(webserver = FALSE) # start device without the web server
hist(rnorm(100)) # plot something
mysvg <- hgd_plot() # render last plot to svg
dev.off() # close device
Keeping the device open and continuously plotting with the same device would have a lower overhead, but that should be constant (independent of the number of previous plots), so using hgd_inline should be fine.
This is the code I used previously for the benchmark, but measuring R overhead with your devoid device is missing: https://github.com/nx10/httpgd/blob/44ccccaa6352ee5a80f43a3d7c79880fce35ad18/docs/benchmark.R
I added an alternative SVG renderer recently that can be set with httpgd::hgd_inline(... , renderer="svgp"), which avoids global styles and conflicting IDs. There is also a "svgz" renderer that does in-memory gzip compression.
All httpgd plots are returned as in-memory objects by default, but are written to disk instead when the file argument is set to a file path.