paws
paws copied to clipboard
use cpp for url encoding
Codecov Report
Merging #518 (202b6c0) into main (c0b0866) will increase coverage by
0.02%
. The diff coverage isn/a
.
:exclamation: Current head 202b6c0 differs from pull request most recent head 441d2c8. Consider uploading reports for the commit 441d2c8 to get more accurate results
@@ Coverage Diff @@
## main #518 +/- ##
==========================================
+ Coverage 82.68% 82.71% +0.02%
==========================================
Files 112 57 -55
Lines 6400 3234 -3166
==========================================
- Hits 5292 2675 -2617
+ Misses 1108 559 -549
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
Looks good to me. My two comments require no follow up and I think this is good to go.
Possible future iteration for the encoder :) sweet I will merge it in now :)
hey guys - I know this is closed, but I'd just want to chime in that given how small this implementation is, I'd def suggest going with cpp11 instead of Rcpp - This is listed out in the motivations https://cpp11.r-lib.org/articles/motivations.html#header-only
but simply - it should be lighter, faster, and given its header only it will avoid any chance at problems for linking issues when upgrading versions.
Realistically, this might not have a significant impact, but as you might move other code hot paths into cpp could get out in front of this now.
Just my two cents!
Hi @dpastoor thanks for letting us know. I am still fairly new in adding cpp into R packages. I have done some initial benchmarks around rcpp and cpp11 performance of the encoder. To do this I have converted the existing implementation https://github.com/DyfanJones/EncodeCpp11 (I am not sure if I have added any overhead to this).
Here is the initial benchmark.
library(ggplot2)
string <- "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~`!@#$%^&*()=+[{]}\\|;:'\",<>? "
iter <- 1:7
# Note: python is at a disadvantage here as it needs to convert it's output from python to R.
parse <- reticulate::import("urllib.parse")
bm <- list()
for (i in iter) {
url <- paste(sample(strsplit(string, "")[[1]], (10 ^ i), replace = T), collapse = "")
bm[[i]] <- bench::mark(
url_encode_rcpp = paws.common:::paws_url_encoder(url),
url_encode_cpp11 = EncodeCpp11::cpp11_url_encoder(url),
url_encode_curl = curl::curl_escape(url),
url_encode_python = parse$quote(url),
iterations = 100
)
}
result <- do.call(rbind, bm)
result$size <- sort(rep(10 ^ iter, 4))
result$expression <- as.character(result$expression)
res <- tidyr::unnest(result, c(time, gc))
ggplot(res, aes(time, expression)) +
geom_violin() +
facet_wrap("size", scales = "free_x") +
theme(
# text = element_text(size = 8),
axis.text.x = element_text(angle = 20, vjust = 0.5, hjust=1)
)
Created on 2022-09-01 with reprex v2.0.2
I am not seeing any noticeable performance improvements. I think there is a slight performance decrease, however this could be my conversion (from Rcpp to cpp11) and not cpp11 itself. From my understanding of cpp11 has compile time and memory improvements over Rcpp.
I do like to keep paws as light as possible.
Full benchmarks can be found here: https://github.com/DyfanJones/EncodeCpp11/blob/main/inst/ Practise package (EncodeCpp11) for cpp11 encoder: https://github.com/DyfanJones/EncodeCpp11