paws icon indicating copy to clipboard operation
paws copied to clipboard

use cpp for url encoding

Open DyfanJones opened this issue 1 year ago • 1 comments

DyfanJones avatar Aug 09 '22 14:08 DyfanJones

Codecov Report

Merging #518 (202b6c0) into main (c0b0866) will increase coverage by 0.02%. The diff coverage is n/a.

:exclamation: Current head 202b6c0 differs from pull request most recent head 441d2c8. Consider uploading reports for the commit 441d2c8 to get more accurate results

@@            Coverage Diff             @@
##             main     #518      +/-   ##
==========================================
+ Coverage   82.68%   82.71%   +0.02%     
==========================================
  Files         112       57      -55     
  Lines        6400     3234    -3166     
==========================================
- Hits         5292     2675    -2617     
+ Misses       1108      559     -549     
Impacted Files Coverage Δ
paws/paws.common/R/url.R 90.90% <0.00%> (-1.40%) :arrow_down:
Users/runner/work/paws/paws/paws.common/R/net.R
...sers/runner/work/paws/paws/make.paws/R/templates.R
...rs/runner/work/paws/paws/paws.common/R/signer_v2.R
Users/runner/work/paws/paws/make.paws/R/docs.R
Users/runner/work/paws/paws/paws.common/R/time.R
...rs/runner/work/paws/paws/paws.common/R/signer_v4.R
...rs/runner/work/paws/paws/paws.common/R/signer_s3.R
Users/runner/work/paws/paws/make.paws/R/service.R
...er/work/paws/paws/paws.common/R/handlers_jsonrpc.R
... and 49 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Aug 09 '22 14:08 codecov[bot]

Looks good to me. My two comments require no follow up and I think this is good to go.

davidkretch avatar Aug 31 '22 22:08 davidkretch

Possible future iteration for the encoder :) sweet I will merge it in now :)

DyfanJones avatar Aug 31 '22 22:08 DyfanJones

hey guys - I know this is closed, but I'd just want to chime in that given how small this implementation is, I'd def suggest going with cpp11 instead of Rcpp - This is listed out in the motivations https://cpp11.r-lib.org/articles/motivations.html#header-only

but simply - it should be lighter, faster, and given its header only it will avoid any chance at problems for linking issues when upgrading versions.

Realistically, this might not have a significant impact, but as you might move other code hot paths into cpp could get out in front of this now.

Just my two cents!

dpastoor avatar Sep 01 '22 03:09 dpastoor

Hi @dpastoor thanks for letting us know. I am still fairly new in adding cpp into R packages. I have done some initial benchmarks around rcpp and cpp11 performance of the encoder. To do this I have converted the existing implementation https://github.com/DyfanJones/EncodeCpp11 (I am not sure if I have added any overhead to this).

Here is the initial benchmark.

library(ggplot2)
string <- "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-._~`!@#$%^&*()=+[{]}\\|;:'\",<>? "
iter <- 1:7

# Note: python is at a disadvantage here as it needs to convert it's output from python to R.
parse <- reticulate::import("urllib.parse")

bm <- list()
for (i in iter) {
  url <- paste(sample(strsplit(string, "")[[1]], (10 ^ i), replace = T), collapse = "")
  bm[[i]] <- bench::mark(
    url_encode_rcpp = paws.common:::paws_url_encoder(url),
    url_encode_cpp11 = EncodeCpp11::cpp11_url_encoder(url),
    url_encode_curl = curl::curl_escape(url),
    url_encode_python = parse$quote(url),
    iterations = 100
  )
}

result <- do.call(rbind, bm)
result$size <- sort(rep(10 ^ iter, 4))
result$expression <- as.character(result$expression)

res <- tidyr::unnest(result, c(time, gc))
ggplot(res, aes(time, expression)) +
  geom_violin() +
  facet_wrap("size", scales = "free_x") +
  theme(
    # text = element_text(size = 8),
    axis.text.x = element_text(angle = 20, vjust = 0.5, hjust=1)
  )

Created on 2022-09-01 with reprex v2.0.2

I am not seeing any noticeable performance improvements. I think there is a slight performance decrease, however this could be my conversion (from Rcpp to cpp11) and not cpp11 itself. From my understanding of cpp11 has compile time and memory improvements over Rcpp.

I do like to keep paws as light as possible.

Full benchmarks can be found here: https://github.com/DyfanJones/EncodeCpp11/blob/main/inst/ Practise package (EncodeCpp11) for cpp11 encoder: https://github.com/DyfanJones/EncodeCpp11

DyfanJones avatar Sep 01 '22 14:09 DyfanJones