
stri_encode: maximal supported size of a single string is ~0.67 GB

Open: gagolews opened this issue 4 years ago • 1 comment

To support the full 2^31-1 (*) bytes per string, stri_encode would need to be rewritten to run the uconv conversion in batches rather than over the whole string at once.

(*) R Internals:

Elements of character vectors (CHARSXPs) remain limited to 2^31 - 1 bytes.

> library(stringi); x <- stri_dup("a", 2**30); y <- paste(c(x, x), collapse="")
# Error in paste(c(x, x), collapse = "") : result would exceed 2^31-1 bytes
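
The arithmetic behind that error can be checked without allocating anything. This is only a sketch of the size calculation; the variable names are illustrative and not taken from R or stringi:

```r
# Upper bound on the byte length of a single CHARSXP, per R Internals.
limit <- 2^31 - 1
stopifnot(limit == .Machine$integer.max)

# Each piece built above is 2^30 bytes; two of them are one byte over the limit.
piece <- 2^30
two_pieces <- 2 * piece
print(two_pieces > limit)   # the concatenation cannot fit
print(two_pieces - limit)   # overshoot in bytes
```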

gagolews, Aug 21 '20 07:08

Some code to play with:

Rscript -e '
x <- charToRaw(stringi::stri_dup("a", 2**30))
print(length(x))
print(length(x)/1024/1024)
y <- stringi::stri_encode(x, NULL, "utf-8")
stopifnot(identical(rawToChar(x), y))
'
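
To illustrate the idea of batching, here is a rough user-level sketch, not the C-level rewrite proposed above. The helper encode_in_chunks and its chunk_size parameter are hypothetical and not part of stringi. It assumes a single-byte source encoding (ISO-8859-1 here), so that no character can straddle a chunk boundary; a multi-byte source would need boundary handling that this sketch does not attempt. The result is still subject to the 2^31-1 byte limit.

```r
library(stringi)

# Hypothetical helper: convert a raw vector piece by piece and join the results.
# Assumption: `from` is a single-byte encoding, so chunks never split a character.
encode_in_chunks <- function(bytes, from, to = "UTF-8", chunk_size = 2^20) {
  stopifnot(is.raw(bytes), chunk_size >= 1)
  n <- length(bytes)
  if (n == 0L) return("")
  starts <- seq(1, n, by = chunk_size)
  pieces <- vapply(starts, function(s) {
    e <- min(s + chunk_size - 1, n)        # last byte of this chunk
    stri_encode(bytes[s:e], from, to)      # convert one chunk only
  }, character(1))
  stri_flatten(pieces)                     # concatenate converted chunks
}

# Small-scale check, same shape as the test above but cheap to run.
x <- charToRaw(strrep("abc", 1000))
y <- encode_in_chunks(x, "ISO-8859-1", chunk_size = 256)
stopifnot(identical(y, rawToChar(x)))
```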

gagolews, Aug 24 '20 07:08