stringi
stri_encode: maximal supported size of a single string is ~0.67 GB
To support the full 2^31-1 (*) bytes per string, stri_encode
would need to be rewritten to use batch uconv processing.
(*) R Internals:
Elements of character vectors (CHARSXPs) remain limited to 2^31 - 1 bytes.
> library(stringi); x<-stri_dup("a", 2**30); y <- paste(c(x, x), collapse="")
#Error in paste(c(x, x), collapse = "") : result would exceed 2^31-1 bytes
Some code to play with:
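Rscript -e '
# 2^30 bytes (1 GiB) of input, above the ~0.67 GB limit named in the title
x <- charToRaw(stringi::stri_dup("a", 2**30))
print(length(x))            # size in bytes
print(length(x)/1024/1024)  # size in MiB
# convert from the default encoding to UTF-8
y <- stringi::stri_encode(x, NULL, "utf-8")
# the round trip should reproduce the input exactly
stopifnot(identical(rawToChar(x), y))
'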
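
Until stri_encode itself is rewritten, the batch idea above can be approximated at the R level. This is a minimal sketch, not stringi's implementation: `encode_in_chunks` and its `chunk_size` parameter are hypothetical names introduced here for illustration. It assumes chunk boundaries never split a multi-byte character, which holds only for ASCII or single-byte source encodings. Whether the joined result can reach the full 2^31-1 bytes is not verified here.

```r
library(stringi)

# Hypothetical helper (not part of stringi): convert a raw vector piece by
# piece so that no single stri_encode call sees the whole input.
# Assumption: chunk boundaries fall on character boundaries (ASCII or a
# single-byte source encoding).
encode_in_chunks <- function(x, from = NULL, to = "UTF-8", chunk_size = 2^20) {
    stopifnot(is.raw(x), chunk_size >= 1)
    if (length(x) == 0L) return("")
    starts <- seq(1, length(x), by = chunk_size)
    pieces <- vapply(starts, function(s) {
        e <- min(s + chunk_size - 1, length(x))
        stri_encode(x[s:e], from, to)   # convert one chunk
    }, character(1))
    stri_flatten(pieces)                # join the converted chunks
}

# Quick check on a small input (about 5 MB, so it runs anywhere)
x <- charToRaw(stri_dup("a", 5e6))
y <- encode_in_chunks(x, NULL, "UTF-8")
stopifnot(identical(rawToChar(x), y))
```

A proper fix inside stringi would also have to carry converter state across chunk boundaries, which this sketch sidesteps by restricting the input.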