[RFC][Java] Implement a highly optimized String compressor
Is your feature request related to a problem? Please describe.
Currently, the UTF-8 encoders in the JDK are not efficient enough, and writing their output to a fury MemoryBuffer needs an extra copy because those encoders return a byte array:
- java.lang.String#encodeUTF8 / java.lang.String#encodeUTF8_UTF16
- sun.nio.cs.UTF_8.Encoder#encodeArrayLoop
We need a highly optimized encoder for fury to avoid the cost of string serialization.
Describe the solution you'd like
- Detect the Latin length first using superword, then encode those chars as Latin and encode the remaining chars using UTF-8.
- The Latin length is encoded in the coder too; the length of the remaining UTF-8 chars is encoded using varint.
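The proposal above could be sketched roughly as follows. This is only an illustration under several assumptions, not fury's implementation: the class and method names are hypothetical, the "superword" check is approximated by packing four chars into one `long` and masking the high bytes (a real implementation would likely read 8 bytes at a time directly from the string's backing array), and both lengths are written as plain varints rather than packing the Latin length into the coder header. It also still uses `String#getBytes` for the non-Latin tail, so it shows the layout rather than the copy-free write path.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: Latin-1 prefix + UTF-8 remainder, lengths as varints.
final class HybridStringEncoderSketch {
    // Selects the high byte of each of four 16-bit chars packed into a long.
    private static final long HIGH_BYTES_MASK = 0xFF00FF00FF00FF00L;

    /** Returns the number of leading chars whose value fits in one byte (<= 0xFF). */
    static int latin1PrefixLength(char[] chars) {
        int i = 0;
        int vectorEnd = chars.length - 3;
        // Check four chars per iteration with a single mask test.
        for (; i < vectorEnd; i += 4) {
            long word = (long) chars[i]
                | ((long) chars[i + 1] << 16)
                | ((long) chars[i + 2] << 32)
                | ((long) chars[i + 3] << 48);
            if ((word & HIGH_BYTES_MASK) != 0) {
                break; // a non-Latin-1 char is in this group; locate it below
            }
        }
        // Scalar tail: finishes the last partial group or pinpoints the first wide char.
        while (i < chars.length && chars[i] <= 0xFF) {
            i++;
        }
        return i;
    }

    /** Writes a non-negative int as a little-endian base-128 varint. */
    static void writeVarInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Assumed layout: varint(latinLen), Latin-1 bytes, varint(utf8Len), UTF-8 bytes. */
    static byte[] encode(String s) {
        char[] chars = s.toCharArray();
        int latinLen = latin1PrefixLength(chars);
        // Surrogates are never Latin-1, so the split cannot cut a surrogate pair.
        byte[] utf8 = s.substring(latinLen).getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarInt(out, latinLen);
        for (int i = 0; i < latinLen; i++) {
            out.write(chars[i]); // low 8 bits only, which is the whole char here
        }
        writeVarInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }
}
```

For example, `HybridStringEncoderSketch.encode("ab\u4e2d")` would write the Latin length 2, the bytes for `a` and `b`, the UTF-8 length 3, and then the three UTF-8 bytes of the last char. A string that is entirely Latin-1 never reaches the UTF-8 path beyond writing a zero length.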
Hello, I am new here. Please assign this to me, thanks~