[RFC][Java] Implement a highly optimized String compressor

Open chaokunyang opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe.

Currently, UTF-8 encoding in the JDK is not efficient enough, and writing the result to a Fury MemoryBuffer needs an extra copy, since these encoders return a byte array:

  • java.lang.String#encodeUTF8/java.lang.String#encodeUTF8_UTF16
  • sun.nio.cs.UTF_8.Encoder#encodeArrayLoop

We need a highly optimized encoder for Fury to reduce the cost of string serialization.

Describe the solution you'd like

  • Detect the length of the leading Latin-1 run first using superword operations, encode those chars as Latin-1, then encode the remaining chars as UTF-8.
  • Encode the Latin-1 length in the coder as well, and encode the length of the remaining UTF-8 chars as a varint.
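
A minimal sketch of the idea above, under stated assumptions. It reads "superword" as a word-at-a-time check: four chars are packed into one `long` and tested against a mask in a single comparison. The class name `LatinPrefixEncoder`, its methods, and the output layout (varint Latin-1 length, Latin-1 bytes, varint UTF-8 byte length, UTF-8 bytes) are hypothetical and chosen only for illustration. In particular, the proposal packs the Latin-1 length into the coder rather than a separate varint, and a real implementation would likely write straight into the MemoryBuffer instead of a `ByteArrayOutputStream`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not Fury's actual encoder.
final class LatinPrefixEncoder {
    // High byte of each 16-bit lane; any set bit means a char above 0xFF.
    private static final long NON_LATIN_MASK = 0xFF00FF00FF00FF00L;

    /** Returns how many leading chars fit in Latin-1 (value <= 0xFF). */
    static int latinPrefixLength(char[] chars) {
        int i = 0;
        int wordEnd = chars.length & ~3; // largest multiple of 4 <= length
        for (; i < wordEnd; i += 4) {
            // Pack four chars into one 64-bit word and test them together.
            long word = ((long) chars[i] << 48) | ((long) chars[i + 1] << 32)
                | ((long) chars[i + 2] << 16) | chars[i + 3];
            if ((word & NON_LATIN_MASK) != 0) {
                break; // a non-Latin-1 char is somewhere in this group
            }
        }
        // Scalar pass: locate the exact boundary and handle the tail.
        while (i < chars.length && chars[i] <= 0xFF) {
            i++;
        }
        return i;
    }

    /** Writes a non-negative int as a little-endian base-128 varint. */
    static void writeVarUint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Assumed layout: varint latinLen, Latin-1 bytes, varint utf8Len, UTF-8 bytes. */
    static byte[] encode(String s) {
        char[] chars = s.toCharArray();
        int latinLen = latinPrefixLength(chars);
        ByteArrayOutputStream out = new ByteArrayOutputStream(chars.length + 8);
        writeVarUint(out, latinLen);
        for (int i = 0; i < latinLen; i++) {
            out.write(chars[i]); // one byte per Latin-1 char
        }
        // The JDK encoder is used for the remainder only to keep the sketch short.
        byte[] rest = new String(chars, latinLen, chars.length - latinLen)
            .getBytes(StandardCharsets.UTF_8);
        writeVarUint(out, rest.length);
        out.write(rest, 0, rest.length);
        return out.toByteArray();
    }
}
```

Because surrogate chars are all above 0xFF, the split point can never fall inside a surrogate pair, so the remainder is always valid input for the UTF-8 step. For mostly-ASCII strings the remainder is empty and the cost is one pass over the chars plus two length bytes.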

chaokunyang avatar Oct 06 '23 14:10 chaokunyang

Hello, I am new here. Please assign this to me, thanks~

heliang666s avatar Apr 16 '24 00:04 heliang666s