[Java] fast serialization path for String KV map

Open chaokunyang opened this issue 10 months ago • 4 comments

Feature Request

Map<String, String> is very common in Java, so we can provide a dedicated fast serialization path for it to improve performance.

Is your feature request related to a problem? Please describe

No response

Describe the solution you'd like

Add a StringMapSerialization util class in org.apache.fury.serializer.collection package:

class StringMapSerialization {
  /**
   * Write string chunks until there are no entries left.
   */
  public static void writeStringChunks(
    MemoryBuffer buffer,
    Entry<String, String> entry,
    Iterator<Entry<String, String>> iterator) {

  }
  /**
   * Write a string chunk until there are no entries left or the chunk size reaches its max value.
   */
  public static Entry<String, String> writeStringChunk(
    MemoryBuffer buffer,
    Entry<String, String> entry,
    Iterator<Entry<String, String>> iterator) {

  }

  /**
   * Write a string chunk until the next entry is not of string type.
   */
  public static Entry writeChunk(
    MemoryBuffer buffer,
    Entry<String, String> entry,
    Iterator<Entry> iterator) {

  }

  /**
   * Read all string kv chunks and put their entries into the map until all chunks are read.
   */
  public static void readChunks(
    MemoryBuffer buffer, Map<String, String> map, long size, int chunkHeader) {

  }

  public static int readChunk(
    MemoryBuffer buffer, Map<String, String> map, long size, int chunkHeader) {

  }
}

Add a fast path in AbstractMapSerializer that forwards to StringMapSerialization.
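
A minimal sketch of how such a forwarding fast path could look, under stated assumptions. It does not use Fury's real API: `java.nio.ByteBuffer` stands in for `MemoryBuffer`, and `StringMapFastPathSketch`, `writeMap`, `readStringMap`, the one-byte tag, and the length-prefixed UTF-8 layout are all hypothetical choices made for illustration. It also assumes non-null keys and values and ignores chunking.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Fury's AbstractMapSerializer.
final class StringMapFastPathSketch {
  static final byte TAG_STRING_MAP = 1; // hypothetical tag: every key and value is a String

  // ByteBuffer stands in for Fury's MemoryBuffer.
  static void writeMap(ByteBuffer buffer, Map<?, ?> map) {
    if (!isStringMap(map)) {
      // The generic key/value serializer path would go here (omitted in this sketch).
      throw new UnsupportedOperationException("generic path omitted");
    }
    buffer.put(TAG_STRING_MAP);
    buffer.putInt(map.size());
    for (Map.Entry<?, ?> e : map.entrySet()) {
      // Direct calls to one concrete static method: no per-entry serializer dispatch.
      writeString(buffer, (String) e.getKey());
      writeString(buffer, (String) e.getValue());
    }
  }

  static Map<String, String> readStringMap(ByteBuffer buffer) {
    if (buffer.get() != TAG_STRING_MAP) {
      throw new IllegalStateException("not a string map");
    }
    int size = buffer.getInt();
    Map<String, String> map = new LinkedHashMap<>();
    for (int i = 0; i < size; i++) {
      String key = readString(buffer);
      String value = readString(buffer);
      map.put(key, value);
    }
    return map;
  }

  // True when every key and value is a non-null String (also true for an empty map).
  static boolean isStringMap(Map<?, ?> map) {
    for (Map.Entry<?, ?> e : map.entrySet()) {
      if (!(e.getKey() instanceof String) || !(e.getValue() instanceof String)) {
        return false;
      }
    }
    return true;
  }

  // Writes a 4-byte length followed by the UTF-8 bytes.
  static void writeString(ByteBuffer buffer, String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    buffer.putInt(bytes.length);
    buffer.put(bytes);
  }

  static String readString(ByteBuffer buffer) {
    byte[] bytes = new byte[buffer.getInt()];
    buffer.get(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```

The sketch scans the whole map before writing, which keeps it short. The chunked API proposed above instead checks entry by entry, so a map can switch between string chunks and generic chunks.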

Describe alternatives you've considered

No response

Additional context

#2025

chaokunyang, Jan 26 '25 12:01

Hi @chaokunyang, I'm interested in this issue and would like to try it. However, I can't work out the key difference between the StringMapSerialization path and the AbstractMapSerializer path. If we could declare the key type and value type, would it be as fast as StringMapSerialization?

jayhan94, Feb 08 '25 01:02

@jayhan94 The AbstractMapSerializer handles serialization for different types of maps, where key and value types can vary from one map to another. This variability prevents the JVM's Just-In-Time (JIT) compiler from effectively inlining the read and write methods for key and value serializers. However, maps with string keys and values are so prevalent that they merit a special code path to facilitate JIT inlining.

An alternative approach is to add a fast path inside AbstractMapSerializer, just before it invokes the read and write methods of the key and value serializers. This optimization could also improve performance. Maybe we could take this approach instead.
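
A rough sketch of that alternative, assuming hypothetical stand-in types (`SketchSerializer`, `SketchStringSerializer`, `SketchIntSerializer`, `SketchMapWriter`) rather than Fury's real classes. The exact-class check routes strings through a field whose static type is a final class, which gives the JIT a single known call target that it may inline. Everything else goes through the interface call, whose target varies from map to map.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not Fury's classes.
interface SketchSerializer<T> {
  void write(List<Object> out, T value);
}

final class SketchStringSerializer implements SketchSerializer<String> {
  @Override
  public void write(List<Object> out, String value) {
    out.add(value);
  }
}

final class SketchIntSerializer implements SketchSerializer<Integer> {
  @Override
  public void write(List<Object> out, Integer value) {
    out.add(value);
  }
}

final class SketchMapWriter {
  // Static type is a final class, so calls on this field have exactly one possible target.
  private final SketchStringSerializer stringSerializer = new SketchStringSerializer();

  // Returns true when the string fast path was taken.
  @SuppressWarnings("unchecked")
  boolean writeValue(List<Object> out, SketchSerializer<?> serializer, Object value) {
    if (serializer.getClass() == SketchStringSerializer.class) {
      // Fast path: direct call on the concretely typed field.
      stringSerializer.write(out, (String) value);
      return true;
    }
    // Generic path: interface call whose receiver type differs across maps.
    ((SketchSerializer<Object>) serializer).write(out, value);
    return false;
  }

  public static void main(String[] args) {
    SketchMapWriter writer = new SketchMapWriter();
    List<Object> out = new ArrayList<>();
    writer.writeValue(out, new SketchStringSerializer(), "key");
    writer.writeValue(out, new SketchIntSerializer(), 42);
    System.out.println(out);
  }
}
```

Whether the JIT actually inlines either call depends on the JVM and the observed type profile, so any gain should be confirmed with a benchmark.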

chaokunyang, Feb 09 '25 16:02

As I understand it, the virtual method call prevents the JIT from inlining the serializer.write/read methods for the key and value types. If we declared the key/value serializers with a concrete type, such as StringSerializer stringSerializer, the JIT would be able to inline them. Is that correct? @chaokunyang

jayhan94, Feb 12 '25 11:02

Yes, you are right.

chaokunyang, Feb 12 '25 15:02