Modify the way type information is stored

Open Horusiath opened this issue 7 years ago • 1 comments

Motivation

While working with Distributed Data, I reached the point where we need to serialize generic data types efficiently, in terms of both performance and payload size. Protobuf is a no-go here: it cannot store generic data. JSON is inefficient. Hyperion can handle generics efficiently, but there's a catch.

Problem

Right now we store type information as a string. As someone pointed out, this may be a security issue (we have the same problem with Json.NET). Regardless, there is another problem with storing types this way, and it is related to generics. This is something I still need to verify, but the deal is:

Potentially we can use Hyperion's known types property to save space (I'm not sure, but AFAIK the type name is replaced with an int identifier). However, generic types cannot benefit from this feature, because a known type has to be written as a single value. Registering Dictionary<,> as a known type therefore won't help when we have to serialize Dictionary<string, int>, Dictionary<string, string> etc., as their type signatures don't match that of the known type.

Solution?

We could deal with it this way:

Instead of writing a known type as a single key, for generics we could serialize it as a sequence of keys, e.g. a 3-key sequence representing Dictionary<,>/string/int, which would then be composed into Dictionary<string, int>. Each key could be cached separately.
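To make the key-sequence idea concrete, here is a minimal sketch in Python (used only as neutral pseudocode; it is not Hyperion code). The known-type table, the key values and the function names are all hypothetical. It assumes both sides share the same table, including each open generic type's arity, so a prefix-order list of keys is enough to rebuild the closed type.

```python
# Hypothetical known-type table: name -> (key, generic arity).
# Neither the keys nor the table layout come from Hyperion itself.
KNOWN_TYPES = {
    "Dictionary<,>": (1, 2),
    "string": (2, 0),
    "int": (3, 0),
    "List<>": (4, 1),
}
BY_KEY = {key: (name, arity) for name, (key, arity) in KNOWN_TYPES.items()}


def encode_type(type_tree):
    """Flatten a (name, [type arguments]) tree into a prefix-order key list."""
    name, args = type_tree
    key, arity = KNOWN_TYPES[name]
    if len(args) != arity:
        raise ValueError(f"{name} expects {arity} type argument(s)")
    keys = [key]
    for arg in args:
        keys.extend(encode_type(arg))
    return keys


def decode_type(keys, pos=0):
    """Rebuild the type tree from a key list; returns (tree, next position)."""
    name, arity = BY_KEY[keys[pos]]
    pos += 1
    args = []
    for _ in range(arity):
        # The arity of the open generic tells us how many keys to consume.
        arg, pos = decode_type(keys, pos)
        args.append(arg)
    return (name, args), pos
```

With this table, Dictionary<string, int> becomes the three keys for Dictionary<,>, string and int, and nested generics such as Dictionary<string, List<int>> work the same way because the decoder recurses on arity.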

This is only an idea and needs further investigation.

Horusiath, Apr 27 '17 05:04

Further thoughts: we could always represent types as e.g. int identifiers (computed as a consistent hash of the fully qualified type name with assembly, without version). If a type is not known, send it only once per serializer/deserializer session in the form of a type_name=id map. This could also partially mitigate the security issues reported to us.
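A rough sketch of how such identifiers and the once-per-session manifest could work, again in Python as neutral pseudocode. The hash function is an assumption: the proposal only says "consistent hash", so truncated SHA-256 is used here purely for illustration, and the Session class and its return shapes are hypothetical.

```python
import hashlib


def type_id(full_name, assembly):
    """Derive a stable positive 31-bit id from the type name and assembly (no version)."""
    # Assumption: SHA-256 truncated to 4 bytes; any hash that is stable across
    # processes and machines would serve the same purpose.
    digest = hashlib.sha256(f"{full_name}, {assembly}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") & 0x7FFFFFFF


class Session:
    """Tracks which type names were already sent in this serializer session."""

    def __init__(self):
        self.sent = set()

    def write_type(self, full_name, assembly):
        tid = type_id(full_name, assembly)
        if tid in self.sent:
            # Type already announced: only the id goes on the wire.
            return ("id", tid)
        self.sent.add(tid)
        # First occurrence: send the type_name=id mapping once.
        return ("manifest", tid, f"{full_name}, {assembly}")
```

A truncated hash can collide, so a real implementation would need to detect two different names mapping to the same id; this sketch does not handle that.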

Case 1: type Point { x: int; y: int } could be stored as:

  • 4 bytes: identifier of type (Point)
  • 4 bytes (or less): identifier of type of the first field x (int)
  • 4 bytes: value of field x
  • 4 bytes (or less): identifier of type of the second field y (int)
  • 4 bytes: value of field y

This gives us up to 5 * 4 = 20 bytes in total.

Case 2: generic type KeyValuePair<int, int> { Key: int; Value: int } could be stored as:

  • 4 bytes: identifier of type (KeyValuePair<,>) - since the deserializer side knows the generic arity, it knows that the next 2 identifiers are the generic arguments of this type.
  • 4 bytes: identifier of the first generic argument type (in this case int)
  • 4 bytes: identifier of the second generic argument type (in this case int)
  • 4 bytes (or less): identifier of type of the first field Key (int)
  • 4 bytes: value of field Key
  • 4 bytes (or less): identifier of type of the second field Value (int)
  • 4 bytes: value of field Value

This gives us up to 7 * 4 = 28 bytes in total.
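The two layouts above can be checked with a small sketch. The identifier constants are made-up placeholders (in the proposal they would be hashes of type names), and little-endian 32-bit encoding is an assumption; only the field order and sizes follow the lists above.

```python
import struct

# Hypothetical 4-byte type identifiers, chosen only for illustration.
POINT_ID = 0x10000001
KVP_ID = 0x10000002   # stands for the open generic KeyValuePair<,>
INT_ID = 0x10000003


def pack_ints(*values):
    """Pack each value as a little-endian signed 32-bit integer."""
    return struct.pack(f"<{len(values)}i", *values)


def encode_point(x, y):
    # type id, then (field type id, field value) for x and y
    return pack_ints(POINT_ID, INT_ID, x, INT_ID, y)


def encode_kvp(key, value):
    # open generic id, two generic argument ids, then (field type id, value) pairs
    return pack_ints(KVP_ID, INT_ID, INT_ID, INT_ID, key, INT_ID, value)
```

Under these assumptions, len(encode_point(1, 2)) is 20 and len(encode_kvp(1, 2)) is 28, matching the totals given for the two cases.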

There is also room for potential optimizations:

  • Type identifiers for primitive types don't have to use the full 4 bytes, as Wire/Hyperion may treat them in a special way, e.g. if the leading bit is 0 (so the id is positive) we use the full 4 bytes as the id (a consistent hash of the type name mapped to a positive int); if it's 1 (so the id is negative) we could treat it as a special case (e.g. we could fit up to 128 well-known primitive types like int, string etc. into a single byte). AFAIK the current implementations of Wire/Hyperion already use such a trick. Potential reduction of size: 20 - 6 = 14 bytes in the first case, 28 - 12 = 16 bytes in the second case.
  • If we know that the type of a field is sealed (either a value type or a sealed class), we don't need to prepend a type id to every single field, as it's only needed for polymorphic deserialization. Having the fully constructed top-level type id is the only requirement in this case. Further reduction of size: 14 - 2 = 12 bytes in the first case, 16 - 2 = 14 bytes in the second case.
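The size figures above can be reproduced with a short calculation. This sketch assumes, as the arithmetic implies, that a well-known primitive type id shrinks from 4 bytes to 1 byte, that every field value is a 4-byte int, and that the top-level type id always takes 4 bytes; the function and parameter names are hypothetical.

```python
def payload_size(generic_args, fields, primitive_id_bytes=4, write_field_type_ids=True):
    """Size in bytes of one value whose generic arguments and fields are all int."""
    size = 4                                  # top-level type identifier
    size += generic_args * primitive_id_bytes  # one id per generic argument
    for _ in range(fields):
        if write_field_type_ids:
            size += primitive_id_bytes        # per-field type id (polymorphic case)
        size += 4                             # the int value itself
    return size


# Case 1: Point { x: int; y: int }
point = [
    payload_size(0, 2),                                                   # baseline
    payload_size(0, 2, primitive_id_bytes=1),                             # 1-byte primitive ids
    payload_size(0, 2, primitive_id_bytes=1, write_field_type_ids=False),  # sealed fields
]

# Case 2: KeyValuePair<int, int>
kvp = [
    payload_size(2, 2),
    payload_size(2, 2, primitive_id_bytes=1),
    payload_size(2, 2, primitive_id_bytes=1, write_field_type_ids=False),
]
```

Here point evaluates to [20, 14, 12] and kvp to [28, 16, 14], the same progression as in the two bullets.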

What do you think, @rogeralsing?

Horusiath, May 10 '17 11:05