BinarySerializer

About strings and arrays

Open rikimaru0345 opened this issue 8 years ago • 6 comments

In a protocol I am reading, some arrays and strings are prefixed by one byte that describes the length (the number of entries in the array, or the number of bytes that make up the string).

There is a special case though. 0xFF means null. 0x0 means empty string.

Same thing for arrays, 0x0 for empty array, 0xFF for null.
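The prefix rules described above could be sketched roughly like this. It is only an illustration under some assumptions: `NullablePrefix` is a made-up helper (not part of BinarySerializer), and the array entries are assumed to be single bytes; an array of larger entries would need its own per-entry reader.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of BinarySerializer.
public static class NullablePrefix
{
    private const byte NullMarker = 0xFF;

    // Writes 0xFF for null, otherwise a one-byte count followed by the entries.
    public static void WriteBytes(Stream stream, byte[] value)
    {
        if (value == null)
        {
            stream.WriteByte(NullMarker);
            return;
        }

        // 0xFF is reserved for null, so at most 254 entries can be described.
        if (value.Length >= NullMarker)
        {
            throw new ArgumentException("At most 254 entries fit a one-byte prefix.", nameof(value));
        }

        stream.WriteByte((byte)value.Length);
        stream.Write(value, 0, value.Length);
    }

    // Returns null for 0xFF, an empty array for 0x00, otherwise the entries.
    public static byte[] ReadBytes(Stream stream)
    {
        int prefix = stream.ReadByte();
        if (prefix < 0) throw new EndOfStreamException();
        if (prefix == NullMarker) return null;

        var data = new byte[prefix];
        int offset = 0;
        while (offset < data.Length)
        {
            // Stream.Read may return fewer bytes than requested.
            int read = stream.Read(data, offset, data.Length - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return data;
    }
}
```

Note that null consumes exactly one byte and nothing else, which is the part a plain length-prefixed reader cannot express.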

There's also another special case, some fields have a 7-bit encoded length prefix. (those fields can not be null of course)
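For reference, a 7-bit encoded length could be read and written along these lines. This is a sketch that assumes the same layout .NET's BinaryWriter uses for string lengths (least significant 7 bits first, high bit set while more bytes follow); the protocol in question may differ. `SevenBit` is a hypothetical name.

```csharp
using System;
using System.IO;

// Hypothetical helper; assumes little-endian 7-bit groups with a continuation bit.
public static class SevenBit
{
    public static void Write(Stream stream, int value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));

        uint v = (uint)value;
        while (v >= 0x80)
        {
            // Emit the low 7 bits and set the high bit to signal another byte.
            stream.WriteByte((byte)((v & 0x7F) | 0x80));
            v >>= 7;
        }
        stream.WriteByte((byte)v);
    }

    public static int Read(Stream stream)
    {
        int result = 0;
        // A non-negative 32-bit value needs at most 5 groups of 7 bits.
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = stream.ReadByte();
            if (b < 0) throw new EndOfStreamException();

            // The fifth byte may only carry the top 3 bits of a non-negative int.
            if (shift == 28 && b > 0x07)
            {
                throw new FormatException("7-bit encoded length does not fit in an int.");
            }

            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new FormatException("7-bit encoded length is longer than 5 bytes.");
    }
}
```

With this layout, 300 is written as the two bytes 0xAC 0x02.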

Is there any elegant way to decorate a class so this can be read? Or do I have to write my own converter or something? I'd like to avoid a custom converter thing if possible. But if that's needed, how would I do it?

edit: I know how to encode and decode 7-bit integers, but I'm not sure what class to inherit from (if any?) to write a custom converter, or how to register that converter on a field, or maybe even for all string/T[] fields inside an object. Is that possible? Or what attribute would I use?

I looked through all the attributes, and everywhere I can specify a Converter the attribute also has some other side effect, for example FieldEndianness and FieldOffset. There is a section about "Custom Serialization", but how would that work? Would I have to declare the field type as Varuint (name taken from the example on the readme page)? That would be a little problematic for me because I also want to use the same object with a different serializer (Json), which would have trouble dealing with a different type. So in the worst case I'd have to write two type converters (one for BinarySerializer, and one for the JsonSerializer I'm using).

Is there some way to deal with this encoding problem (7bit encoded length) and the "special meanings" (0xFF for null) in a clean way?

rikimaru0345, Dec 22 '17 04:12

Hm, I can't think of any easy way without a converter or custom object. The zero byte case should just work, but the 0xFF would yield a string with length 255. You could always add post-processing that null'd out strings with length 255 I suppose, but that's not too elegant.

The converter approach would be nice but might not be possible because of the null case. Let me look at it.

jefffhaynes, Dec 22 '17 04:12

"You could always add post-processing that null'd out strings with length 255 I suppose"

I don't think that would work: BinarySerializer would read 255 bytes when it should read 0. That would cause it to read into other objects, and then the whole deserialization would get messed up, wouldn't it?

Or do you mean that some converter would then reset the read position (so it reads some garbage data, and then some other code moves the read position back)?

Even so, reading more than it should read would potentially even crash when the stream has no more data. Maybe I'm misunderstanding.
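To make the concern concrete, here is a small sketch of what happens when the 0xFF prefix is treated as an ordinary length. `OverReadDemo.NaiveRead` is a made-up stand-in for a generic length-prefixed reader; it is not how BinarySerializer is implemented internally.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a reader that knows nothing about the null marker.
public static class OverReadDemo
{
    // Reads a one-byte length and then tries to consume that many bytes.
    // Returns how many payload bytes were actually consumed.
    public static int NaiveRead(Stream stream)
    {
        int length = stream.ReadByte();
        if (length < 0) throw new EndOfStreamException();

        var buffer = new byte[length];
        int total = 0;
        int read;
        while (total < length && (read = stream.Read(buffer, total, length - total)) > 0)
        {
            total += read;
        }
        return total;
    }
}
```

Given a stream holding a null string (0xFF) followed by the field "abc" (0x03 0x61 0x62 0x63), NaiveRead consumes all four bytes of the second field and leaves the stream at its end, so nothing remains for the field that actually follows. Post-processing the result afterwards cannot undo that.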

"The converter approach would be nice but might not be possible because of the null case. Let me look at it."

Ok, I see, so a new attribute like [TypeConverter(...)] where a user can just specify their own type converter without also having to set FieldEndianness or FieldOffset?

Thanks for taking the time! :)

rikimaru0345, Dec 22 '17 04:12

The issue is that null lengths don't really make sense when dealing with serialization. If you ask the question, how long is this field?, the answer can't be "null". The field or entire field object can be null, but the length can't be null, it can only be zero. So the issue would seem to be the overloading of the length value and I'm trying to wrap my head around that.

jefffhaynes, Dec 22 '17 05:12

Other serializers expose some way to deal with a "Formatter" or "CustomSerializer".

And then you'd just decorate your field like this:

[TypeConverter(typeof(MySpecialStringFormatter))] public string Name;

and inside MySpecialStringFormatter you have to implement two methods, one for serialization, one for deserialization.

And inside them you just get access to the raw stream, and you have to return the correct object. Just like your IBinarySerializable, but instead of having to write the deserialized value to a member field, you return it (your Deserialize method returns void in your example, but what if it just returned object?).

And in Serialize you'd give the value to be serialized to the converter (instead of having to rely on a field).

Or maybe those changes aren't needed, the custom serializer would just set the "string Value;" field to null when it reads 0xFF as its first byte.

Another way to think about it is to not see the length field and the string data as separate things, but as one "composite" thing. If the string is null the serializer just writes 0xFF and that's it; if it is "" it writes 0x00, ...

The length itself is never null, it always gets written, but depending on its value the data that comes after it gets read in a different way.
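As a rough sketch of that idea: the interface below is hypothetical (BinarySerializer does not define `IValueFormatter<T>`, and no attribute exists to register it), and UTF-8 is assumed for the string bytes. It only shows the shape of a formatter that takes the value as a parameter and returns the deserialized result, treating prefix and payload as one unit.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical interface for illustration; BinarySerializer does not define it.
public interface IValueFormatter<T>
{
    void Serialize(Stream stream, T value);
    T Deserialize(Stream stream);
}

// Treats the one-byte prefix and the string bytes as a single composite value.
public sealed class NullableStringFormatter : IValueFormatter<string>
{
    private const byte NullMarker = 0xFF;

    public void Serialize(Stream stream, string value)
    {
        if (value == null)
        {
            // Null is the marker alone, with no payload after it.
            stream.WriteByte(NullMarker);
            return;
        }

        byte[] data = Encoding.UTF8.GetBytes(value);
        if (data.Length >= NullMarker)
        {
            throw new ArgumentException("At most 254 bytes fit a one-byte prefix.", nameof(value));
        }

        stream.WriteByte((byte)data.Length);
        stream.Write(data, 0, data.Length);
    }

    public string Deserialize(Stream stream)
    {
        int prefix = stream.ReadByte();
        if (prefix < 0) throw new EndOfStreamException();
        if (prefix == NullMarker) return null;

        var data = new byte[prefix];
        int offset = 0;
        while (offset < data.Length)
        {
            int read = stream.Read(data, offset, data.Length - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return Encoding.UTF8.GetString(data);
    }
}
```

Because the formatter never stores the value itself, the field on the model could stay a plain string, which is what would keep it usable with a JSON serializer too.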

Not sure if that helps...

rikimaru0345, Dec 22 '17 05:12

I would just define this and use it for each field that follows that format:

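using System;
using System.IO;
using System.Text;
using BinarySerialization;

public class CustomField : IBinarySerializable
{
    // 0xFF in the length byte marks a null string, so real lengths are limited to 0..254.
    private const byte NullValue = 0xff;

    public string Value { get; set; }

    public void Serialize(Stream stream, BinarySerialization.Endianness endianness, BinarySerializationContext serializationContext)
    {
        if (Value == null)
        {
            stream.WriteByte(NullValue);
            return;
        }

        var data = Encoding.UTF8.GetBytes(Value);

        // A longer string would overflow the byte cast or collide with the null marker.
        if (data.Length >= NullValue)
        {
            throw new InvalidOperationException("String is too long for a one-byte length prefix.");
        }

        stream.WriteByte((byte) data.Length);
        stream.Write(data, 0, data.Length);
    }

    public void Deserialize(Stream stream, BinarySerialization.Endianness endianness, BinarySerializationContext serializationContext)
    {
        // ReadByte returns -1 at the end of the stream.
        var length = stream.ReadByte();

        if (length < 0)
        {
            throw new EndOfStreamException();
        }

        if (length == NullValue)
        {
            Value = null;
            return;
        }

        // Stream.Read may return fewer bytes than requested, so loop until the buffer is full.
        var data = new byte[length];
        var offset = 0;
        while (offset < data.Length)
        {
            var read = stream.Read(data, offset, data.Length - offset);
            if (read == 0)
            {
                throw new EndOfStreamException();
            }
            offset += read;
        }

        Value = Encoding.UTF8.GetString(data);
    }
}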

For the fields with a variable-length (7-bit encoded) length prefix, you can simply decorate them with [SerializeAs(SerializedType.LengthPrefixedString)].

Let me know if that works.

jefffhaynes, Dec 22 '17 05:12

Just saw your comment. You could possibly override a value attribute...I need to think about that

jefffhaynes, Dec 22 '17 05:12