[C#,C++] Generate DTOs for non-perf-sensitive use cases.

Open ZachBray opened this issue 1 year ago • 2 comments

Overview

In some applications, performance is not critical. Some users would like to use SBE across their whole "estate" but don't want the "sharp edges" associated with flyweight codecs, e.g., usage not aligning with data lifetimes.

In this PR, I've added DTO generation for C# and C++.

I'm using property-based tests to gain confidence that the DTOs are working correctly. In particular, I'm checking the following property (albeit not exhaustively):

∀ msg ∈ MessageSchemas,
∀ encoding ∈ EncodingsOf(msg),
encoding = dtoEncode(dtoDecode(encoding))

I.e., for any message schema, dtoEncode is a left inverse of dtoDecode, so the "round trip" preserves all information in the original encoding.
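The property can be sketched in C++ on a toy scale. This is an illustrative sketch only, not the PR's test suite: the message layout, CarDto, and the free functions dtoEncode and dtoDecode are hypothetical stand-ins, and every byte pattern is treated as a valid encoding, which a real schema with reserved null values would not allow.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical fixed-size message: a uint32 serial followed by a uint16 year,
// both little-endian. Not taken from any real SBE schema.
constexpr std::size_t kEncodedLength = 6;
using Encoding = std::array<std::uint8_t, kEncodedLength>;

struct CarDto
{
    std::uint32_t serial;
    std::uint16_t year;
};

// Stand-in for decoding a DTO from an encoded buffer.
CarDto dtoDecode(const Encoding &bytes)
{
    CarDto dto{};
    dto.serial = static_cast<std::uint32_t>(bytes[0]) |
        (static_cast<std::uint32_t>(bytes[1]) << 8) |
        (static_cast<std::uint32_t>(bytes[2]) << 16) |
        (static_cast<std::uint32_t>(bytes[3]) << 24);
    dto.year = static_cast<std::uint16_t>(bytes[4] | (bytes[5] << 8));
    return dto;
}

// Stand-in for encoding a DTO back into bytes.
Encoding dtoEncode(const CarDto &dto)
{
    Encoding bytes{};
    for (std::size_t i = 0; i < 4; ++i)
    {
        bytes[i] = static_cast<std::uint8_t>(dto.serial >> (8 * i));
    }
    bytes[4] = static_cast<std::uint8_t>(dto.year & 0xFF);
    bytes[5] = static_cast<std::uint8_t>(dto.year >> 8);
    return bytes;
}

// Checks encoding == dtoEncode(dtoDecode(encoding)) for `samples` random
// encodings. Like the tests described above, this is not exhaustive.
bool roundTripHolds(std::uint32_t seed, int samples)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> byteDist(0, 255);
    for (int n = 0; n < samples; ++n)
    {
        Encoding original{};
        for (auto &b : original)
        {
            b = static_cast<std::uint8_t>(byteDist(rng));
        }
        if (dtoEncode(dtoDecode(original)) != original)
        {
            return false;
        }
    }
    return true;
}
```

Calling roundTripHolds(42u, 1000) samples a thousand random encodings with a fixed seed, so a failure can be reproduced.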

These tests run periodically rather than on every commit; however, I've tested out the CI job using a PR hook here.

Implementation notes

The DTOs support encoding and decoding via the generated codecs using static void EncodeWith(CodecT codec, DtoT dto) and static DtoT DecodeWith(CodecT codec) methods.
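A rough C++ sketch of the shape of these two methods follows. QuoteCodec and QuoteDto are hypothetical, hand-written stand-ins rather than generated output, the camel-case names encodeWith and decodeWith are an assumed C++ spelling of the methods named above, and the real generated signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical flyweight codec: a view over a caller-owned buffer.
// Real SBE-generated codecs have a richer API; this only mimics the shape.
// Values are stored in native byte order to keep the sketch short.
class QuoteCodec
{
public:
    static constexpr std::size_t kBlockLength =
        sizeof(std::int64_t) + sizeof(std::uint32_t);

    QuoteCodec(char *buffer, std::size_t length) : m_buffer(buffer)
    {
        if (length < kBlockLength)
        {
            throw std::length_error("buffer too short for QuoteCodec");
        }
    }

    std::int64_t price() const
    {
        std::int64_t v;
        std::memcpy(&v, m_buffer, sizeof(v));
        return v;
    }

    QuoteCodec &price(std::int64_t v)
    {
        std::memcpy(m_buffer, &v, sizeof(v));
        return *this;
    }

    std::uint32_t quantity() const
    {
        std::uint32_t v;
        std::memcpy(&v, m_buffer + sizeof(std::int64_t), sizeof(v));
        return v;
    }

    QuoteCodec &quantity(std::uint32_t v)
    {
        std::memcpy(m_buffer + sizeof(std::int64_t), &v, sizeof(v));
        return *this;
    }

private:
    char *m_buffer; // not owned; must outlive the codec
};

// The DTO owns its values, so it stays valid after the buffer is reused.
struct QuoteDto
{
    std::int64_t price = 0;
    std::uint32_t quantity = 0;

    // Copies every field from the DTO into the codec's buffer.
    static void encodeWith(QuoteCodec &codec, const QuoteDto &dto)
    {
        codec.price(dto.price).quantity(dto.quantity);
    }

    // Copies every field out of the codec's buffer into a new DTO.
    static QuoteDto decodeWith(const QuoteCodec &codec)
    {
        QuoteDto dto;
        dto.price = codec.price();
        dto.quantity = codec.quantity();
        return dto;
    }
};

// Encodes, decodes, then wipes the buffer to show the DTO is independent of it.
bool quoteSurvivesBufferReuse(std::int64_t price, std::uint32_t quantity)
{
    char buffer[QuoteCodec::kBlockLength] = {};
    QuoteCodec codec(buffer, sizeof(buffer));

    QuoteDto in;
    in.price = price;
    in.quantity = quantity;
    QuoteDto::encodeWith(codec, in);

    const QuoteDto out = QuoteDto::decodeWith(codec);
    std::memset(buffer, 0, sizeof(buffer)); // buffer reused; DTO is unaffected
    return out.price == price && out.quantity == quantity;
}
```

The last function shows the motivation from the overview: once decoded, the DTO no longer depends on the lifetime or contents of the underlying buffer, at the cost of copying each field.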

C# Representations

  • Messages and composites are represented as immutable records.

    • init accessors are provided so that with expressions may be used, e.g., x with { Y = Z }.
    • An all-args constructor is defined to prevent the construction of records with missing fields.
    • The compiler-generated ToString() does not show what is inside groups etc.; therefore, we provide ToSbeString() as well.
  • Groups are represented as IReadOnlyList<GroupT>.

  • Added/optional primitives are represented as nullable types. null indicates the value is not filled. The reserved null value defined explicitly in the schema or implicitly by the SBE specification is not permitted for use within the DTOs, as this would lead to multiple representations of null in consuming application code. Both constructors and init accessors validate that values are in the allowed range.

  • Added fixed-length data is represented through nullable reference types, e.g., string? and IReadOnlyList<byte>?. Missing data, e.g., due to the encoding version, is represented as null.

  • Missing, added variable-length data is represented as an empty string or array, similarly to the codecs.

  • Enums and bitsets use the existing codec representations, i.e., generated enums.

    • Missing bitset fields are represented as 0.
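The rule for optional primitives (a single representation of "not filled", with the reserved null value rejected) can be sketched in C++ with std::optional. This is an analogy to the C# design above under stated assumptions, not the generated code: OptionalUInt16Field and its members are hypothetical names, and the field is assumed to be a uint16 using the SBE default null value of 65535.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <stdexcept>

// Hypothetical wrapper for an optional uint16 DTO field (requires C++17).
class OptionalUInt16Field
{
public:
    // Reserved null value on the wire: the maximum uint16, i.e. 65535.
    static constexpr std::uint16_t kNullValue =
        std::numeric_limits<std::uint16_t>::max();

    const std::optional<std::uint16_t> &get() const { return m_value; }

    // Rejects the reserved null value so that std::nullopt is the only way
    // to say "not filled" in application code.
    void set(std::optional<std::uint16_t> value)
    {
        if (value.has_value() && *value == kNullValue)
        {
            throw std::out_of_range("reserved null value; use std::nullopt");
        }
        m_value = value;
    }

    // What an encoder would write: the value, or the reserved null if unset.
    std::uint16_t toWire() const { return m_value.value_or(kNullValue); }

    // What a decoder would produce: the reserved null maps to "not filled".
    static OptionalUInt16Field fromWire(std::uint16_t raw)
    {
        OptionalUInt16Field field;
        if (raw != kNullValue)
        {
            field.m_value = raw;
        }
        return field;
    }

private:
    std::optional<std::uint16_t> m_value;
};

// Returns true if assigning the reserved null value is refused.
bool rejectsReservedNull()
{
    OptionalUInt16Field field;
    try
    {
        field.set(OptionalUInt16Field::kNullValue);
    }
    catch (const std::out_of_range &)
    {
        return true;
    }
    return false;
}
```

Validating in the setter keeps the mapping between wire values and DTO values one-to-one, which is what lets an encode/decode round trip preserve the original bytes.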

Other changes

ZachBray, Sep 27 '23 16:09

Feedback to consider:

  1. Use records (C# 9+ only) for built-in equality and comparison operations.
  2. Be more idiomatic, e.g., int? rather than int and NullValue, but protect against having two null values in the setters.

ZachBray, Oct 04 '23 23:10

@kieranelby, please can someone kick the tyres on your side and let me know if you'd prefer different representations?

To use it, you can supply -Dsbe.csharp.generate.dtos=true to the code generator.

ZachBray, Oct 23 '23 11:10