confluent-kafka-dotnet

Convert from Avro logical types to .NET equivalents (and vice versa)

Open wilmerferreira opened this issue 7 years ago • 7 comments

Description

I'm trying to find a way to use Avro logical types. Behind the scenes, the avrogen tool creates a byte[] property (for decimals), and I couldn't find a way to convert .NET decimals into it.

Can you please help me with this?

How to reproduce

  1. Create a schema file, e.g. User.avsc

    [
      {
        "namespace": "Confluent.Kafka.Examples.AvroSpecific",
        "type": "record",
        "name": "User",
        "fields": [
          {
            "name": "name",
            "type": "string"
          },
          {
            "name": "favorite_number",
            "type": [ "int", "null" ]
          },
          {
            "name": "favorite_color",
            "type": [ "string", "null" ]
          },
          {
            "name": "birth_date_raw",
            "type": {
              "type": "int",
              "logicalType": "date"
            }
          },
          {
            "name": "salary_raw",
            "type": {
              "type": "bytes",
              "logicalType": "decimal",
              "precision": 19,
              "scale": 4
            }
          }
        ]
      }
    ]
    
  2. Run the avrogen tool in the console.

    avrogen.exe -s .\User.avsc .
    
  3. See the generated property in the User.cs (generated class)

    public byte[] salary_raw
    {
       get
       {
          return this._salary_raw;
       }
       set
       {
          this._salary_raw = value;
       }
    }
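
For the decimal-to-byte[] direction asked about here, a minimal sketch follows. It relies on the Avro specification's definition of the decimal logical type (the unscaled integer value stored as big-endian two's complement). The class and method names are hypothetical and not part of avrogen or Confluent.Kafka, and the scale has to be passed in by hand because the generated class does not carry it.

```csharp
using System;
using System.Numerics;

// Hypothetical helper; not part of avrogen or Confluent.Kafka.
public static class AvroDecimalEncoder
{
    // Converts a .NET decimal into the Avro "decimal" byte layout:
    // the unscaled integer as big-endian two's complement.
    public static byte[] ToAvroDecimalBytes(decimal value, int scale)
    {
        // Shift the decimal point right by `scale` digits
        // (throws OverflowException if the result does not fit in a decimal).
        decimal shifted = value * (decimal)BigInteger.Pow(10, scale);

        // Refuse values that carry more fractional digits than the schema allows.
        if (shifted != decimal.Truncate(shifted))
            throw new ArgumentException("Value has more fractional digits than the scale.", nameof(value));

        var unscaled = new BigInteger(shifted);

        // BigInteger.ToByteArray() is little-endian; Avro expects big-endian.
        byte[] bytes = unscaled.ToByteArray();
        Array.Reverse(bytes);
        return bytes;
    }
}
```

With the schema above (scale 4), assigning a salary could then look like `user.salary_raw = AvroDecimalEncoder.ToAvroDecimalBytes(1234.5678m, 4);`. Rejecting over-precise values instead of rounding them is a design choice in this sketch, not something the specification requires.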
    

wilmerferreira avatar Oct 16 '18 16:10 wilmerferreira

Avro logical types are not supported by the .NET Avro library (they are only supported in Java, AFAIK). I have not tried this, but you could perhaps augment the classes generated by avrogen with your own serialization / deserialization in some way. Marking this as an enhancement, as we should provide guidance on the best way to do this.

mhowlett avatar Oct 17 '18 18:10 mhowlett

I guess this request should be submitted to the Apache Avro repository instead of this one, given that the classes are generated by the avrogen tool.

wilmerferreira avatar Oct 18 '18 13:10 wilmerferreira

I found this fork of the Apache Avro repository with some initial logicalType support built in; it might be possible to clean it up and use it as a starting point.

noderat avatar Oct 22 '18 21:10 noderat

Ah, that was me trying to put support for logical types in place. Unfortunately, the Avro codebase was quite old and not very friendly (e.g., the Apache project was not even using GitHub when I tried, and the repo linked above was just a mirror). Since I was in quite a rush at the time, I simply created a helper method that translates byte[] to decimal according to the original Java specification:

// http://www.ralbu.com/post/2013/07/19/BigInteger-in-Java-and-C
// c# The individual bytes in the value array should be in little-endian order,
// from lowest-order byte to highest-order byte.
public static decimal GetDecimalFromByteArray(byte[] bytes)
{
   var result = new BigInteger(bytes.Reverse().ToArray());
   return (decimal)result * (decimal)Math.Pow(10, -8);
}

The method above simply assumes a fixed scale of 8, which was the case at the time (Avro messages encoded by Debezium from a MySQL/MariaDB database, with all decimals declared as (20,8) in the db). The only real pain for me was that I needed to know which fields were decimals before using them. Whether you use Avro GenericRecords or the avrogen tool, you lose the logical type information that could tell you whether a field is really a decimal (there is another open issue about this: https://github.com/confluentinc/confluent-kafka-dotnet/issues/636). If you are using the avrogen tool, you can probably adapt the code above in some way, but in our case we went with GenericRecords and just called a helper method when we had to read decimal values (very few cases, just a couple of instances). This approach obviously doesn't scale well and could quickly become a pain to maintain; it would be better to have "native" decimals.
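
One way to recover which fields are decimals, sketched here under assumptions: read the raw .avsc JSON directly and collect the scale of every top-level field whose type object declares "logicalType": "decimal". The class and method names are hypothetical, System.Text.Json is assumed to be available (.NET Core 3.0 or later, or the NuGet package), and unions and nested records are deliberately not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper; it reads the schema file, not the Avro library's object model.
public static class DecimalFieldFinder
{
    // Returns field name -> scale for top-level fields declared as decimal.
    public static Dictionary<string, int> FindDecimalFields(string avscJson)
    {
        var result = new Dictionary<string, int>();
        using JsonDocument doc = JsonDocument.Parse(avscJson);
        JsonElement root = doc.RootElement;

        // A schema file may hold a single record or an array of records.
        if (root.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement record in root.EnumerateArray())
                Collect(record, result);
        }
        else
        {
            Collect(root, result);
        }
        return result;
    }

    private static void Collect(JsonElement record, Dictionary<string, int> result)
    {
        if (record.ValueKind != JsonValueKind.Object ||
            !record.TryGetProperty("fields", out JsonElement fields))
            return;

        foreach (JsonElement field in fields.EnumerateArray())
        {
            JsonElement type = field.GetProperty("type");
            if (type.ValueKind == JsonValueKind.Object &&
                type.TryGetProperty("logicalType", out JsonElement logical) &&
                logical.ValueKind == JsonValueKind.String &&
                logical.GetString() == "decimal")
            {
                // Per the Avro specification, scale is optional and defaults to 0.
                int scale = type.TryGetProperty("scale", out JsonElement s) ? s.GetInt32() : 0;
                result[field.GetProperty("name").GetString()] = scale;
            }
        }
    }
}
```

The resulting dictionary could be built once per schema and consulted before calling a byte[]-to-decimal helper, which removes the hard-coded scale but still leaves the conversion as a manual step.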

tl;dr You probably shouldn't build on the (failed) attempt above.

manuel-zulian avatar Oct 31 '18 16:10 manuel-zulian

In case this helps: here s is the Avro-encoded decimal as a Base64 string, and you need to know the original scale. Using @manuel-zulian's method:

public static decimal GetDecimalFromBase64String(string s, int scale)
{
   var bytes = Convert.FromBase64String(s);
   var result = new BigInteger(bytes.Reverse().ToArray());
   return (decimal)result * (decimal)Math.Pow(10, -scale);
}
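
A variant of the same idea, as a sketch only: it takes the raw bytes instead of a Base64 string and divides by a power of ten computed with BigInteger, so the scale factor never passes through a double. The class and method names are hypothetical.

```csharp
using System;
using System.Numerics;

// Hypothetical helper; not part of any Avro or Confluent library.
public static class AvroDecimalDecoder
{
    // Decodes Avro "decimal" bytes (big-endian two's complement unscaled value).
    public static decimal FromAvroDecimalBytes(byte[] bigEndianBytes, int scale)
    {
        // Work on a copy so the caller's array is left untouched.
        byte[] littleEndian = (byte[])bigEndianBytes.Clone();
        Array.Reverse(littleEndian);

        // The BigInteger constructor expects little-endian two's complement.
        var unscaled = new BigInteger(littleEndian);

        // Throws OverflowException if either operand does not fit in a decimal.
        return (decimal)unscaled / (decimal)BigInteger.Pow(10, scale);
    }
}
```

For example, the bytes 00 BC 61 4E with scale 4 hold the unscaled value 12345678 and decode to 1234.5678.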

codeclash avatar Jun 12 '19 14:06 codeclash

It's OK as a workaround, but this should really be done by the generated class, not least because there are other logical types (like datetime).

https://avro.apache.org/docs/1.8.0/spec.html#Logical+Types
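
As an illustration for one of those other types: the Avro specification defines the date logical type as an int counting days since the Unix epoch (1 January 1970). A minimal sketch of a manual conversion, with hypothetical class and method names and assuming the generated property is a plain int:

```csharp
using System;

// Hypothetical helper; not generated by avrogen.
public static class AvroDateConverter
{
    private static readonly DateTime UnixEpoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Avro "date": number of days since 1970-01-01.
    public static DateTime FromAvroDate(int daysSinceEpoch) =>
        UnixEpoch.AddDays(daysSinceEpoch);

    // Drops the time-of-day part before counting whole days.
    public static int ToAvroDate(DateTime date) =>
        (int)(date.Date - UnixEpoch).TotalDays;
}
```

For a field such as birth_date_raw, this could be used as `AvroDateConverter.FromAvroDate(user.birth_date_raw)`.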

wilmerferreira avatar Jun 13 '19 11:06 wilmerferreira

Any news regarding this issue? I am new to Avro myself and would like to generate a C# model from an Avro schema (.avsc) file using the avrogen tool. One of the properties defined in the schema has a 'timestamp-millis' logical type that I would like converted to a C# type.

https://avro.apache.org/docs/1.8.0/spec.html#Timestamp
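
Until the generator handles this, a manual conversion is possible. The specification defines timestamp-millis as a long counting milliseconds since the Unix epoch (1 January 1970 00:00:00.000 UTC). A minimal sketch, with hypothetical names and assuming the generated property is a plain long:

```csharp
using System;

// Hypothetical helper; not generated by avrogen.
public static class AvroTimestampConverter
{
    // Avro "timestamp-millis": milliseconds since 1970-01-01T00:00:00Z.
    public static DateTime FromAvroTimestampMillis(long millisSinceEpoch) =>
        DateTimeOffset.FromUnixTimeMilliseconds(millisSinceEpoch).UtcDateTime;

    // ToUniversalTime() treats DateTimeKind.Unspecified values as local time.
    public static long ToAvroTimestampMillis(DateTime value) =>
        new DateTimeOffset(value.ToUniversalTime()).ToUnixTimeMilliseconds();
}
```

Returning a UTC DateTime keeps the sketch small; a DateTimeOffset return type would work equally well if the rest of the model prefers it.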

stefan-benjamin avatar Sep 27 '19 13:09 stefan-benjamin