
Is there a way to ignore a property to be serialized if missing from schema?

Open passuied opened this issue 9 months ago • 4 comments

I'm trying to support easier schema evolution so that code with new properties can be deployed before the new schema is published. Out of the box, if you serialize an extra property that doesn't exist in your schema, it throws the error `cannot determine codec: "{field_name}"`. Is there a way to make serialization more lenient and just skip any property that is NOT defined in the schema? In the Pydantic world, it would be equivalent to the `extra: ignore` setting.

I couldn't see a way at first glance. Please advise
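Until such an option exists, one workaround is to strip unknown top-level keys from the JSON document before handing it to goavro. This is a stdlib-only sketch (it does not call goavro itself, and it only handles top-level record fields, not nested records or unions): parse the writer schema to collect its field names, then drop any other key.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schemaFieldNames extracts the top-level record field names from an Avro schema.
func schemaFieldNames(schemaJSON string) (map[string]bool, error) {
	var schema struct {
		Fields []struct {
			Name string `json:"name"`
		} `json:"fields"`
	}
	if err := json.Unmarshal([]byte(schemaJSON), &schema); err != nil {
		return nil, err
	}
	names := make(map[string]bool, len(schema.Fields))
	for _, f := range schema.Fields {
		names[f.Name] = true
	}
	return names, nil
}

// stripUnknownFields removes top-level keys not declared in the schema, so the
// result can be passed to a codec without triggering the
// `cannot determine codec: "..."` error. Nested records are not handled here.
func stripUnknownFields(doc []byte, known map[string]bool) ([]byte, error) {
	var m map[string]json.RawMessage
	if err := json.Unmarshal(doc, &m); err != nil {
		return nil, err
	}
	for k := range m {
		if !known[k] {
			delete(m, k)
		}
	}
	return json.Marshal(m)
}

func main() {
	schema := `{"type":"record","name":"User","fields":[{"name":"id","type":"long"}]}`
	known, err := schemaFieldNames(schema)
	if err != nil {
		panic(err)
	}
	out, err := stripUnknownFields([]byte(`{"id":1,"newField":"x"}`), known)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // the unknown "newField" key is dropped
}
```

The filtered bytes could then go through goavro's `NativeFromTextual` as usual; once the new schema is published, the filter naturally stops removing the new field.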

passuied avatar Mar 19 '25 19:03 passuied

@jwon I just noticed the note in the README.md that this repo is no longer active. This is bad news. I've tried to move to hamba/avro, but it doesn't support generic Avro ser/de or even a textual Avro representation. We especially rely on the StandardJSONFull support... Is there a way to keep this repo alive?

passuied avatar Mar 21 '25 17:03 passuied

@mihaitodor and @rockwotj have volunteered to maintain this repo so please reach out to them for community support

jwon avatar Mar 21 '25 20:03 jwon

@mihaitodor @rockwotj I'm open to contributing to this issue but would need some guidance first... I see that if a field `{field_name}` is present in the JSON string but not in the schema, `NativeFromTextual` throws the error `cannot determine codec: "{field_name}"`. I have traced it in the code to `genericMapTextDecoder`. Ideally, I would want to add an option to ignore such fields, so that the native object created simply won't have that property until the schema that supports it is finally published. Do you have guidance on where the best place to provide such an option would be? Should it be part of the codec builder?
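To make the proposal concrete, here is a hypothetical sketch of what a lenient variant of that decode loop could look like. It is loosely modeled on the error path described above, but `fieldCodecs`, `decodeFields`, and the `ignoreUnknown` flag are illustrative assumptions, not real goavro identifiers:

```go
package main

import "fmt"

// decodeFields is a toy stand-in for a map text decoder. Each known field has
// a decode function; unknown fields either error (current behavior) or are
// silently skipped when ignoreUnknown is set (the proposed option).
func decodeFields(input map[string]any, fieldCodecs map[string]func(any) (any, error), ignoreUnknown bool) (map[string]any, error) {
	out := make(map[string]any)
	for name, raw := range input {
		decode, ok := fieldCodecs[name]
		if !ok {
			if ignoreUnknown {
				continue // skip fields the schema does not know about yet
			}
			return nil, fmt.Errorf("cannot determine codec: %q", name)
		}
		v, err := decode(raw)
		if err != nil {
			return nil, err
		}
		out[name] = v
	}
	return out, nil
}

func main() {
	codecs := map[string]func(any) (any, error){
		"id": func(v any) (any, error) { return v, nil },
	}
	input := map[string]any{"id": int64(1), "newField": "x"}

	lenient, _ := decodeFields(input, codecs, true)
	fmt.Println(lenient) // "newField" is absent from the native map

	_, err := decodeFields(input, codecs, false)
	fmt.Println(err) // cannot determine codec: "newField"
}
```

In the real library the flag would presumably be threaded through from the codec builder down to the decoders, which is why asking where to surface the option is the right first question.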

passuied avatar Mar 21 '25 20:03 passuied

Hey @passuied, thank you for raising this issue! I don't have an immediate answer for how this can be achieved, but it does sound like a nice feature to have. I'll do some digging when I get a chance, but I can't provide a deadline right now.

mihaitodor avatar Mar 30 '25 02:03 mihaitodor

Is there any chance to continue using this pkg, or should we consider moving forward with Hamba?

  1. Hamba's benchmarks show it faster than LinkedIn's pkg, but the benchmarks themselves are a little outdated.
  2. The LinkedIn README claims 3-4x performance on v2, but at the same time the docs say that LinkedIn internally switched to Hamba.

What should we do about these things?

@mihaitodor @rockwotj thanks!

arlanram avatar Mar 31 '25 13:03 arlanram

Evaluating Hamba against your own use cases is certainly something you should do. Benchmarks can be misleading, as libraries can be designed with specific use cases in mind. Goavro is quite good for arbitrary data, while Hamba to me seems to be designed primarily around Go struct tags (not that other modes don't work, but that seems to be the happy path).

As for contributing here: this is a reasonable feature to add, even if it's outside the spec. My suggestion is one of:

  1. Add extra parameters to the schema that goavro could use to skip serialization.
  2. Mangle the schema before passing it into this library.
  3. Add a new `NewCodecWithOptions` API that supports specifying the JSON encoding ("internet" JSON or "spec" JSON), along with the ability to skip fields. We should be mindful of supporting nested fields, fields in unions, etc., so the best option is likely an optional predicate function that the codec builder calls into while building the encoder.
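A minimal sketch of what option 3 might look like. None of these names (`CodecOptions`, `SkipFieldFunc`, `shouldSkip`) exist in goavro today; they are assumptions purely for illustrating the predicate-based design, where the builder passes the full field path so nested fields and fields inside unions can be addressed:

```go
package main

import "fmt"

// SkipFieldFunc is a hypothetical predicate the codec builder would call for
// each field while building the encoder. path is the full path to the field,
// e.g. ["user", "legacyField"] for a nested record field.
type SkipFieldFunc func(path []string) bool

// CodecOptions is a hypothetical options struct for a NewCodecWithOptions API.
type CodecOptions struct {
	JSONEncoding string        // e.g. "internet" or "spec"
	SkipField    SkipFieldFunc // nil means skip nothing
}

// shouldSkip shows how a codec builder might consult the predicate as it
// recurses through records, unions, and nested fields.
func shouldSkip(opts CodecOptions, path ...string) bool {
	return opts.SkipField != nil && opts.SkipField(path)
}

func main() {
	opts := CodecOptions{
		JSONEncoding: "internet",
		SkipField: func(path []string) bool {
			// skip one nested field by its full path
			return len(path) == 2 && path[0] == "user" && path[1] == "legacyField"
		},
	}
	fmt.Println(shouldSkip(opts, "user", "legacyField")) // true
	fmt.Println(shouldSkip(opts, "user", "id"))          // false
}
```

Handing the caller a path-based predicate keeps the schema itself untouched (unlike options 1 and 2) and naturally extends to nested and union cases, at the cost of the caller having to know field paths.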

rockwotj avatar Apr 01 '25 00:04 rockwotj

The biggest issue with Hamba is the lack of support for textual representation. Unless they add support, we cannot switch to it. I tried... I also agree that the lack of support for dictionaries (the generic use case) is limiting for us.

passuied avatar Apr 02 '25 14:04 passuied