gogen-avro
gogen-avro copied to clipboard
type matching should match unqualified names
The specification says:
[to match] both schemas are records with the same (unqualified) name
However the code checks the fully qualified names, leading to an "Incompatible types by name" error when the unqualified names match but are part of separate namespaces.
This means that gogen-avro can fail when compiling schemas that should actually match.
Thanks for pointing this out - I've made a small change on master which should resolve this issue. Can you test it out and let me know, and then I can cut a minor release?
That works fine now, thanks very much!
I wonder whether it might be worth making the check less stringent still, to match the Java implementation. You might be interested to read this thread that I started in the avro-user mailing list. I think there are good use cases for allowing any two record types with compatible fields to be considered compatible.
One possibility might be to add a flag to the Compile
function to allow this.
Thanks for checking @rogpeppe! I'd be willing to add an argument to support the Java implementation - where names are only used for union resolution. Extending this to unions as well is a can of worms because:
The first schema in the reader's union that matches the selected writer's union schema is recursively resolved against it
Two record schemas matching only requires that:
if the writer's record contains a field with a name not present in the reader's record, the writer's value for that field is ignored.
The first schema in the reader's union that matches the selected writer's union schema is recursively resolved against it
So a record with fields (a,b) matches a definition with fields (c,d), provided c and d have default values. This would break unions of multiple record types.
In general it seems like a foot-gun (allowing users to deserialize data and throw it away) but if we make it configurable and default to the spec-compliant behaviour I think it's fine.
@actgardner ISTM that it could work OK on unions too if both reader and writer have just a single record type in the union - there's no ambiguity then. That said, unions of records and non-record types are probably relatively rare in practice, so maybe that's not worth doing.
In general it seems like a foot-gun (allowing users to deserialize data and throw it away) but if we make it configurable and default to the spec-compliant behaviour I think it's fine.
Cool, thanks. I might be able to make a PR for this.
Do you have a preference as to the API for the configurability? Currently compiler.Compile
doesn't take any arguments. Some possibilities:
- an attribute in the various definition types that requests the lax name checking behaviour.
- add a flags argument to
compiler.Compile
(but that's backwardly incompatible) - add a parameter struct to
compiler.Compile
(similarly incompatible) - add a new function
compiler.CompileWithLaxNames
(but that doesn't scale well if there turn out to be other compiler options)
ISTM that it might be possible to accomplish the first option by allowing an additional metadata field to the Avro schema for definitions (that's something that's explicitly allowed in the spec). Then users could explicitly mark their definitions as "lax" when checking names. From a brief scan of the IDL spec, I think that this approach might even be compatible with IDL (with a @laxname(true)
) annotation.
This way, you'll only get the "foot-gun" behaviour when you've explicitly requested it.
an attribute in the various definition types that requests the lax name checking behaviour.
I think this is my favourite option, which could hopefully be adopted elsewhere. The way I see it the reader record definition would support an extra field to specify that names should be ignored during deserialization.
add a parameter struct to compiler.Compile (similarly incompatible)
I'd like to make this API change at some point anyways, to add some future-proof mechanism for configuring parameters in the public API. This could be variadic arguments, or a builder pattern config struct, or something else.
Either way I'd be open to a PR for this!
Hey @rogpeppe, this should be resolved thanks to the laxNames option to Compile that merged in #138. Let me know if you run into any edge cases.