avro
AVRO-3135: ability to hook into schema ser/deser to implement schema refer…
For details on the rationale of this change please see
This PR adds the unit test TestSchemaSerializationHooks, which should also be a good example of how the functionality can be used.
This would appear to entangle the parsing of the schema with an infrastructure dependency.
For example, if two different processes were parsing the same document and did not have the same reference resolver, or resolved the schema to different things, the actual schema would become non-deterministic.
What is the purpose of the added file lang/s110/java/15/classes/org/apache/avro/specific/test/FullRecordV2$1.class?
That was unintended.
@adamkennedy this PR adds the ability to hook in a method to resolve a reference.
It is up to the library user to choose whether to use a resolver or not.
Different resolvers could indeed resolve a schema differently, but I have not seen this happen in practice in my implementations (I have always used references pointing to immutable content).
One could bring the same argument against logical types: you can have different conversions/implementations registered in different places, and get non-deterministic behavior.
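To make the determinism point concrete, here is a minimal sketch that does not use the Avro API at all. The `ReferenceExpander` class, the `{"$ref":"..."}` placeholder syntax, and the `a:1` style ids are all assumptions made up for illustration; they are not what this PR implements. The sketch only shows that the expanded schema is a function of the document plus whatever the resolver returns for each id.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in (not Avro API): expands {"$ref":"id"} placeholders
// in schema text using a caller-supplied resolver.
final class ReferenceExpander {
    // Matches the exact form {"$ref":"<id>"} with no extra whitespace.
    private static final Pattern REF =
        Pattern.compile("\\{\"\\$ref\":\"([^\"]+)\"\\}");

    static String expand(String schemaText, Function<String, String> resolver) {
        Matcher m = REF.matcher(schemaText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String resolved = resolver.apply(m.group(1));
            if (resolved == null) {
                throw new IllegalArgumentException("Unresolved reference: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the resolved text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(resolved));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String doc = "{\"type\":\"array\",\"items\":{\"$ref\":\"a:1\"}}";

        // Two processes sharing the same immutable content for id "a:1".
        Map<String, String> processA = Map.of("a:1", "\"string\"");
        Map<String, String> processB = Map.of("a:1", "\"string\"");
        // A process whose resolver maps the same id to different content.
        Map<String, String> processC = Map.of("a:1", "\"long\"");

        String a = expand(doc, processA::get);
        String b = expand(doc, processB::get);
        String c = expand(doc, processC::get);

        System.out.println("A and B agree: " + a.equals(b));
        System.out.println("A and C agree: " + a.equals(c));
    }
}
```

As long as an id always points to the same immutable content, every process expands the document to the same schema; the divergence described above only appears when the same id can resolve to different content.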
Removed.
The thing I like about current .avsc files is that they are complete. Using external references breaks that, and would require some form of compilation/resolution to make the schema complete again.
Instead, one can also use .avdl (IDL) files: these support resolving imports from both
- the file system (splitting large schemata in multiple files) and
- the class path (allowing your dependency system to import schemata by version)
avsc and avdl are not equivalent: avsc is a data format for schemas, while avdl is a format for interfaces/protocols. avdl is not serialization/deserialization friendly, but its JSON representation, .avpr, is.
I understand your point. What about this change enabling a new schema format? Let's call it ".ravsc": .avsc remains free of references, and .ravsc introduces support for references.
Think of this PR as enabling the ability to implement .ravsc.
To understand why I think this is worthwhile, see the use cases I described at.
Just a question about the proposed code: why put the methods customRead and customWrite in the Parser.Names class (which is a simple registry of known schemas), and not directly in the Parser class or, even better, in a new interface such as CustomSerializer:
interface CustomSerializer {
    Schema customRead(Function<String, JsonNode> object);
    boolean customWrite(Schema schema, JsonGenerator gen) throws IOException;
}
...
public Parser(CustomSerializer cs) {
    this.cs = cs;
}
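To illustrate the shape of that suggestion without depending on Avro or Jackson, here is a rough, self-contained sketch. Everything in it is hypothetical: `SchemaHook`, `MiniParser` and `TableHook` are stand-ins invented for this example, schemas are plain strings, and the `$ref` attribute is an assumed convention. It only shows the design choice of passing the hook to the parser through its constructor instead of placing it on the name registry.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the suggested CustomSerializer: schemas are plain
// strings here, where a real hook would use Avro's Schema and Jackson types.
interface SchemaHook {
    // Return a schema for these attributes, or empty to use default parsing.
    Optional<String> customRead(Map<String, String> attributes);

    // Append a custom form to out and return true, or return false for default output.
    boolean customWrite(String schema, StringBuilder out);
}

// Toy parser that receives the hook through its constructor.
final class MiniParser {
    private final SchemaHook hook;

    MiniParser(SchemaHook hook) {
        this.hook = hook;
    }

    String read(Map<String, String> attributes) {
        // Default parsing in this toy model: take the "type" attribute as the schema.
        return hook.customRead(attributes).orElseGet(() -> attributes.get("type"));
    }

    String write(String schema) {
        StringBuilder out = new StringBuilder();
        if (!hook.customWrite(schema, out)) {
            out.append(schema); // default output: the schema text itself
        }
        return out.toString();
    }
}

// Example hook: resolves "$ref" attributes against a fixed table and writes
// schemas found in that table back out as references.
final class TableHook implements SchemaHook {
    private final Map<String, String> byId;

    TableHook(Map<String, String> byId) {
        this.byId = byId;
    }

    @Override
    public Optional<String> customRead(Map<String, String> attributes) {
        return Optional.ofNullable(attributes.get("$ref")).map(byId::get);
    }

    @Override
    public boolean customWrite(String schema, StringBuilder out) {
        for (Map.Entry<String, String> e : byId.entrySet()) {
            if (e.getValue().equals(schema)) {
                out.append("{\"$ref\":\"").append(e.getKey()).append("\"}");
                return true;
            }
        }
        return false;
    }
}

final class HookDemo {
    public static void main(String[] args) {
        MiniParser parser = new MiniParser(new TableHook(Map.of("a:1", "\"string\"")));
        System.out.println(parser.read(Map.of("$ref", "a:1")));
        System.out.println(parser.write("\"string\""));
        System.out.println(parser.write("\"long\""));
    }
}
```

With this layout the registry of known names stays a plain lookup table, and a parser constructed with a hook that always declines behaves like one with no hook at all.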
GitHub seems to have messed up this PR... I will create a new one.