Anonymous types must have locally-defined names in order to be stable
Suppose I have the following schema:
$ion_schema_2_0
type::{
name: Foo,
fields: {
a: { fields: { x: int, y: int } },
b: int,
}
}
type::{
name: Bar,
type: list,
element: { fields: { u: float, v: float } }
}
As it is currently implemented, the code generator might generate something like this. (I'm using Kotlin, rather than Java, for conciseness here, but the principle will still apply.)
class Foo(val a: AnonymousType1, val b: Int)
class AnonymousType1(val x: Int, val y: Int)
class Bar(val value: List<AnonymousType2>)
class AnonymousType2(val u: Double, val v: Double)
Now, suppose I add a new field to Foo, like this:
type::{
name: Foo,
fields: {
a: { fields: { x: int, y: int } },
b: int,
c: { fields: { id: string, value: int },
}
}
The code generator has no way of knowing how to keep the generated code stable, and would likely end up generating these classes:
class Foo(val a: AnonymousType1, val b: Int, val c: AnonymousType2)
class AnonymousType1(val x: Int, val y: Int)
class AnonymousType2(val id: String, val value: Int)
class Bar(val value: List<AnonymousType3>)
class AnonymousType3(val u: Double, val v: Double)
In this scenario, AnonymousType2 and Bar have been redefined in a way that will break the code of anyone who is using either one of these classes in any non-trivial way. This cannot be solved by sorting the anonymous types because the only sort attribute that would not result in types being moved/redefined would be the timestamp that the inline type was added to the schema for the first time.
There are a few things that I think we must do in order to fix this problem, but I'm certainly open to hearing other solutions.
- All inline types must be placed inside a namespace corresponding to the parent type. In Java, that would mean e.g.
org.example.AnonymousType1would instead beorg.example.Foo.AnonymousType1. In Rust, it's a little trickier because you could end up with naming clashes where you create a module calledfooto hold children ofFoowhen there's already another module you're generating calledfoo, but this is solvable. - Inline types, if they have a
$code_gen_namefield, should use that name; otherwise...
- Inline types in a
fieldsconstraint should be named after their respective fields. I.e.AnonymousType1would be namedFoo.Ain Java/Kotlin. - Inline types in an
elementconstraint should be namedElement. I.e.AnonymousType2would be namedBar.Elementin Java/Kotlin. - Inline types in
typeorall_ofshould have their constraints flattened into the containing type. - Inline types in
one_ofconstraint have two possible solutions- They should either be required to have a code-gen name (so that we can add a variant-specific annotation) OR
- They should be named
Variant0..VariantN, and everyone_ofconstraint must have its own distinct counter (i.e. not a globally incremented counter). E.g.Car.Variant0,Car.Variant1, etc.
- Inline types in
ordered_elementsshould be namedElement0..ElementN, and everyordered_elementsconstraint must have its own distinct counter.
I might be missing a case somewhere, but I think that mostly covers them all.
As per an offline discussion, all the inline type definitions must have a name defined for the initial implementation. This simplifies later adding support to infer names based on above suggested option 2(e.g. field names).
Also, as part of a 1.0 release of code generation we must add a $code_gen_name field to allow user to specify a name for inline types.
There can be few alternatives to $codegen_name in order to extend it in future to hold more details about the inline type like namespace, different names in different programming languages etc. Writing it out here for consideration,
Option 1: Use an Ion struct
We can use an Ion struct here which can hold more information about the name of inline type, like different names in different programming languages, a namespace to consider for when there is a collision of same inline names possible etc. e.g.
type:: {
name: uses_inline_type,
type: {
type: int,
$isl_codegen: {
java: {
name: FooType,
},
rust: {
name: Foo,
}
...
}
}
}
Option 2: Use annotations
We can use annotations for a compact syntax then the struct in Option 1. e.g.
type:: {
name: uses_inline_type,
type: {
type: int,
$isl_codegen: [
java::name::FooType,
rust::name::Foo,
],
...
}
}
If we do decide to stick with $codegen_name then we can add more constraints like $codegen_name in future and on the implementation side require all/some of them being present based on usage.
e.g.
type:: {
name: uses_inline_type,
type: {
type: int,
$codegen_name_java: FooType,
$codegen_name_rust: Foo,
...
}
}
cc: @zslayton
I did some thinking about this a while ago, here is what I had come up with...
In theory, all code gen properties are one of (click on them for examples):
Language-agnostic fields that just accept the data as the field value without any language specific cases
type::{
name: foo,
$codegen: { ignore: true }
}
Language-specific fields that accept a struct with language-value pairs
type::{
name: special_type,
$codegen: {
native_type: {
rust: "my_crate::types::MyType",
kotlin: "org.example.MyType",
}
}
Language-optional fields that can go either way and accept a language agnostic value and/or language-value pairs.
type::{
name: foo,
$codegen: { name: foo_object }
}
type::{
name: bar,
$codegen: {
name: {
kotlin: bar_class,
rust: bar_struct,
'*': bar_type, // Optional default value for other languages
}
}
}
It’s a backwards compatible change for a codegen config property to go from language-agnostic to language-optional or to go from language-specific to language-optional, but it’s breaking to change a field from language-optional to anything else. Therefore, we should try to make fields language-agnostic or language-specific until we have a compelling case to make it language-optional.
This pattern is especially useful because it's easy for a codegen tool to find the right values. First, find the $codegen field. Then, if you need to access a specific value, such as name, you get the name field. If the value is not a struct, then you've got the value. If it is a struct, you use the language name as the field to get the actual value.
Whatever we do, the configuration hierarchy should be codegen signifier -> property name -> language specifier so that we can avoid repetition for language-agnostic properties, and so that we can make it possible to have language specific values and a fallback value.
We could choose from many different ways to structure the keys. One alternative to the above proposal is a flatter set of keys, like this:
type::{
name: foo,
$codegen_name: { java: FooClass, rust: FooStruct },
$codegen_type_hint: tuple,
}
^^ That was from some of my notes from nearly 18 months ago.
I don't think there's too much difference between different structures, but I would suggest that we don't rely on constructing field names to form something like $codegen_name_java. Having to build strings in order to create the right field names at runtime seems like a bit of an anti-pattern, and if we e.g. want to find all of the java related fields, we need to see all the fields and check for partial matches that include "java". Let's make the IonReader do that parsing work for us.
I would also suggest that we don't use a list of annotated values. I'd rather we be opinionated about a structure for the property values and force values that are similar in some way to be grouped together. I also think that the double annotations are going to be unwieldy to work with and unintuitive for users.
Single annotations could be okay. (Click to expand examples)
I don't love it, but I'm also not opposed to something like either of these:
type::{
name: foo,
$codegen_name: [
java::kotlin::FooClass, // Applies to both Java and Kotlin
rust::FooStruct,
FooType, // Language not specified, so this is the default value for other languages
]
$codegen_type_hint: tuple,
}
type::{
name: foo,
$codegen: {
name: [
java::FooClass,
rust::FooStruct,
],
type_hint: tuple,
}
}
Thanks for the detailed explanation Matthew! I think from your examples, language-specific sounds like the best solution to start with and infact having that specific syntax rather then the flatten one will make it much easier for user to specify everything in one place/one struct, rather then having different properties(e.g. $codegen_name, $codegen_type_hint). I also realized from your examples that this $codegen struct can be used for all inline types as well as top level types(e.g. to use something like type_hint which is useful for both top level and inline types).
So following example from your ideas on language-specific syntax, sounds like the best approach to me:
type::{
name: foo,
$codegen: {
name: {
java: "FooClass",
rust: "FooStruct",
}
type_hint: {
java: class,
rust: struct
}
}