dgs-codegen
dgs-codegen copied to clipboard
Add Boolean fields to Data classes for supporting sparse update
Description
Issue: #609
Add Boolean fields to all data classes. Here is a gist of changes:
- Generate boolean fields for Java data classes by default, removed additional flag exposed for CodeGen config
- Add a boolean for each field in data class called
is<Field>Defined
- Getter for boolean called
getIs<Field>Defined
- Setter for each field in the class explicitly sets value of
is<Field>Defined
- Update boolean field and getter similarly for Builder class
- Update Builder.build() function to use setter functions for each field from data class
- Please note since this feature is enabled as default, most tests needed an update to account for additional fields added to data classes.
Example of data classes created:
Thanks!
Validation
Sample schema and codegen with bitset: https://github.com/krutikavk/dgs-codegen-run
Hi @srinivasankavitha Can you take a look at the PR
Hi @srinivasankavitha Can you take a look at the PR Thanks for the PR! Will do later this week.
Thanks for the PR! Will do later this week.
Thanks @srinivasankavitha. I am also working on updating Kotlin codegen.
What is the difference between generators/kotlin
and generators/kotlin2
--should I be updating both? The flag that exposes Kotlin2 data type generation generateKotlinNullableClasses
is not exposed as a flag for codegen CLI
yeah that flag is specific to kotlin2. Yes, it would be great of you can add to both since we try to maintain feature parity as much as possible.
yeah that flag is specific to kotlin2. Yes, it would be great of you can add to both since we try to maintain feature parity as much as possible.
Sounds great, thank you 👍
Also, you can run ./gradlew formatKotlin
to fix linting errors that are failing the build.
Linting and some unit tests fixed, please check @srinivasankavitha
@srinivasankavitha I see linting fixed many files across the repo--can you please check if this change is as required by the build? Since the last failure 7a1d567, added one more linter fix that had slipped by.
@krutikavk - I'm reviewing your PR and have a few questions:
- Can you provide an example of how this newly generated class will be used? Does a user call isFieldSet() explicitly? It is marked private so I'm unclear how as a user I will use this feature.
- Did you make any changes to the tests under integTests? If not, could you remove those files, since they seem to be all formatting updates and is adding a lot of noise to the PR
@krutikavk - I'm reviewing your PR and have a few questions:
- Can you provide an example of how this newly generated class will be used? Does a user call isFieldSet() explicitly? It is marked private so I'm unclear how as a user I will use this feature.
- Did you make any changes to the tests under integTests? If not, could you remove those files, since they seem to be all formatting updates and is adding a lot of noise to the PR
- Issue #609 : The features ensures proper handling of optional fields in GraphQL Mutations. With additional fields, codegen deserialization can properly mark the fields that are not provided in a mutation so that the DGS/subgraph implementation can handle sparse updates. The feature ensures that optional fields are not overwritten with default values or null, and hence provides a more effective and desirable way of handling mutations in GraphQL.
Since object mappers in DGS framework rely on reflection, methods isSet()
and setField()
can stay private.
We will follow up with a more comprehensive solution working with DGS runtime as well. As a follow-up, mappers for Java and Kotlin objects will be updated to use Bitset-related fields to handle sparse updates correctly.
A proof of concept for this is available here, highlighting the updates we want to make: https://github.com/ramapalani/sparseupdate?tab=readme-ov-file#changes-to-model-classes
- I have removed all the additional changes that formatter made to
integTests
.
Please advise in case you have any alternative suggestions against this approach.
@krutikavk - I'm reviewing your PR and have a few questions:
- Can you provide an example of how this newly generated class will be used? Does a user call isFieldSet() explicitly? It is marked private so I'm unclear how as a user I will use this feature.
- Did you make any changes to the tests under integTests? If not, could you remove those files, since they seem to be all formatting updates and is adding a lot of noise to the PR
- Issue Codegen to support sparse updates #609 : The features ensures proper handling of optional fields in GraphQL Mutations. With additional fields, codegen deserialization can properly mark the fields that are not provided in a mutation so that the DGS/subgraph implementation can handle sparse updates. The feature ensures that optional fields are not overwritten with default values or null, and hence provides a more effective and desirable way of handling mutations in GraphQL.
Since object mappers in DGS framework rely on reflection, methods
isSet()
andsetField()
can stay private.We will follow up with a more comprehensive solution working with DGS runtime as well. As a follow-up, mappers for Java and Kotlin objects will be updated to use Bitset-related fields to handle sparse updates correctly.
A proof of concept for this is available here, highlighting the updates we want to make: https://github.com/ramapalani/sparseupdate?tab=readme-ov-file#changes-to-model-classes
- I have removed all the additional changes that formatter made to
integTests
.Please advise in case you have any alternative suggestions against this approach.
Hi @krutikavk - thanks for the follow up. Regarding this change:
Since object mappers in DGS framework rely on reflection, methods isSet() and setField() can stay private.
We will follow up with a more comprehensive solution working with DGS runtime as well. As a follow-up, mappers for Java and Kotlin objects will be updated to use Bitset-related fields to handle sparse updates correctly.
A proof of concept for this is available here, highlighting the updates we want to make: https://github.com/ramapalani/sparseupdate?tab=readme-ov-file#changes-to-model-classes
I have concerns around the proposed update to the object mappers in the DGS framework. I think have the isSet
method public, would give users more flexibility to check the input values and whether they are set. This, to me, is a very useful feature indeed. However, updating the behavior of the object mappers is changing the behavior of deserialization under the hood, and not all use cases require that behavior. Exactly what functionality would go into the framework is also unclear to me. I did take a look at the linked example, so I see the implementation details of the advancedMapper
, but how that will be exposed and used in the framework is unclear.
I would still prefer having the isSet
be publicly exposed so it provides value regardless of whether the object mapper uses this or not.
As you mentioned, we should also have a separate discussion regarding the updates you intend to make for the object mapper.
@krutikavk - I'm reviewing your PR and have a few questions:
- Can you provide an example of how this newly generated class will be used? Does a user call isFieldSet() explicitly? It is marked private so I'm unclear how as a user I will use this feature.
- Did you make any changes to the tests under integTests? If not, could you remove those files, since they seem to be all formatting updates and is adding a lot of noise to the PR
- Issue Codegen to support sparse updates #609 : The features ensures proper handling of optional fields in GraphQL Mutations. With additional fields, codegen deserialization can properly mark the fields that are not provided in a mutation so that the DGS/subgraph implementation can handle sparse updates. The feature ensures that optional fields are not overwritten with default values or null, and hence provides a more effective and desirable way of handling mutations in GraphQL.
Since object mappers in DGS framework rely on reflection, methods
isSet()
andsetField()
can stay private. We will follow up with a more comprehensive solution working with DGS runtime as well. As a follow-up, mappers for Java and Kotlin objects will be updated to use Bitset-related fields to handle sparse updates correctly. A proof of concept for this is available here, highlighting the updates we want to make: https://github.com/ramapalani/sparseupdate?tab=readme-ov-file#changes-to-model-classes
- I have removed all the additional changes that formatter made to
integTests
.Please advise in case you have any alternative suggestions against this approach.
Hi @krutikavk - thanks for the follow up. Regarding this change:
Since object mappers in DGS framework rely on reflection, methods isSet() and setField() can stay private. We will follow up with a more comprehensive solution working with DGS runtime as well. As a follow-up, mappers for Java and Kotlin objects will be updated to use Bitset-related fields to handle sparse updates correctly. A proof of concept for this is available here, highlighting the updates we want to make: https://github.com/ramapalani/sparseupdate?tab=readme-ov-file#changes-to-model-classes
I have concerns around the proposed update to the object mappers in the DGS framework. I think have the
isSet
method public, would give users more flexibility to check the input values and whether they are set. This, to me, is a very useful feature indeed. However, updating the behavior of the object mappers is changing the behavior of deserialization under the hood, and not all use cases require that behavior. Exactly what functionality would go into the framework is also unclear to me. I did take a look at the linked example, so I see the implementation details of theadvancedMapper
, but how that will be exposed and used in the framework is unclear.I would still prefer having the
isSet
be publicly exposed so it provides value regardless of whether the object mapper uses this or not. As you mentioned, we should also have a separate discussion regarding the updates you intend to make for the object mapper.
Thanks for the detailed feedback @srinivasankavitha!
updating the behavior of the object mappers is changing the behavior of deserialization under the hood, and not all use cases require that behavior
To elaborate further, mapToJavaObject in DGS Framework will check the presence of bitset fields i.e. sparse update support and use these fields to support sparse update as applicable.
DGS Framework object mapper will be updated like in the POC here to check presence of bitset support for sparse update.
Additionally, since bitset field is transient, it will not affect serialization of the data class. During deserialization, the bitset field will be set to default value (0) for all fields. The behavior will be unchanged if bitset fields are not present.
As advised, I have also updated the functions for accessing bitset fields isSet() and setField() to public and updated unit tests to reflect this.
Since object mappers in DGS framework rely on reflection, methods isSet() and setField() can stay private.
I think this feature should really be usable stand-alone, without any framework changes. E.g. in user code (in a datafetcher) you should be able to check what fields are set and act depending on that in your code. I don't like the idea to requiring reflection to make this work, or the requirement to make need framework changes for a codegen feature.
Codegen and framework have been completely independent, although obviously working nicely together, to the point that even non-DGS users are using codegen. This also ensures decoupling between both. Not every DGS user wants codegen, and I don't we should cross a line where framework "knows" about codegen.
Could we take a step back and discuss how this is intended to work end-to-end?
What it sounds like to me in this proposal is that the DefaultInputObjectMapper
would need to call a private method, which I am not really in favor of. Users can supply any arbitrary class for mapping, I'd prefer not to rely on that implicit API. Furthermore, at least for mapping to Java objects, we do use setters, so I don't think the method is necessary.
I think the BitSet
is overkill / premature optimization, and in practice probably performs worse than boolean fields. Input types are rarely going to have that many fields, so we've talking about a handful of boolean fields vs. an object reference to a BitSet which must be allocated, plus the generated enum. I think a better approach would be to just add boolean fields for each field in the schema type (something like fooDefined
for a schema field named foo
), and associated getters (isFooDefined
for a schema field named foo
).
Finally, there's an issue of how this will work with defaults. As far as I can tell, it will only function for input types with nullable fields; fields with default values will have the defaults populated by graphql-java before the DGS mapper is even invoked, so there's no way for us to know whether the default was supplied explicitly by the caller or not. This would only really work for nullable fields that aren't explicitly sent in the request.
Example schema:
input InputType {
foo: String
bar: String!
baz: String! = "default"
blah: String = "default-blah"
}
Example variables:
{"input": {"bar": "bar"}}
The DGS mapper is going to get a map with the following contents:
{"bar": "bar", "baz": "default", "blah": "default-blah"}
So it's too late for the mapper to actually tell that baz
and blah
were not explicitly sent. The only thing we could ascertain is that foo
is missing. The proposed feature can only work for nullable fields without any defaults in that case, so I don't know if it makes sense to even generate boolean fields / getters for schema fields with default values.
@paulbakker, @kilink
Here is a POC on how this would work end to end. https://github.com/ramapalani/sparseupdate/tree/reflected-model-setBitset
This solution is inspired by one of our existing solutions. I'm not married to this solution, but we have to solve this sparse update issue, I'm open to any suggestions. If required I can set up a short Zoom call to go over this.
I'd just mention that you can do this today by just checking the DataFetchingEnvironment
. E.g.,
dfe.getArgument<Map<String, Any?>>("input").containsKey("foo")
would tell you whether a null value was supplied explicitly or not.
As Paul also mentioned, I think we would not be in favor of changing the InputObjectMapper to use reflection to call a special field. I'm also hesitant to add a generated method that requires passing in an enum to check if a field is set. I think a simpler approach would be to simply add new getter methods as I mentioned before, and use boolean fields. At least for the Java mapping path, we already call setters first before attempting to set fields directly, so the generated setters would have a chance to set the flags.
I agree with @kilink and @paulbakker.
Codegen feature on it's own is useful and I like @kilink's simplification to avoid the use of BitSet altogether since the number of fields are really not going to be that many.
For the framework, using the data fetching environment directly to determine whether a field was passed in would be ideal. You could add custom utilities to take care of that in your code without messing with the framework's inputObjectmapper.
Let me summarize my understanding:
- DGS runtime: DefaultInputObjectMapper shouldn't be changed to accommodate this feature, especially calling a private method using reflection
- DGS CodeGen:
- Avoid bitset, instead create a boolean for each field with a getter/setter for this boolean. Say if the field is
foo
, getter would beboolean isFieldFoo()
and setter would bevoid setFieldFoo(boolean)
- Individual setter field in the generated code, should call boolean setter field. Say for the field
foo
, its setter method would be like
- Avoid bitset, instead create a boolean for each field with a getter/setter for this boolean. Say if the field is
public void setFoo(String foo) {
this.foo = foo;
setFieldFoo(true); //This will indicate that field foo is either provided by the user (or defaulted)
}
- DGS runtime when it is mapping the input object (InputObjectMapper) it will continue to call just the
setFoo()
method. And there is no change required. - DataFetcher: Uses @InputArgument to automatically map user input to Java/Kotlin objects. And if they are interested in SparseUpdate, they can call boolean getter method to check if something is set by the user (or defaulted). Say the mapped object
myObject
and the field isfoo
, DataFetcher can callmyObject.isFieldFoo()
to confirm if a field is set or not.
Is my understanding correct? I'm ok with this approach, as it helps the developer implementing the DataFetcher to rely on @InputArgument to map object and still implement sparse update.
If you confirm, @krutikavk can update this PR and send it for your review
DGS runtime: DefaultInputObjectMapper shouldn't be changed to accommodate this feature, especially calling a private method using reflection
DGS CodeGen:
- Avoid bitset, instead create a boolean for each field with a getter/setter for this boolean. Say if the field is
foo
, getter would beboolean isFieldFoo()
and setter would bevoid setFieldFoo(boolean)
- Individual setter field in the generated code, should call boolean setter field. Say for the field
foo
, its setter method would be like
Yes, I think that would be the preferred approach. My only minor quibble there is for the naming of the getter, and the need for a setter for the flag at all. A better name for the getter might be isFooDefined
or isFooExplicitlyDefined
or something like that. And as for the setter for the flag, I don't think it's necessary, couldn't the boolean field be set directly in the setter?
Only other things I'd consider:
- Should the new getter / boolean field only be added for nullable fields, as it doesn't make much sense for fields with defaults or non-nullable fields. But I suppose doesn't do much harm to generate it for all fields and could presumably be used on the client side.
- Is a flag really needed for this? I don't see any reason it couldn't just be the default behavior. One less flag / configuration to support is a positive to me, unless there is some concern that it could interfere with users who don't need this feature somehow (they could just ignore it / not use the getters in that case).
Oh, one other issue, is that mapping for Kotlin types doesn't use setters at all currently. There is a ticket open for that, I think we can fix it in a follow up. I believe I have a branch that adds support for it.
Oh, one other issue, is that mapping for Kotlin types doesn't use setters at all currently. There is a ticket open for that, I think we can fix it in a follow up. I believe I have a branch that adds support for it.
@kilink Is there an ETA for this yet?
Thanks all for weighing in, I ll will update the PR for a fresh review
Hey @kilink @srinivasankavitha @paulbakker
Sorry for the delay, I have updated the PR with the following changes as per discussions:
- Generate boolean fields for Java data classes by default, removed additional flag exposed for CodeGen config
- Add a boolean for each field in data class called
is<Field>Defined
- Getter for boolean called
getIs<Field>Defined
- Setter for each field in the class explicitly sets value of
is<Field>Defined
- Update boolean field and getter similarly for Builder class
- Update Builder.build() function to use setter functions for each field from data class
- Please note since this feature is enabled as default, most tests needed an update to account for additional fields added to data classes.
Example of data classes created:
Also, I have not updated class hash function and toString() methods to reflect additional boolean fields. Would you want to expose boolean fields in these functions as well? Do let me know, I can update this shortly as needed.
Thanks!
@srinivasankavitha
However, updating the behavior of the object mappers is changing the behavior of deserialization under the hood, and not all use cases require that behavior. Exactly what functionality would go into the framework is also unclear to me
Could you elaborate how this feature will change the deserialization behavior? The values will be read from json stream exactly as they are supposed to be read and the deserializer will set them on the target object by reflection calls of the respective mutators. The mutators along with storing field values will also raise flags that the field value was mutated. That will help to distinguish mutated field values from assigned by default, even if they are nulls.
@kilink
What it sounds like to me in this proposal is that the DefaultInputObjectMapper would need to call a private method, which I am not really in favor of. Users can supply any arbitrary class for mapping, I'd prefer not to rely on that implicit API. Furthermore, at least for mapping to Java objects, we do use setters, so I don't think the method is necessary.
There will be absolutely NO changes to ObjectMappers. Any JSON deserializer (not only Jackson, but any other) will work with this out of the box. All changes are encapsulated into the receiving generated POJOs.
I think the BitSet is overkill / premature optimization, and in practice probably performs worse than boolean fields. Input types are rarely going to have that many fields, so we've talking about a handful of boolean fields vs. an object reference to a BitSet which must be allocated, plus the generated enum. I think a better approach would be to just add boolean fields for each field in the schema type (something like fooDefined for a schema field named foo), and associated getters (isFooDefined for a schema field named foo).
https://docs.oracle.com/javase/specs/jvms/se17/html/jvms-2.html#jvms-2.3.4 boolean value takes 1 byte. However, compilers align each value allocation to a machine word boundary, which means that on 64-bit processors booleans occupy 8 bytes. That's a lot of wasted memory.
Additionally, in our data models we are using very heavy coarse objects. Making them smaller would increase api chattiness and impact our user experiences.
Finally, there's an issue of how this will work with defaults. As far as I can tell, it will only function for input types with nullable fields; fields with default values will have the defaults populated by graphql-java before the DGS mapper is even invoked, so there's no way for us to know whether the default was supplied explicitly by the caller or not. This would only really work for nullable fields that aren't explicitly sent in the request.
null
is a special case of default, isn't it?
Defaults will work exactly as null
.
Default values (including nulls) are assigned by the object constructor, which shall not call mutators and assign values directly to the fields. So, the corresponding is<Field>Set
methods will not report that these fields were mutated.
During unmarshalling, deserializer will call mutators to set values from the stream. So after unmarshalling is<Field>Set
methods will report that those fields were mutated.
This is the heart of this approach.
There will be absolutely NO changes to ObjectMappers. Any JSON deserializer (not only Jackson, but any other) will work with this out of the box. All changes are encapsulated into the receiving generated POJOs.
The initial design absolutely involved updating the DefaultInputObjectMapper
to call a special method. There's been discussion since then so the message you're replying to is out of date. There was never discussion about JSON deserialization, the term ObjectMapper
is overloaded here but we were discussing the internal DGS mapper, e.g., DefaultInputObjectMapper
.
https://docs.oracle.com/javase/specs/jvms/se17/html/jvms-2.html#jvms-2.3.4 boolean value takes 1 byte. However, compilers align each value allocation to a machine word boundary, which means that on 64-bit processors booleans occupy 8 bytes. That's a lot of wasted memory.
Additionally, in our data models we are using very heavy coarse objects. Making them smaller would increase api chattiness and impact our user experiences.
You can use a tool like jol to look at the layout. What you're saying I don't think is an issue here, since booleans will get packed together in the object header, so it's not as though they each occupy 8 bytes as far as I can tell. A BitSet will also occupy a pointer as well, so 8 bytes, plus the instance itself, and a BitSet has a reference to a long[]
. Some of the math changes if you're using compressedOops or not but it's hard for me to see a scenario where boolean fields would be less space efficient. We are talking about objects that are typically going to have a handful of fields. There are other considerations here as well, e.g., with respect to CPU performance, the BitSet is going to always be worse here, there are just not enough items to justify needing an array of longs, the added pointer chasing, etc. And finally the main issue is with readability and maintainability, using boolean fields is just simpler to maintain, especially with respect to the original proposal that also generated an inner enum class. I'd want to see real data proving that it is causing a performance issue before resorting to such an optimization.
null is a special case of default, isn't it? Defaults will work exactly as null. Default values (including nulls) are assigned by the object constructor, which shall not call mutators and assign values directly to the fields. So, the corresponding is<Field>Set methods will not report that these fields were mutated. During unmarshalling, deserializer will call mutators to set values from the stream. So after unmarshalling is<Field>Set methods will report that those fields were mutated. This is the heart of this approach.
How would they be assigned by the object constructor? I don't know if we're on the same page if you're referring to deserialization and unmarshalling, that doesn't come into the picture here at all. DGS leverage graphql-java to deserialize the JSON request, and for variables or arguments they get converted to a Map before the DGS even touches them. The only thing that comes into play at that layer is perhaps any custom Coercing implementations registered. DGS then maps the Map<String, ?>
we get from graphql-java to a concrete type based on the parameters for the data fetcher methods, that happens in DefaultInputObjectMapper
today.
For mapping to Java objects, today that works by invoking a no-args constructor and all fields are populated via setters.
The initial design absolutely involved updating the DefaultInputObjectMapper to call a special method. There's been discussion since then so the message you're replying to is out of date. There was never discussion about JSON deserialization, the term ObjectMapper is overloaded here but we were discussing the internal DGS mapper, e.g., DefaultInputObjectMapper.
Yes, I was not exactly correct. GraphQL-Java first deserializes input into a map and then InputObjectMapper converts it to a POJO. Pretty much the same, what ObjectMapper.convertValue is doing. The point is that this solution works in both cases.
How would they be assigned by the object constructor? I don't know if we're on the same page if you're referring to deserialization and unmarshalling, that doesn't come into the picture here at all. DGS leverage graphql-java to deserialize the JSON request, and for variables or arguments they get converted to a Map before the DGS even touches them. The only thing that comes into play at that layer is perhaps any custom Coercing implementations registered. DGS then maps the Map<String, ?> we get from graphql-java to a concrete type based on the parameters for the data fetcher methods, that happens in DefaultInputObjectMapper today.
For mapping to Java objects, today that works by invoking a no-args constructor and all fields are populated via setters.
Thanks for correction, yes, with the above comments the mapping occurs from the map deserialized by GraphQL-Java.
Oh, one other issue, is that mapping for Kotlin types doesn't use setters at all currently. There is a ticket open for that, I think we can fix it in a follow up. I believe I have a branch that adds support for it.
@kilink Is there an ETA for this yet?
There's no ETA as of yet, a lot of focus has been on getting the spring-graphql integration out, now that that is done it will be easier to make progress on other items. I do think there is some nuance here, since the current generated kotlin classes rely on default arguments. I'm fine with proceeding with just supporting this for Java at the moment and expanding support for kotlin generated classes later, although I don't know if that works for your codebase?
ticket open
Oh, one other issue, is that mapping for Kotlin types doesn't use setters at all currently. There is a ticket open for that, I think we can fix it in a follow up. I believe I have a branch that adds support for it.
@kilink Is there an ETA for this yet?
There's no ETA as of yet, a lot of focus has been on getting the spring-graphql integration out, now that that is done it will be easier to make progress on other items. I do think there is some nuance here, since the current generated kotlin classes rely on default arguments. I'm fine with proceeding with just supporting this for Java at the moment and expanding support for kotlin generated classes later, although I don't know if that works for your codebase?
Sounds good @kilink! For Kotlin code generator, if there a specific issue and a solution identified, I'd be open to contribute adding explicit setters. This will especially help us contribute to this feature in entirety. Love to hear your thoughts.
I will also get back shortly with an analysis of memory utilization and any potential performance impact for both potential solutions
Hi @kilink @srinivasankavitha
Here's some updates on the discussion for this solution so far. DO share your thoughts and best way to move forward with the change.
- I worked on analyzing both potential solutions (Bitset and Boolean) to store field presence using JOL.
TLDR; Bitset is more efficient in utilizing heap memory over Boolean. This is evident for classes having a huge number of fields (I compared with classes having 3 fields vs 33 fields). Median application size we are looking at has about 288 fields, this disparity may be significant for applications this size, hence the inclination for a Bitset-based solution.
Please refer to this for a deeper look at Java object and the classes. Do share your thoughts on this.
- Based on previous feedback, acknowledge that this feature should be updated without updating DGS runtime. Similar to current Boolean field presence solution as discussed here, only setters can be updated with the Bitset solution as DGS runtime already uses these. Adding an enum to Bitset solution does add a level of operability where code owners don't have to hard code ordinals for each field in the Bitset.
In case there are additional views for discussions, I'd be happy to facilitate a short meeting for working out the best way forward with this feature.
Once confirmed, I can also update the PR with the best fit solution. Thanks!
cc: @ramapalani @gkesler