Make FieldValueTypeInformation creators take a TypeDescriptor parameter
This is the second PR in a series (based on the now-closed #31648; the first PR was #31785) whose ultimate goal is to add support for generic classes to schema providers.
FieldValueTypeInformation creators now accept a TypeDescriptor parameter describing the field's containing class, which lets them infer more accurate type information about that field. For example, consider a class MyClass<T>: given a TypeDescriptor of MyClass<String> and a getter of type T, the type resolver can now infer the field's type to be String. See the added test class, which demonstrates the new functionality.
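To illustrate the idea, here is a minimal sketch in plain java.lang.reflect of how knowing the containing type MyClass<String> lets a getter declared as returning T be resolved to String. It is not the PR's implementation and does not use Beam's TypeDescriptor; the names GenericGetterSketch, TypeToken, resolveReturnType and resolvedTypeOfGetValue are hypothetical, and the resolver only handles a type variable declared directly on the containing class.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

public class GenericGetterSketch {
    // Hypothetical generic class standing in for MyClass<T> from the description.
    public static class MyClass<T> {
        private T value;

        public T getValue() {
            return value;
        }
    }

    // Hypothetical type token: an anonymous subclass keeps its type argument
    // (for example MyClass<String>) available to reflection at runtime.
    public abstract static class TypeToken<T> {
        public Type captured() {
            ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
            return superType.getActualTypeArguments()[0];
        }
    }

    // Resolves the getter's return type against the containing type.
    // Only a type variable declared directly on the containing class is handled.
    public static Type resolveReturnType(Type containing, Method getter) {
        Type returnType = getter.getGenericReturnType();
        if (!(returnType instanceof TypeVariable) || !(containing instanceof ParameterizedType)) {
            return returnType;
        }
        ParameterizedType parameterized = (ParameterizedType) containing;
        TypeVariable<?>[] params = ((Class<?>) parameterized.getRawType()).getTypeParameters();
        Type[] args = parameterized.getActualTypeArguments();
        for (int i = 0; i < params.length; i++) {
            if (params[i].equals(returnType)) {
                return args[i];
            }
        }
        return returnType;
    }

    // Convenience wrapper so callers do not have to handle the checked exception.
    public static Type resolvedTypeOfGetValue() {
        try {
            Method getter = MyClass.class.getMethod("getValue");
            Type containing = new TypeToken<MyClass<String>>() {}.captured();
            return resolveReturnType(containing, getter);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Method getter = MyClass.class.getMethod("getValue");
        System.out.println(getter.getGenericReturnType()); // prints "T"
        System.out.println(resolvedTypeOfGetValue());      // prints "class java.lang.String"
    }
}
```

Without the containing type, reflection on the getter alone only yields the type variable T, which is why the creators need the extra parameter.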
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers
Run Java PreCommit
Run Java_GCP_IO_Direct PreCommit
Run Java PreCommit
Run Java_Hadoop_IO_Direct PreCommit
assign set of reviewers
Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:
R: @m-trieu for label java. R: @damondouglas for label io.
Available commands:
- stop reviewer notifications - opt out of the automated review tooling
- remind me after tests pass - tag the comment author after tests pass
- waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
The PR bot will only process comments in the main thread (not review comments).
@damondouglas could you please take a look at this one?
@tilgalas Thank you for working on this! Would you mind first removing the "nullness" value in the @SuppressWarnings annotation for the classes involved in this PR? Please consider this suggestion non-blocking, but I would like to see whether this is possible with your helpful changes. As a second phase, may we also consider removing "rawtypes"? Again, this is non-blocking, but it's something we would like to strive for to improve code quality.
sure, with pleasure!
Reminder, please take a look at this pr: @m-trieu @damondouglas
Assigning new set of reviewers because PR has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @Abacn for label java. R: @Abacn for label io.
Reminder, please take a look at this pr: @Abacn @Abacn
waiting on author
added a refactoring commit on top of the main one, which deals with most of the nullness and rawtypes warnings in the classes involved - I'm happy to move it to a PR of its own if the reviewers find the resulting changes too large to safely review
Run Java PreCommit
Reminder, please take a look at this pr: @Abacn @Abacn
Assigning new set of reviewers because PR has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @damondouglas for label java. R: @damondouglas for label io.
Reminder, please take a look at this pr: @damondouglas @damondouglas
Assigning new set of reviewers because PR has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @Abacn for label java. R: @chamikaramj for label io.
hi reviewers, I'm going to split this PR and put the nullness and rawness refactoring commit into its own PR
since there's a #32757, I'm closing this PR
reopening after discussing offline
Fixes #32081
A bunch of tests are failing. PTAL to see if this is due to your PR or just flaky tests.
Run Java PreCommit
Run Java PreCommit
Run Java_GCP_IO_Direct PreCommit
lgtm
This likely breaks DataflowTemplate unit test: https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/2014/checks?check_run_id=33057418339
I'm trying to understand the scope of breaking change
The stacktrace
Caused by: java.lang.IllegalStateException: getters require withGetterTarget.
at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:512)
at org.apache.beam.sdk.values.Row$Builder.withFieldValueGetters(Row.java:843)
at org.apache.beam.sdk.schemas.GetterBasedSchemaProvider$ToRowWithValueGetters.apply(GetterBasedSchemaProvider.java:135)
at org.apache.beam.sdk.schemas.GetterBasedSchemaProvider$ToRowWithValueGetters.apply(GetterBasedSchemaProvider.java:114)
at org.apache.beam.sdk.schemas.SchemaCoder.encode(SchemaCoder.java:121)
at org.apache.beam.sdk.coders.MapCoder.encode(MapCoder.java:92)
at org.apache.beam.sdk.coders.MapCoder.encode(MapCoder.java:41)
at org.apache.beam.sdk.coders.MapCoder.encode(MapCoder.java:93)
at org.apache.beam.sdk.coders.MapCoder.encode(MapCoder.java:41)
at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:73)
at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:37)
at org.apache.beam.sdk.util.CoderUtils.encodeToSafeStream(CoderUtils.java:86)
at org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:70)
at org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:55)
at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:168)
at org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.<init>(MutationDetectors.java:118)
at org.apache.beam.sdk.util.MutationDetectors.forValueWithCoder(MutationDetectors.java:49)
at org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:115)
at org.apache.beam.runners.direct.ParDoEvaluator$BundleOutputManager.output(ParDoEvaluator.java:305)
MapCoder.encode -> SchemaCoder.encode -> GetterBasedSchemaProvider.ToRowWithValueGetters somehow passes a null value to withFieldValueGetters, causing the IllegalStateException.
This PR made substantial changes to the related code path. Some encoding that worked before now crashes with the exception above.
Update: I now suspect this is an unintended consequence of the nullable annotation fix. I checked that where CodedValueMutationDetector.<init>(MutationDetectors.java:118) throws the exception, the element that the KvCoder is trying to encode is something like:
KV{BigQueryTable{project=test-project1, dataset=test-dataset1, tableName=unpartitioned_table,
partitioningColumn=null, partitions=null, lastModificationTime=1731699015913000,
schemaSupplier=com.google.cloud.teleport.v2.utils.SerializableSchemaSupplier@2a2435f9,
dataplexEntityName=unpartitioned_table_entity},
null}, org.apache.beam.sdk.coders.KvCoder
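To make the suspected failure mode concrete, here is a minimal plain-Java sketch, assuming the problem is a null element reaching a getter-based conversion that requires a non-null target. It does not use Beam classes: NullElementSketch, strictToRow and nullTolerant are hypothetical stand-ins, the row is just a String, and only the exception message is taken from the stack trace above. It is an illustration of the hypothesis, not a proposed fix.

```java
import java.util.function.Function;

public class NullElementSketch {
    // Hypothetical stand-in for a getter-based "to row" function: it refuses
    // a null target, mirroring the precondition seen in the stack trace.
    public static <T> Function<T, String> strictToRow() {
        return target -> {
            if (target == null) {
                throw new IllegalStateException("getters require withGetterTarget.");
            }
            return "Row{" + target + "}";
        };
    }

    // Hypothetical wrapper that passes null through instead of converting it.
    public static <T> Function<T, String> nullTolerant(Function<T, String> delegate) {
        return target -> target == null ? null : delegate.apply(target);
    }

    public static void main(String[] args) {
        Function<String, String> strict = strictToRow();
        Function<String, String> tolerant = nullTolerant(strict);

        System.out.println(strict.apply("value"));  // prints "Row{value}"
        System.out.println(tolerant.apply(null));   // prints "null"
        try {
            strict.apply(null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());     // prints "getters require withGetterTarget."
        }
    }
}
```

If the earlier code path tolerated null elements in a similar way and the refactored one does not, that would be consistent with encoding that used to work now failing on the element shown above.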
This whole thread of work is interesting. FWIW the whole point of TypeDescriptor is to carry the generic information. Otherwise we could just use Java's reflective capabilities. Is there a TL;DR document about the limitation that you are improving?