make FieldValueTypeInformation creators take a TypeDescriptor parameter

Open tilgalas opened this issue 1 year ago • 11 comments

This is the second PR in a series (based on the now-closed #31648; the first PR was #31785) whose ultimate goal is to add support for generic classes to schema providers.

FieldValueTypeInformation creators will now accept a TypeDescriptor parameter describing the field's containing class, which lets them infer more accurate type information about that field. For example, consider a class MyClass<T>: given a TypeDescriptor of MyClass<String> and a getter of type T, the type resolver can now infer the type of the field to be String. See the added test class, which demonstrates the new functionality.
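To illustrate the idea, here is a minimal sketch that uses only java.lang.reflect. It is not the code in this PR and it does not use Beam's TypeDescriptor, which handles many more cases. GenericFieldSketch, TypeToken and resolveGetterType are hypothetical names made up for this example; the sketch only covers a getter whose return type is a type variable declared directly on the containing class.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;

public class GenericFieldSketch {
  /** Hypothetical generic class, mirroring the MyClass<T> example above. */
  public static class MyClass<T> {
    private T value;

    public T getValue() {
      return value;
    }
  }

  /** Hypothetical type token: subclass it anonymously to capture a full generic type. */
  public abstract static class TypeToken<X> {
    public Type getType() {
      ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
      return superType.getActualTypeArguments()[0];
    }
  }

  /** Resolves a getter's return type against the (possibly parameterized) containing type. */
  public static Type resolveGetterType(Type containingType, Method getter) {
    Type returnType = getter.getGenericReturnType();
    if (returnType instanceof TypeVariable && containingType instanceof ParameterizedType) {
      ParameterizedType parameterized = (ParameterizedType) containingType;
      TypeVariable<?>[] declared = ((Class<?>) parameterized.getRawType()).getTypeParameters();
      for (int i = 0; i < declared.length; i++) {
        if (declared[i].equals(returnType)) {
          // Substitute the type variable with the matching actual type argument.
          return parameterized.getActualTypeArguments()[i];
        }
      }
    }
    // Without generic information about the containing class, T stays unresolved.
    return returnType;
  }

  public static void main(String[] args) throws Exception {
    Method getter = MyClass.class.getMethod("getValue");
    Type containing = new TypeToken<MyClass<String>>() {}.getType();
    System.out.println(resolveGetterType(containing, getter)); // class java.lang.String
    System.out.println(resolveGetterType(MyClass.class, getter)); // T
  }
}
```

With only the raw class the getter type stays the unresolved variable T; once the containing type MyClass<String> is supplied, the same getter resolves to String.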


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • [ ] Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • [ ] Update CHANGES.md with noteworthy changes.
  • [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

tilgalas avatar Aug 05 '24 15:08 tilgalas

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

github-actions[bot] avatar Aug 05 '24 16:08 github-actions[bot]

Run Java PreCommit

tilgalas avatar Aug 05 '24 17:08 tilgalas

Run Java_GCP_IO_Direct PreCommit

tilgalas avatar Aug 06 '24 11:08 tilgalas

Run Java PreCommit

tilgalas avatar Aug 06 '24 12:08 tilgalas

Run Java_Hadoop_IO_Direct PreCommit

tilgalas avatar Aug 06 '24 14:08 tilgalas

assign set of reviewers

tilgalas avatar Aug 06 '24 16:08 tilgalas

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @m-trieu for label java. R: @damondouglas for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

github-actions[bot] avatar Aug 06 '24 16:08 github-actions[bot]

@damondouglas could you please take a look at this one?

damccorm avatar Aug 14 '24 09:08 damccorm

@tilgalas Thank you for working on this! Would you mind first removing the "nullness" value from the @SuppressWarnings annotation on the classes involved in this PR? Please consider this suggestion non-blocking, but I would like to see whether this is possible with your helpful changes. As a second phase, may we also consider removing "rawtypes"? Again, this is non-blocking, but it's something we would like to strive for to improve code quality.

sure, with pleasure!

tilgalas avatar Aug 19 '24 10:08 tilgalas

Reminder, please take a look at this pr: @m-trieu @damondouglas

github-actions[bot] avatar Aug 26 '24 12:08 github-actions[bot]

Assigning new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @Abacn for label java. R: @Abacn for label io.

github-actions[bot] avatar Aug 28 '24 12:08 github-actions[bot]

Reminder, please take a look at this pr: @Abacn @Abacn

github-actions[bot] avatar Sep 05 '24 12:09 github-actions[bot]

waiting on author

Abacn avatar Sep 05 '24 15:09 Abacn

added a refactoring commit on top of the main one, which deals with most of the nullness and rawtypes warnings in the classes involved - I'm happy to move it to a PR of its own if the reviewers find the resulting changes too large to safely review

tilgalas avatar Sep 13 '24 13:09 tilgalas

Run Java PreCommit

tilgalas avatar Sep 13 '24 15:09 tilgalas

Reminder, please take a look at this pr: @Abacn @Abacn

github-actions[bot] avatar Sep 21 '24 12:09 github-actions[bot]

Assigning new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @damondouglas for label java. R: @damondouglas for label io.

github-actions[bot] avatar Sep 25 '24 12:09 github-actions[bot]

Reminder, please take a look at this pr: @damondouglas @damondouglas

github-actions[bot] avatar Oct 03 '24 12:10 github-actions[bot]

Assigning new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @Abacn for label java. R: @chamikaramj for label io.

github-actions[bot] avatar Oct 08 '24 12:10 github-actions[bot]

hi reviewers, I'm going to split this PR and move the nullness and rawtypes refactoring commit into its own PR

tilgalas avatar Oct 14 '24 10:10 tilgalas

since there's now #32757, I'm closing this PR

tilgalas avatar Oct 16 '24 12:10 tilgalas

reopening after discussing offline

tilgalas avatar Oct 22 '24 14:10 tilgalas

Fixes #32081

ahmedabu98 avatar Oct 25 '24 20:10 ahmedabu98

A bunch of tests are failing. PTAL to see if this is due to your PR or just flaky tests.

reuvenlax avatar Oct 28 '24 17:10 reuvenlax

Run Java PreCommit

tilgalas avatar Oct 29 '24 15:10 tilgalas

Run Java PreCommit

tilgalas avatar Oct 30 '24 12:10 tilgalas

Run Java_GCP_IO_Direct PreCommit

tilgalas avatar Oct 30 '24 12:10 tilgalas

lgtm

reuvenlax avatar Oct 30 '24 17:10 reuvenlax

This likely breaks a DataflowTemplates unit test: https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/2014/checks?check_run_id=33057418339

I'm trying to understand the scope of the breaking change.

The stack trace:

Caused by: java.lang.IllegalStateException: getters require withGetterTarget.
	at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:512)
	at org.apache.beam.sdk.values.Row$Builder.withFieldValueGetters(Row.java:843)
	at org.apache.beam.sdk.schemas.GetterBasedSchemaProvider$ToRowWithValueGetters.apply(GetterBasedSchemaProvider.java:135)
	at org.apache.beam.sdk.schemas.GetterBasedSchemaProvider$ToRowWithValueGetters.apply(GetterBasedSchemaProvider.java:114)
	at org.apache.beam.sdk.schemas.SchemaCoder.encode(SchemaCoder.java:121)
	at org.apache.beam.sdk.coders.MapCoder.encode(MapCoder.java:92)
	at org.apache.beam.sdk.coders.MapCoder.encode(MapCoder.java:41)
	at org.apache.beam.sdk.coders.MapCoder.encode(MapCoder.java:93)
	at org.apache.beam.sdk.coders.MapCoder.encode(MapCoder.java:41)
	at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:73)
	at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:37)
	at org.apache.beam.sdk.util.CoderUtils.encodeToSafeStream(CoderUtils.java:86)
	at org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:70)
	at org.apache.beam.sdk.util.CoderUtils.encodeToByteArray(CoderUtils.java:55)
	at org.apache.beam.sdk.util.CoderUtils.clone(CoderUtils.java:168)
	at org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.<init>(MutationDetectors.java:118)
	at org.apache.beam.sdk.util.MutationDetectors.forValueWithCoder(MutationDetectors.java:49)
	at org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:115)
	at org.apache.beam.runners.direct.ParDoEvaluator$BundleOutputManager.output(ParDoEvaluator.java:305)

In the chain MapCoder.encode -> SchemaCoder.encode -> GetterBasedSchemaProvider.ToRowWithValueGetters, a null value is somehow passed to withFieldValueGetters, causing the IllegalStateException.

This PR made substantial changes to the related code path. Some encoding that worked before now crashes with the exception above.

Update: I now suspect this is an unintended consequence of the nullable annotation fix. I checked that where CodedValueMutationDetector.<init>(MutationDetectors.java:118) throws the exception, the element the KvCoder is trying to encode is something like:

KV{BigQueryTable{project=test-project1, dataset=test-dataset1, tableName=unpartitioned_table,
partitioningColumn=null, partitions=null, lastModificationTime=1731699015913000,
schemaSupplier=com.google.cloud.teleport.v2.utils.SerializableSchemaSupplier@2a2435f9,
dataplexEntityName=unpartitioned_table_entity},
null}, org.apache.beam.sdk.coders.KvCoder
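The failure shape suspected above can be sketched in isolation. This is purely illustrative and is not Beam's Row.Builder source: GetterTargetSketch, RowLikeBuilder and buildWith are hypothetical names, and the sketch simply assumes that a builder guards its getters with a state check on a target that ended up null.

```java
import java.util.function.Function;

public class GetterTargetSketch {
  /** Hypothetical builder; NOT Beam's Row.Builder, only the same failure shape. */
  public static class RowLikeBuilder {
    // Stays null unless withGetterTarget is called with a non-null value.
    private Object getterTarget;

    public RowLikeBuilder withGetterTarget(Object target) {
      this.getterTarget = target;
      return this;
    }

    public String buildWith(Function<Object, String> getter) {
      // Assumed guard: getters cannot be applied without a target object.
      if (getterTarget == null) {
        throw new IllegalStateException("getters require withGetterTarget.");
      }
      return getter.apply(getterTarget);
    }
  }

  public static void main(String[] args) {
    // A non-null target works.
    System.out.println(new RowLikeBuilder().withGetterTarget("table").buildWith(Object::toString));
    // A null target (for example, a null value inside a KV) trips the state check.
    try {
      new RowLikeBuilder().withGetterTarget(null).buildWith(Object::toString);
    } catch (IllegalStateException e) {
      System.out.println("failed: " + e.getMessage());
    }
  }
}
```

If the real code path behaves like this, a null element that previously bypassed the getter-based conversion would now reach the state check and fail while being encoded.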

Abacn avatar Nov 15 '24 17:11 Abacn

This whole thread of work is interesting. FWIW the whole point of TypeDescriptor is to carry the generic information. Otherwise we could just use Java's reflective capabilities. Is there a TL;DR document about the limitation that you are improving?

kennknowles avatar Nov 03 '25 14:11 kennknowles