Beam SQL Extension raised an error when the input row contained iterable fields
Please add a meaningful description for your change here
Related to : https://github.com/apache/beam/issues/31118
Upon utilizing the Beam SQL within our pipeline, we encountered a noteworthy exception. It appears that when the input row encompasses fields of iterable types, the process fails irrespective of whether the iterable fields are included in the SQL filter condition or not. This issue stems from the fact that the Beam SQL extension attempts to construct an output row schema based on the input schema, and unfortunately, it currently lacks support for iterable types.
Consider the following example:
Given an inputRow in the schema:
field1: String
field2: Integer
field3: Array<String>
field4: ITERABLE
And ANY Beam SQL condition such as :
field2 > 1
or Even
1 = 1
The pipeline will invariably fail, yielding the following error: Exception in thread "main" java.lang.UnsupportedOperationException: Unable to get ITERABLE at org.apache.beam.sdk.extensions.sql.impl.rel.BeamCalcRel$InputGetterImpl.getBeamField(BeamCalcRel.java:603).
(My first contribution to beam , kindly advise how to test lol)
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
- [x] Mention the appropriate issue in your description (for example:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead. - [ ] Update
CHANGES.mdwith noteworthy changes. - [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.
See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers
thanks for the fix. While this sounds a valid fix, would you mind sharing the full stack trace. My understanding here is that there are two issues
(1) Beam SQL does not filter out fields not used
(2) Iterable isn't supported by Beam SQL
A full stack will be helpful to investigate (1), and possible optimization
The Iterable field type, introduced in #10003 was meant to be different than ARRAY. However the fix here treats it the same as ARRAY. It may have performance implications, and not work for large iterables? Maybe add a comment here or a TODO.
until it's optimized for Iterable, one can just write
case ARRAY:
case ITERABLE:
return ....
so no need duplicate the line.
Also, there are switch (fieldType.getTypeName()) branches in several places in BeamCalRel, could it be all fixed for consistency?
@Abacn Sorry, I missed your comments. Was busy with the work and I completely forgot about this issue and did not follow up. I just saw that https://github.com/apache/beam/pull/31588/files fixed this issue.
I will close this PR."