[Feature Request]: Improve Managed API to avoid redundant `PCollectionRowTuple` calls
What would you like to happen?
The Managed transform API takes and returns `PCollectionRowTuple`s, which requires extra calls when building a pipeline. Specifically, for reads we have to call `PCollectionRowTuple.empty(pipeline).apply ... .get("output")`, and for writes `PCollectionRowTuple.of("input", input).apply(...`
It would be nice if the API accepted and returned ordinary `PCollection`s so that user pipeline code is more streamlined.
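To make the request concrete, here is a minimal sketch of thin adapters that hide the tuple plumbing described above. `RowTupleAdapters`, `read`, and `write` are hypothetical names and not part of Beam. The `"input"` and `"output"` tags are taken from the calls quoted above and are assumed to be what the wrapped transform expects. Only core SDK classes are used.

```java
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionRowTuple;
import org.apache.beam.sdk.values.Row;

/** Hypothetical helpers (not part of Beam) that hide the PCollectionRowTuple plumbing. */
public final class RowTupleAdapters {
  private RowTupleAdapters() {}

  /** Adapts a tuple-based source so it can be applied directly to a Pipeline. */
  public static PTransform<PBegin, PCollection<Row>> read(
      PTransform<PCollectionRowTuple, PCollectionRowTuple> source) {
    return new Read(source);
  }

  /** Adapts a tuple-based sink so it can be applied directly to a PCollection of Rows. */
  public static PTransform<PCollection<Row>, PCollectionRowTuple> write(
      PTransform<PCollectionRowTuple, PCollectionRowTuple> sink) {
    return new Write(sink);
  }

  private static class Read extends PTransform<PBegin, PCollection<Row>> {
    private final PTransform<PCollectionRowTuple, PCollectionRowTuple> source;

    Read(PTransform<PCollectionRowTuple, PCollectionRowTuple> source) {
      this.source = source;
    }

    @Override
    public PCollection<Row> expand(PBegin begin) {
      // Same chain as in the issue: empty tuple in, "output" collection out.
      return PCollectionRowTuple.empty(begin.getPipeline()).apply(source).get("output");
    }
  }

  private static class Write extends PTransform<PCollection<Row>, PCollectionRowTuple> {
    private final PTransform<PCollectionRowTuple, PCollectionRowTuple> sink;

    Write(PTransform<PCollectionRowTuple, PCollectionRowTuple> sink) {
      this.sink = sink;
    }

    @Override
    public PCollectionRowTuple expand(PCollection<Row> rows) {
      // Same chain as in the issue: wrap the rows under the "input" tag.
      return PCollectionRowTuple.of("input", rows).apply(sink);
    }
  }
}
```

With something like this, user code could read `pipeline.apply(RowTupleAdapters.read(Managed.read(Managed.ICEBERG).withConfig(config)))`, where `config` is a placeholder map. Ideally the Managed API would offer this shape directly so that no adapter is needed.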
More context:
https://github.com/GoogleCloudPlatform/java-docs-samples/pull/9339#discussion_r1609244407
https://github.com/GoogleCloudPlatform/java-docs-samples/pull/9339#discussion_r1609246266
Issue Priority
Priority: 3 (nice-to-have improvement)
Issue Components
- [ ] Component: Python SDK
- [X] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [X] Component: IO connector
- [ ] Component: Beam YAML
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
cc: @ahmedabu98 @damccorm. We should probably think about improving the schema-transforms API so that these redundant calls can be avoided.