[Feature Request]: Improve Managed API to avoid redundant `PCollectionRowTuple` calls

Open VeronicaWasson opened this issue 1 year ago • 1 comment

What would you like to happen?

The Managed transform API takes and returns `PCollectionRowTuple`, which requires extra calls when building a pipeline. Specifically, for reads we have to call `PCollectionRowTuple.empty(pipeline).apply ... .get("output")`, and for writes `PCollectionRowTuple.of("input", input).apply(...`

It would be nice if the API used ordinary `PCollection`s so that user pipeline code is more streamlined.
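
To make the request concrete, here is a minimal, self-contained sketch of the kind of wrapper being asked for. It does not use Beam at all: `RowTuple` is a hypothetical stand-in for `PCollectionRowTuple`, a plain `List<String>` stands in for a `PCollection`, and the `read` and `write` helpers are invented names for illustration, not part of the Managed API. Only the `"input"` and `"output"` tags come from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ManagedWrapperSketch {

    // Hypothetical stand-in for PCollectionRowTuple: a set of tagged collections.
    static final class RowTuple {
        private final Map<String, List<String>> tagged = new HashMap<>();

        static RowTuple empty() {
            return new RowTuple();
        }

        static RowTuple of(String tag, List<String> rows) {
            RowTuple tuple = new RowTuple();
            tuple.tagged.put(tag, rows);
            return tuple;
        }

        List<String> get(String tag) {
            List<String> rows = tagged.get(tag);
            if (rows == null) {
                throw new IllegalArgumentException("No collection tagged " + tag);
            }
            return rows;
        }

        RowTuple apply(Function<RowTuple, RowTuple> transform) {
            return transform.apply(this);
        }
    }

    // Hypothetical helper: hides the empty-tuple input and the "output" lookup on reads.
    static List<String> read(Function<RowTuple, RowTuple> source) {
        return RowTuple.empty().apply(source).get("output");
    }

    // Hypothetical helper: hides the "input" wrapping on writes.
    static RowTuple write(List<String> input, Function<RowTuple, RowTuple> sink) {
        return RowTuple.of("input", input).apply(sink);
    }

    public static void main(String[] args) {
        // Fake source and sink standing in for tuple-to-tuple managed transforms.
        Function<RowTuple, RowTuple> fakeSource =
            ignored -> RowTuple.of("output", List.of("row-1", "row-2"));
        List<String> written = new ArrayList<>();
        Function<RowTuple, RowTuple> fakeSink = in -> {
            written.addAll(in.get("input"));
            return RowTuple.empty();
        };

        // Caller code deals only in plain collections; the tags stay inside the helpers.
        List<String> rows = read(fakeSource);
        write(rows, fakeSink);
        System.out.println(written);
    }
}
```

With helpers like these, the tag strings live in one place instead of being repeated at every call site. Whether the real fix belongs in the Managed API itself or in the underlying schema-transform layer is left open here.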

More context:

https://github.com/GoogleCloudPlatform/java-docs-samples/pull/9339#discussion_r1609244407

https://github.com/GoogleCloudPlatform/java-docs-samples/pull/9339#discussion_r1609246266

Issue Priority

Priority: 3 (nice-to-have improvement)

Issue Components

  • [ ] Component: Python SDK
  • [X] Component: Java SDK
  • [ ] Component: Go SDK
  • [ ] Component: Typescript SDK
  • [X] Component: IO connector
  • [ ] Component: Beam YAML
  • [ ] Component: Beam examples
  • [ ] Component: Beam playground
  • [ ] Component: Beam katas
  • [ ] Component: Website
  • [ ] Component: Spark Runner
  • [ ] Component: Flink Runner
  • [ ] Component: Samza Runner
  • [ ] Component: Twister2 Runner
  • [ ] Component: Hazelcast Jet Runner
  • [ ] Component: Google Cloud Dataflow Runner

VeronicaWasson • May 22 '24 17:05

cc: @ahmedabu98 @damccorm. We should probably think about improving the schema-transforms API so that these redundant pieces can be avoided.

chamikaramj • May 23 '24 17:05