[Task]: Improve how to handle the Dataflow-specific option `impersonateServiceAccount` for Beam Java

Open liferoad opened this issue 1 year ago • 1 comment

What needs to happen?

`impersonateServiceAccount` should be kept when submitting Dataflow jobs but, per the design, should be removed when creating Dataflow workers. To fix this, #30283 added a simple solution that removes the `impersonateServiceAccount` key from the JSON pipeline options. This introduces some Dataflow-specific concepts, and the handling could be improved by moving it into the Dataflow-specific module. See more details in this comment.
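As a rough illustration of the behavior described above, the sketch below keeps the option for job submission and drops it from the copy handed to workers. It assumes the pipeline options have already been parsed into a flat `Map`; the class `WorkerOptionsSketch` and the method `stripForWorkers` are hypothetical names for this example and are not part of Beam or of #30283.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the Beam API. */
final class WorkerOptionsSketch {
  // Option key named in this issue.
  static final String IMPERSONATE_KEY = "impersonateServiceAccount";

  /**
   * Returns a copy of the options without the impersonation key.
   * The input map is not modified, so the submission path can keep using it.
   */
  static Map<String, Object> stripForWorkers(Map<String, Object> submissionOptions) {
    Map<String, Object> workerOptions = new HashMap<>(submissionOptions);
    workerOptions.remove(IMPERSONATE_KEY);
    return workerOptions;
  }

  public static void main(String[] args) {
    // Assumed, simplified stand-in for parsed JSON pipeline options.
    Map<String, Object> submission = new HashMap<>();
    submission.put("project", "my-project");
    submission.put(IMPERSONATE_KEY, "sa@my-project.iam.gserviceaccount.com");

    Map<String, Object> worker = stripForWorkers(submission);
    System.out.println("submission keeps key: " + submission.containsKey(IMPERSONATE_KEY));
    System.out.println("worker keeps key: " + worker.containsKey(IMPERSONATE_KEY));
  }
}
```

Returning a copy rather than mutating the input is a deliberate choice in this sketch: the same options object can still be used for submission, where the key must be present.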

This issue tracks the potential task of improving how Dataflow-specific options are handled in the future.

Note: for Beam Python, this option is removed in the internal Dataflow apiclient module.

Issue Priority

Priority: 3 (nice-to-have improvement)

Issue Components

  • [ ] Component: Python SDK
  • [X] Component: Java SDK
  • [ ] Component: Go SDK
  • [ ] Component: Typescript SDK
  • [ ] Component: IO connector
  • [ ] Component: Beam YAML
  • [ ] Component: Beam examples
  • [ ] Component: Beam playground
  • [ ] Component: Beam katas
  • [ ] Component: Website
  • [ ] Component: Spark Runner
  • [ ] Component: Flink Runner
  • [ ] Component: Samza Runner
  • [ ] Component: Twister2 Runner
  • [ ] Component: Hazelcast Jet Runner
  • [X] Component: Google Cloud Dataflow Runner

liferoad avatar Feb 13 '24 16:02 liferoad

For this particular option, the Dataflow service (the UW) should be the place where the option is removed.

The Python SDK is a real mess when it comes to isolating non-GCP and GCP things. It is not a good place to use as an example.

kennknowles avatar Feb 13 '24 18:02 kennknowles