[Feature Request]: Ability to allocate threads across various ParDo of pipeline
What would you like to happen?
I'm currently using a streaming Apache Beam pipeline on a Dataflow Runner with an attached GPU to perform real-time inference. We ingest Pub/Sub messages that contain the GCS path of a data file, which we download and pre-process before batching and dispatching to the GPU for inference.
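For context, a minimal sketch of the pipeline shape; the topic, batch sizes, `preprocess` helper, and `load_model` helper are placeholders, not our actual code:

```python
import apache_beam as beam
from apache_beam.io.filesystems import FileSystems
from apache_beam.options.pipeline_options import PipelineOptions


class DownloadAndPreprocess(beam.DoFn):
    """I/O-bound stage: benefits from many harness threads."""

    def process(self, message):
        gcs_path = message.decode("utf-8")
        with FileSystems.open(gcs_path) as f:
            raw = f.read()
        yield preprocess(raw)  # hypothetical pre-processing helper


class RunGpuInference(beam.DoFn):
    """GPU-bound stage: should ideally run on one thread per process."""

    def setup(self):
        self.model = load_model()  # hypothetical model-loading helper

    def process(self, batch):
        yield from self.model.predict(batch)


options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     | "ReadPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/my-topic")
     | "Preprocess" >> beam.ParDo(DownloadAndPreprocess())
     | "Batch" >> beam.BatchElements(min_batch_size=8, max_batch_size=32)
     | "Inference" >> beam.ParDo(RunGpuInference()))
```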
The issue is that the earlier preprocessing stages are I/O bound and benefit from many harness threads, while the inference step should ideally run on a single thread to prevent GPU memory oversubscription, even though the worker uses only one process.
It would be very useful to be able to configure the maximum number of threads allocated to each ParDo, so that threads go to the stages that need them most: many threads for the preprocessing ParDo and a single thread for the inference ParDo, instead of tuning pipeline parameters empirically until they work in the majority of cases.
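Today the only knob is a pipeline-wide harness thread count, so we end up serializing GPU access ourselves, e.g. with a process-wide lock inside the inference DoFn. A purely illustrative sketch, reusing the hypothetical names from the snippet above; a per-ParDo thread limit would let the runner handle this instead:

```python
import threading

import apache_beam as beam

# Shared by every DoFn instance in the worker process, so only one harness
# thread touches the GPU at a time while the others keep preprocessing.
_GPU_LOCK = threading.Lock()


class RunGpuInference(beam.DoFn):
    def setup(self):
        self.model = load_model()  # hypothetical model-loading helper

    def process(self, batch):
        with _GPU_LOCK:  # manual stand-in for a per-ParDo thread limit
            results = self.model.predict(batch)
        yield from results
```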
Issue Priority
Priority: 2
Issue Component
Component: runner-dataflow