Improve parallelism of closing files in FileIO
Currently, files are closed in processElement, which is invoked per window. When many windows fire, each close blocks on IO in turn and throttles throughput; the closes could instead run in parallel in finishBundle.
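To illustrate the proposal, here is a minimal sketch in plain Java, not Beam's actual FileIO code. The class DeferredCloseWriter and its methods processWindow and finishBundle are hypothetical stand-ins for the DoFn lifecycle: the per-window call only records the writer, and the bundle-end call starts every close on an executor before waiting for any of them.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;

// Hypothetical stand-in for a DoFn-like writer; names are assumptions, not Beam APIs.
class DeferredCloseWriter {
  private final List<Closeable> pending = new ArrayList<>();
  private final ExecutorService executor;

  DeferredCloseWriter(ExecutorService executor) {
    this.executor = executor;
  }

  // Analogous to processElement: called once per window, only records the writer.
  void processWindow(Closeable writer) {
    pending.add(writer);
  }

  // Analogous to finishBundle: starts every close, then waits for all of them.
  void finishBundle() throws IOException {
    List<CompletableFuture<Void>> closes = new ArrayList<>();
    for (Closeable writer : pending) {
      closes.add(CompletableFuture.runAsync(() -> {
        try {
          writer.close();
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }, executor));
    }
    pending.clear();
    try {
      CompletableFuture.allOf(closes.toArray(new CompletableFuture[0])).join();
    } catch (CompletionException e) {
      // Surface the original IOException if a close failed.
      if (e.getCause() instanceof UncheckedIOException) {
        throw ((UncheckedIOException) e.getCause()).getCause();
      }
      throw e;
    }
  }
}
```

With this shape, the time spent closing is bounded by the slowest close rather than the sum of all closes, at the cost of holding every open writer until the end of the bundle.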
Imported from Jira BEAM-12776. Original Jira may contain additional context. Reported by: scwhittle.
@scwhittle The fix in https://github.com/apache/beam/pull/15354 seems to be causing OOMs for certain customer workflows. As a workaround, the customer patched the code to limit the number of parallel closes to 2.
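For reference, a bound like the one described could look roughly like the sketch below. This is not the customer's patch or the code in the linked PR; BoundedCloser, closeAll, and maxParallelCloses are hypothetical names, and the bound is enforced simply by sizing a fixed thread pool.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper; names are assumptions, not Beam APIs.
class BoundedCloser {
  // Closes all writers with at most maxParallelCloses closes in flight at once.
  static void closeAll(List<? extends Closeable> writers, int maxParallelCloses)
      throws IOException {
    ExecutorService pool = Executors.newFixedThreadPool(maxParallelCloses);
    try {
      List<Future<?>> results = new ArrayList<>();
      for (Closeable writer : writers) {
        // Submitted as a Callable so close() may throw its checked IOException.
        results.add(pool.submit(() -> {
          writer.close();
          return null;
        }));
      }
      for (Future<?> result : results) {
        try {
          result.get();
        } catch (ExecutionException e) {
          if (e.getCause() instanceof IOException) {
            throw (IOException) e.getCause();
          }
          throw new IOException(e.getCause());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IOException(e);
        }
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

Calling BoundedCloser.closeAll(writers, 2) would correspond to the limit of 2 mentioned above; a larger value trades memory headroom for close throughput.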
https://github.com/apache/beam/pull/22645 may reduce the likelihood of these OOMs.