beam
Apache Beam is a unified programming model for Batch and Streaming data processing.
The PostCommit Java ValidatesRunner Flink workflow is failing over 50% of the time. Please visit https://github.com/apache/beam/actions/workflows/beam_PostCommit_Java_ValidatesRunner_Flink.yml?query=is%3Afailure+branch%3Amaster to see the logs.
The LoadTests Go GBK Flink Batch workflow is failing over 50% of the time. Please visit https://github.com/apache/beam/actions/workflows/beam_LoadTests_Go_GBK_Flink_Batch.yml?query=is%3Afailure+branch%3Amaster to see the logs.
The TypeScript Tests workflow is failing over 50% of the time. Please visit https://github.com/apache/beam/actions/workflows/typescript_tests.yml?query=is%3Afailure+branch%3Amaster to see the logs.
This diff implements a new `partitioners` module, which includes a `Top` partitioner. `Top` replicates the `combiners.Top` behavior. This is a WIP. I initially implemented this for my particular case which...
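The description above is cut off, so it does not show how the `Top` partitioner is exposed. As a rough sketch only, assuming a top partitioner splits its input into the N largest elements and the remaining elements, the selection logic can be written in plain Python with `heapq`. The function name `top_partition` is hypothetical; this is neither the PR's code nor a Beam API:

```python
import heapq
from typing import Any, Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def top_partition(
    elements: Iterable[T], n: int, key: Optional[Callable[[T], Any]] = None
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: split elements into (n largest, everything else).

    The top partition is ordered largest-first (assumed to mirror the list
    combiners.Top.Largest emits); the rest keep their input order.
    """
    items = list(elements)
    if n <= 0:
        return [], items
    key_fn = key if key is not None else (lambda x: x)
    # Select indices rather than values so duplicates land in the right partition.
    # heapq.nlargest is stable, so ties keep their input order.
    top_idx = heapq.nlargest(n, range(len(items)), key=lambda i: key_fn(items[i]))
    chosen = set(top_idx)
    top = [items[i] for i in top_idx]
    rest = [x for i, x in enumerate(items) if i not in chosen]
    return top, rest
```

For example, `top_partition([3, 1, 4, 1, 5, 9, 2, 6], 3)` returns `([9, 6, 5], [3, 1, 4, 1, 2])`. Working on indices means `top_partition([5, 5, 5], 2)` correctly yields `([5, 5], [5])` instead of dropping the duplicate.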
The latter uses `ConverterManager.getInstance()`, which is not thread-safe (see https://github.com/JodaOrg/joda-time/blob/main/src/main/java/org/joda/time/convert/ConverterManager.java#L89). We learned of this inside Google from a ThreadSanitizer (TSAN) report. I don't know whether simple changes like this are...
Add a direct path code path. With it, we will be able to run direct path pipelines by passing the option `IsWindmillServiceDirectPathEnabled`. I added some new components (they are just...
Apparently Java unit tests don't have timeouts. As far as I can tell, JUnit lets you set timeouts for individual tests and test classes, but provides no easy way...
This PR addresses #30423 with the Web API connector interfaces. A future PR will create the actual transform that processes Web API requests into a response PCollection.
Add a ProcessingTime queue to Prism's element manager that can be controlled by TestStream's notion of time. The basic design is that the queue doesn't contain the elements, but...
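The description is truncated before it says what the queue holds instead of elements. Purely as an illustration, assuming the queue records only the processing times at which some stage has pending work (keyed by a hypothetical stage ID string) and that a TestStream-style clock advances time explicitly, the idea can be sketched in Python with a min-heap. `ProcessingTimeQueue`, `schedule`, `advance_to`, and `next_time` are hypothetical names; Prism itself is written in Go and this is not its implementation:

```python
import heapq
from typing import Dict, List, Optional, Set


class ProcessingTimeQueue:
    """Hypothetical sketch: tracks *when* stages have processing-time work,
    not the elements themselves. Time only moves when advance_to is called,
    the way a TestStream processing-time event would drive it (assumption)."""

    def __init__(self) -> None:
        self._heap: List[int] = []              # distinct pending times (min-heap)
        self._stages: Dict[int, Set[str]] = {}  # pending time -> stage IDs
        self._now = 0

    def schedule(self, time: int, stage_id: str) -> None:
        """Record that stage_id has work due at the given processing time."""
        if time not in self._stages:
            heapq.heappush(self._heap, time)
            self._stages[time] = set()
        self._stages[time].add(stage_id)

    def next_time(self) -> Optional[int]:
        """Earliest pending processing time, or None if nothing is scheduled."""
        return self._heap[0] if self._heap else None

    def advance_to(self, new_time: int) -> Set[str]:
        """Advance the clock and return every stage whose time has been reached."""
        if new_time < self._now:
            raise ValueError("processing time cannot move backwards")
        self._now = new_time
        ready: Set[str] = set()
        while self._heap and self._heap[0] <= new_time:
            ready |= self._stages.pop(heapq.heappop(self._heap))
        return ready
```

Under this assumption, advancing the clock only reports which stages are due; the element manager would then look up their pending elements separately. A test harness can jump straight to `next_time()` to fire the next batch of processing-time work deterministically.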
What needs to happen? There are reports of BigQueryIO DIRECT_READ running out of quota in large Dataflow batch pipelines. There are both a per-project and a per-region quota...