beam
beam copied to clipboard
Apache Beam is a unified programming model for Batch and Streaming data processing.
### What happened? Hello, I am facing a strange issue where my parquet file sizes are exploding. My Env: * Beam SDK: 2.35.0 * Parquet-mr: 1.12.0 (build db75a6815f2ba1d1ee89d1a90aeb296f1f3a8f20) * Execution...
### What happened? I was running into [this issue](https://github.com/apache/beam/pull/22620#issuecomment-1208303417) when adding support for VR tests for Spark in streaming mode (https://github.com/apache/beam/pull/22620). It looks like the runner only supports a global...
### What needs to happen? https://github.com/apache/beam/pull/16947 added Schema support for AWS models. With that `Coder`s for AWS IOs became (mostly) deprecated (see related changes). Respective code should be removed after...
Fixes issue https://github.com/apache/beam/issues/22146 (also reported on [StackOverflow](https://stackoverflow.com/questions/73765145/apache-beam-illegalstateexception-value-only-available-at-runtime-after-upgra)). [ValueProvider](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/options/ValueProvider.html) can be used to pass runtime parameters not available during pipeline creation (which is heavily used by [Dataflow Templates](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates)). In case a...
Add Python 3.10 infrastructure and test suites Fixes: https://github.com/apache/beam/issues/21971 Fixes: https://github.com/apache/beam/issues/21458 FIxes: https://github.com/apache/beam/issues/21671 ------------------------ Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and...
When looking at thread stacks for a pipeline reading pubsub and writing to BQ using the WriteApi there were 3600 Gax- prefixed threads. The # of threads an possibly performance...
### What happened? Python Kafka with_metadata option [1] uses a transform in Java side [2] that is not invoked by regular Python KafkaIO tests. So we need a separate test...
This adds enough annotation and checks to re-enable nullness checking in JdbcIO. It does not do any significant refactor that might make it less error-prone. ------------------------ Thank you for your...
This patch adds a thread pool in the DoFnRunner of Samza runner to execute DoFn.ProcessElement in parallel within a Samza task. This allows greater parallelism beyond a single task/partition. Users...
As part of the migration of Precommit and Postcommit Jobs from Jenkins to GA in self-hosted runners, this PR contains: - Migrated and sharded workflow [job-postcommit-python-ml.yml](https://github.com/benWize/beam/blob/9bd7aa0d3e40a6edde76c1b20ced683e1c521885/.github/workflows/job-postcommit-python-ml.yml) The migrated workflow was...