beam icon indicating copy to clipboard operation
beam copied to clipboard

Apache Beam is a unified programming model for Batch and Streaming data processing.

Results 933 beam issues
Sort by recently updated
recently updated
newest added

### What happened? Hello, I am facing a strange issue where my parquet file sizes are exploding. My Env: * Beam SDK: 2.35.0 * Parquet-mr: 1.12.0 (build db75a6815f2ba1d1ee89d1a90aeb296f1f3a8f20) * Execution...

java
io
parquet
P1
bug
awaiting triage

### What happened? I was running into [this issue](https://github.com/apache/beam/pull/22620#issuecomment-1208303417) when adding support for VR tests for Spark in streaming mode (https://github.com/apache/beam/pull/22620). It looks like the runner only supports a global...

runners
spark
bug
P2
awaiting triage

### What needs to happen? https://github.com/apache/beam/pull/16947 added Schema support for AWS models. With that `Coder`s for AWS IOs became (mostly) deprecated (see related changes). Respective code should be removed after...

java
io
aws
task
P2
schemas

Fixes issue https://github.com/apache/beam/issues/22146 (also reported on [StackOverflow](https://stackoverflow.com/questions/73765145/apache-beam-illegalstateexception-value-only-available-at-runtime-after-upgra)). [ValueProvider](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/options/ValueProvider.html) can be used to pass runtime parameters not available during pipeline creation (which is heavily used by [Dataflow Templates](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates)). In case a...

java
io
gcp
Next Action: Reviewers

Add Python 3.10 infrastructure and test suites Fixes: https://github.com/apache/beam/issues/21971 Fixes: https://github.com/apache/beam/issues/21458 FIxes: https://github.com/apache/beam/issues/21671 ------------------------ Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and...

python
build
website
infra
docker

When looking at thread stacks for a pipeline reading pubsub and writing to BQ using the WriteApi there were 3600 Gax- prefixed threads. The # of threads an possibly performance...

java
io
gcp
improvement
P2

### What happened? Python Kafka with_metadata option [1] uses a transform in Java side [2] that is not invoked by regular Python KafkaIO tests. So we need a separate test...

python
io
kafka
bug
P2
awaiting triage

This adds enough annotation and checks to re-enable nullness checking in JdbcIO. It does not do any significant refactor that might make it less error-prone. ------------------------ Thank you for your...

java
io
jdbc
Next Action: Reviewers

This patch adds a thread pool in the DoFnRunner of Samza runner to execute DoFn.ProcessElement in parallel within a Samza task. This allows greater parallelism beyond a single task/partition. Users...

runners
samza

As part of the migration of Precommit and Postcommit Jobs from Jenkins to GA in self-hosted runners, this PR contains: - Migrated and sharded workflow [job-postcommit-python-ml.yml](https://github.com/benWize/beam/blob/9bd7aa0d3e40a6edde76c1b20ced683e1c521885/.github/workflows/job-postcommit-python-ml.yml) The migrated workflow was...

python
build