scio

A Scala API for Apache Beam and Google Cloud Dataflow.

213 scio issues

Scio [points to log4j 1.2.7](https://github.com/spotify/scio/blob/e5040254ee5e29d3c58e34937074b90a88a29e8b/build.sbt#L1323). It appears to be used in a few modules, including `scio-parquet` (apparently as a transitive dependency of `hadoop`). Note that there is now [reload4j](https://reload4j.qos.ch/), which "aims...
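Until the build itself changes, a downstream project could swap the jar roughly as follows. This is a sketch of an sbt fragment, assuming log4j 1.x arrives as the `log4j:log4j` artifact and reload4j is published as `ch.qos.reload4j:reload4j`. The version string is an assumed example, so check the reload4j site for the current release.

```scala
// build.sbt (sketch): exclude the transitive log4j 1.x jar from every dependency...
excludeDependencies += ExclusionRule("log4j", "log4j")

// ...and add reload4j, which keeps the org.apache.log4j package names, in its place.
// Assumption: "1.2.22" is only an example version; use the latest reload4j release.
libraryDependencies += "ch.qos.reload4j" % "reload4j" % "1.2.22"
```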

Unless the `"avro.java.string": "String"` property is set, String fields in GenericRecords are decoded as Avro `Utf8` objects. This causes `org.apache.avro.util.Utf8`->`java.lang.String` casting issues when the field is used as an SMB...
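On the reading side, the cast failure can be sidestepped by going through `CharSequence` instead of casting, because `org.apache.avro.util.Utf8` and `java.lang.String` both implement it. The sketch below is plain Scala with no Avro dependency. `asJavaString` is a hypothetical helper, and it does not change how Scio extracts SMB keys.

```scala
object AvroStringFields {
  // Hypothetical helper: read a decoded Avro string field as java.lang.String.
  // Without "avro.java.string": "String", the decoder yields org.apache.avro.util.Utf8,
  // so value.asInstanceOf[String] throws ClassCastException. Utf8 implements
  // CharSequence, so toString converts it safely; plain Strings pass through unchanged.
  def asJavaString(value: AnyRef): String = value match {
    case null             => null
    case s: String        => s
    case cs: CharSequence => cs.toString
    case other =>
      throw new IllegalArgumentException(
        s"Expected a string field, got ${other.getClass.getName}")
  }
}
```

For example, `AvroStringFields.asJavaString(record.get("id"))` works whether the field was decoded as `Utf8` or as `String`.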

At the top left of each code snippet there are two links, "copy" and "source", that render as "copysource". "copy" is redundant with the copy icon at the top right. "source"...

documentation

I wanted to try the `RateLimiterDoFn` class, which I found in the Scio source code after googling around for a way to handle a rate-limiting use case (https://github.com/spotify/scio/blob/main/scio-core/src/main/java/com/spotify/scio/transforms/RateLimiterDoFn.java). I googled "scio maven" because...
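Judging by the source path, `RateLimiterDoFn` lives in the `scio-core` module under the `com.spotify.scio.transforms` package, so a dependency on `scio-core` should bring it in. A sketch of the sbt line, assuming the `com.spotify` group ID used by Scio's published artifacts; `scioVersion` is a placeholder for whichever release you target.

```scala
// build.sbt (sketch): scio-core contains com.spotify.scio.transforms.RateLimiterDoFn
val scioVersion = "x.y.z" // placeholder: set to the Scio release you use
libraryDependencies += "com.spotify" %% "scio-core" % scioVersion
```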

When using side inputs in streaming pipelines, most use cases require these side inputs to be refreshed (recalculated) over time. Scio doesn't have a nicer way to...

streaming

I have a use case where we need to produce multiple outputs using a side input; however, this does not appear to be supported in https://github.com/spotify/scio/blob/main/scio-core/src/main/scala/com/spotify/scio/values/SCollectionWithSideInput.scala. I was wondering if there...

enhancement

Running into the following error when trying to read a pretty large data partition from GCS using `parquetAvroFile`: `Error message from worker: java.lang.IllegalArgumentException: Total size of the BoundedSource`...

bug
parquet

Is this currently possible, or are there any plans to support `applyTransform` for a native Java Beam SDK `PTransform[PCollectionTuple, PCollectionTuple]`?

enhancement

Currently, `AsyncLookupDoFn` will [block indefinitely](https://github.com/spotify/scio/blob/main/scio-core/src/main/java/com/spotify/scio/transforms/FutureHandlers.java#L38) waiting for Futures to complete in the `@FinishBundle` step. Should we support a user-specified timeout (`long timeout, TimeUnit timeUnit`) after which the Future will be...
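One way such a timeout could behave, sketched in plain Scala with `java.util.concurrent` rather than Scio's actual `FutureHandlers` code: wait on all outstanding futures via `CompletableFuture.allOf(...).get(timeout, unit)`, and cancel whatever is still pending once the deadline passes. `BoundedWait.waitForAll` is a hypothetical name, and cancel-on-timeout is only one possible policy (failing the bundle would be another).

```scala
import java.util.concurrent.{CompletableFuture, TimeUnit, TimeoutException}

object BoundedWait {
  // Hypothetical helper: block until every future completes, but for at most `timeout`.
  // Returns true if all futures finished in time. On timeout, cancels the futures that
  // are still pending and returns false. If a future failed, get() rethrows the failure
  // wrapped in an ExecutionException, which propagates to the caller.
  def waitForAll(futures: Seq[CompletableFuture[_]], timeout: Long, unit: TimeUnit): Boolean = {
    val all = CompletableFuture.allOf(futures: _*)
    try {
      all.get(timeout, unit)
      true
    } catch {
      case _: TimeoutException =>
        futures.filterNot(_.isDone).foreach(_.cancel(true))
        false
    }
  }
}
```

A caller in a finish-bundle step might then call `BoundedWait.waitForAll(pending, 30, TimeUnit.SECONDS)` and decide whether to fail or log when it returns `false`.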

Currently, our built-in file read APIs (`sc.avroFile`, `sc.protobufFile`, `sc.textFile`, ...) take a single String `path` matching either a specific file or a filepattern. The approach most users seem to take...

enhancement