
A Scala API for Apache Beam and Google Cloud Dataflow.

213 scio issues

It'd be nice to verify or improve sparkey side input for streaming jobs, to make it easier to handle the following cases: - [ ] trigger reloads when a new...

enhancement
streaming

Looks like [PipelineTestUtils][1] doesn't use a `forTest` Scio context. Creating it with `forTest` might allow developers to override the behaviour of certain configurations or functions when they are executed...

question ❓
testing

[TapsTest](https://github.com/spotify/scio/blob/master/scio-test/src/test/scala/com/spotify/scio/io/TapsTest.scala#L88) fails with `java.util.concurrent.TimeoutException: Future timed out after [10 seconds]` when running on my local machine, which has more than enough resources (MBP 16" 2019, 2.4 GHz 8-Core Intel...

bug
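The message above has the form `scala.concurrent.Await` uses when a `Future` does not complete within its `atMost` bound (the exact wording differs slightly between Scala versions). Assuming, without having checked TapsTest itself, that the test waits on a tap `Future` with a fixed 10-second limit, a minimal sketch of that failure mode and of a caller-chosen timeout might look like this; `AwaitTimeoutSketch` and `awaitWithin` are hypothetical names.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object AwaitTimeoutSketch {
  // Hypothetical helper: wait for `f` for at most `timeout`, returning Left on
  // timeout instead of propagating the TimeoutException to the caller.
  def awaitWithin[T](f: Future[T], timeout: FiniteDuration): Either[TimeoutException, T] =
    try Right(Await.result(f, timeout))
    catch { case e: TimeoutException => Left(e) }

  def main(args: Array[String]): Unit = {
    // A future that never completes, standing in for a tap that is slow locally.
    val neverDone: Future[Int] = Promise[Int]().future
    awaitWithin(neverDone, 100.millis) match {
      case Left(e)  => println(e.getMessage) // "... timed out after [100 milliseconds]"
      case Right(v) => println(s"completed with $v")
    }
    // An already-completed future returns immediately, whatever the bound.
    println(awaitWithin(Future.successful(42), 10.seconds)) // Right(42)
  }
}
```

Raising or parameterizing the bound only hides the symptom; whether the tap is genuinely slow on a local machine is a separate question.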

Mostly from Parquet 😭 (https://issues.apache.org/jira/browse/PARQUET-1126). There are reports that Hadoop pulls in conflicting logger bindings, among other things.

dependencies
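Until the transitive dependencies are cleaned up, a downstream sbt build can drop a conflicting binding itself. A minimal `build.sbt` sketch, assuming (not confirmed by the report above) that the conflict is the Log4j 1.x binding `slf4j-log4j12`, which Hadoop 2.x artifacts commonly depend on; check which bindings are actually on the classpath before excluding anything.

```scala
// build.sbt (sketch): remove the Log4j 1.x SLF4J binding from every dependency
// of this project, so that only one SLF4J binding remains on the classpath.
// The organization/name pair is an assumption about which binding conflicts.
excludeDependencies ++= Seq(
  ExclusionRule("org.slf4j", "slf4j-log4j12")
)
```

`excludeDependencies` applies project-wide; an `exclude("org.slf4j", "slf4j-log4j12")` on the single offending dependency is the narrower alternative.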

After merging https://github.com/spotify/scio/pull/3466, add documentation explaining how to use BigQueryStorage, with examples of field selection and row restrictions, and how to test them.

good first issue
documentation

```
Operation ongoing in step Read from Master-Index for at least 05m00s without outputting or completing in state finish
  at sun.misc.Unsafe.park(Native Method)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
  at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:412)
  at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:90)
  at com.foo.BigTableLookupDoFn.waitForFutures(BigTableLookupDoFn.scala:36)
...
```

bug
streaming

When writing a sparkey in one job via e.g. `asSparkey` and reading in another job via e.g. `sparkeySideInput`, testing the second job is clunky. Ideally a `SparkeyIO` class would be...

enhancement
good first issue

POC here: https://github.com/spotify/scio/tree/neville/state. It's possible, but with the current approach a single `DoFn` can only have a finite, hard-coded number of states. Alternatively we might have to...

enhancement

It would be great if Scio offered a way to configure Kantan's BOM support. We have to deal with CSVs containing a BOM and want to save the...

enhancement
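Independently of whatever option Scio or Kantan might expose (Kantan's own BOM settings are not shown here), the handling itself is small: the BOM is the character U+FEFF, encoded in UTF-8 as the bytes `EF BB BF`, and Java's UTF-8 decoder keeps it as the first character of the decoded text. A minimal standard-library sketch of stripping it on read and adding it back on write; `BomSketch`, `stripBom` and `withBom` are hypothetical names, not Scio or Kantan APIs.

```scala
import java.nio.charset.StandardCharsets

object BomSketch {
  // The byte order mark U+FEFF; UTF-8 encodes it as EF BB BF.
  val Bom: Char = '\uFEFF'

  // Drop a leading BOM, if present, before handing the text to a CSV parser.
  def stripBom(s: String): String =
    if (s.headOption.contains(Bom)) s.drop(1) else s

  // Prepend a BOM when writing, for consumers that expect one (idempotent).
  def withBom(s: String): String =
    if (s.headOption.contains(Bom)) s else Bom.toString + s

  def main(args: Array[String]): Unit = {
    val bytes = Array[Byte](0xEF.toByte, 0xBB.toByte, 0xBF.toByte) ++
      "id,name\n1,a\n".getBytes(StandardCharsets.UTF_8)
    // Decoding keeps the BOM as the first character.
    val raw = new String(bytes, StandardCharsets.UTF_8)
    val cleaned = stripBom(raw)
    println(cleaned.startsWith("id,")) // true
    val head = withBom(cleaned).getBytes(StandardCharsets.UTF_8).take(3)
    println(head.map(b => f"${b & 0xff}%02X").mkString(" ")) // EF BB BF
  }
}
```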