scio
A Scala API for Apache Beam and Google Cloud Dataflow.
It'd be nice to verify or improve sparkey side input for streaming jobs, to make it easier to handle the following cases: - [ ] trigger reloads when a new...
Looks like `PipelineTestUtils` doesn't use a `forTest` Scio context. Creating the context with `forTest` might allow developers to override the behaviour of certain configurations or functions when they are executed...
[TapsTest](https://github.com/spotify/scio/blob/master/scio-test/src/test/scala/com/spotify/scio/io/TapsTest.scala#L88) fails with `java.util.concurrent.TimeoutException: Future timed out after [10 seconds]` when run on my local machine, which has more than enough resources (MBP 16" 2019, 2.4 GHz 8-Core Intel...
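The message format appears to be the one Scala's `scala.concurrent.Await.result` produces when a `Future` does not complete within the given duration. A minimal sketch of that failure mode, assuming the test blocks on a future with a fixed timeout; `AwaitTimeoutSketch` and `awaitWithTimeout` are hypothetical names, not code from `TapsTest`:

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object AwaitTimeoutSketch {
  // Hypothetical helper: block on a future for at most `atMost`, returning the
  // timeout message as a Left instead of throwing, so the failure is easy to inspect.
  def awaitWithTimeout[T](f: Future[T], atMost: FiniteDuration): Either[String, T] =
    try Right(Await.result(f, atMost))
    catch { case e: TimeoutException => Left(e.getMessage) }

  def main(args: Array[String]): Unit = {
    // A promise that is never completed stands in for work that never finishes in time.
    val neverDone = Promise[Int]().future
    println(awaitWithTimeout(neverDone, 100.millis))

    // An already-completed future returns immediately, well within the timeout.
    println(awaitWithTimeout(Future.successful(42), 100.millis))
  }
}
```

With a hard-coded duration such as `10.seconds`, any run where the awaited work takes longer fails the same way regardless of how fast the machine is otherwise.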
Mostly from Parquet :sob: (see https://issues.apache.org/jira/browse/PARQUET-1126). There are also reports that Hadoop pulls in conflicting logger bindings, among other things.
After merging https://github.com/spotify/scio/pull/3466, add documentation explaining how to use BigQueryStorage, with examples of field selection and row restrictions, and how to test such reads.
```
Operation ongoing in step Read from Master-Index for at least 05m00s without outputting or completing in state finish
  at sun.misc.Unsafe.park(Native Method)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
  at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:412)
  at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:90)
  at com.foo.BigTableLookupDoFn.waitForFutures(BigTableLookupDoFn.scala:36)
  ...
```
I'm noticing this while using the `ScalaAsyncLookupDoFn`
When writing a sparkey in one job via e.g. `asSparkey` and reading in another job via e.g. `sparkeySideInput`, testing the second job is clunky. Ideally a `SparkeyIO` class would be...
POC here: https://github.com/spotify/scio/tree/neville/state. It's possible, but with the current approach a single `DoFn` can only have a finite, hard-coded number of states. Alternatively we might have to...
It would be great if Scio offered a way to configure Kantan's BOM support. We have to deal with CSVs containing a BOM and want to save the...
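Until something like this is configurable, one possible workaround is to drop a leading UTF-8 byte order mark from the raw bytes before the CSV parser sees them. A minimal sketch in plain Scala; `BomSketch`, `hasBom` and `decodeWithoutBom` are hypothetical names, and this is not Kantan's or Scio's own BOM handling:

```scala
import java.nio.charset.StandardCharsets

object BomSketch {
  // U+FEFF encoded as UTF-8 is the three bytes EF BB BF.
  private val Utf8Bom: Array[Byte] = Array(0xEF, 0xBB, 0xBF).map(_.toByte)

  // True if the bytes start with a UTF-8 byte order mark.
  def hasBom(bytes: Array[Byte]): Boolean =
    bytes.length >= Utf8Bom.length && bytes.take(Utf8Bom.length).sameElements(Utf8Bom)

  // Decode as UTF-8, dropping a leading BOM if present. Java's UTF-8 decoder
  // keeps the BOM as a U+FEFF character, so it has to be removed explicitly.
  def decodeWithoutBom(bytes: Array[Byte]): String = {
    val body = if (hasBom(bytes)) bytes.drop(Utf8Bom.length) else bytes
    new String(body, StandardCharsets.UTF_8)
  }

  def main(args: Array[String]): Unit = {
    val csv = Utf8Bom ++ "id,name\n1,scio\n".getBytes(StandardCharsets.UTF_8)
    val header = decodeWithoutBom(csv).takeWhile(_ != '\n')
    // Without stripping, the first column name would be "\uFEFFid" instead of "id".
    println(header.split(",").toList)
  }
}
```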