scio
A Scala API for Apache Beam and Google Cloud Dataflow.
Can someone please clarify how the `init` function behaves when loading external data on workers using [Distributed cache](https://spotify.github.io/scio/examples/DistCacheExample.scala.html)? On multi-core machines, Google Dataflow normally launches 1 thread per...
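The snippet does not say how Scio's DistCache invokes `init` per worker or per thread. As a general JVM pattern (not Scio's implementation), a loader can be made to run at most once per JVM, no matter how many threads use it, by placing it in a `lazy val` on an `object`. `ExternalData`, `lookup`, `loadCount` and `InitOnceDemo` are hypothetical names for illustration only.

```scala
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical stand-in for data an init function might load on a worker.
object ExternalData {
  // Counts how many times the expensive load actually ran.
  val loadCount = new AtomicInteger(0)

  // Scala lazy vals on objects are initialized at most once per JVM
  // (per classloader), even when many threads access them concurrently.
  lazy val lookup: Map[String, Int] = {
    loadCount.incrementAndGet()
    Map("a" -> 1, "b" -> 2) // pretend this was read from a locally cached file
  }
}

object InitOnceDemo {
  def main(args: Array[String]): Unit = {
    // Simulate several worker threads sharing one JVM.
    val pool = Executors.newFixedThreadPool(8)
    (1 to 100).foreach { _ =>
      pool.submit(new Runnable { def run(): Unit = ExternalData.lookup("a") })
    }
    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.SECONDS)
    println(s"loads: ${ExternalData.loadCount.get}")
  }
}
```

With this pattern the load runs once per JVM, so memory use scales with the number of worker processes rather than threads.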
`Memcached` is widely used for many use cases. It would be nice if we could support it more seamlessly.
Something to execute after `sc.run()`, with access to the `ScioContext` and `PipelineResult`, for integration hooks, e.g. submitting counters to a dashboard, recording lineage, or updating a Bigtable cluster.
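One possible shape for such hooks, sketched under the assumption that a hook is simply a callback receiving the context and the run result: a small registry that runs the pipeline, then hands both values to each registered hook in order. `PostRunHooks`, `register` and `runWithHooks` are hypothetical names, and the type parameters `C` and `R` merely stand in for `ScioContext` and `PipelineResult`; this is not an existing Scio API.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical registry of callbacks invoked after a pipeline run completes.
// C stands in for ScioContext, R for PipelineResult.
final class PostRunHooks[C, R] {
  private val hooks = ListBuffer.empty[(C, R) => Unit]

  def register(hook: (C, R) => Unit): Unit = hooks += hook

  // Runs the pipeline via `run`, then passes the context and result
  // to every hook in registration order before returning the result.
  def runWithHooks(ctx: C)(run: C => R): R = {
    val result = run(ctx)
    hooks.foreach(h => h(ctx, result))
    result
  }
}

object PostRunHooksDemo {
  // Collects what the example hook "publishes", e.g. to a dashboard.
  val published = ListBuffer.empty[String]

  def main(args: Array[String]): Unit = {
    val hooks = new PostRunHooks[String, Map[String, Long]]
    hooks.register((name, counters) => published += s"$name:${counters("records")}")
    // The "pipeline" here just returns a fake counter map.
    hooks.runWithHooks("wordcount")(_ => Map("records" -> 42L))
    println(published.mkString(","))
  }
}
```

Running hooks only after the result is available means they see final counter values; in this sketch an exception thrown by one hook propagates and skips the remaining hooks.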
Continuation of https://github.com/spotify/scio/issues/3944 ;) The current implementation supplies top-level fields to the BQ Storage API even if the selected field is a record with only a small subset of nested fields (quite...
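To make the difference concrete, here is a sketch contrasting the two ways of building a selected-fields list from dotted paths such as `user.id`. It assumes the BQ Storage Read API accepts fully qualified nested names (our understanding of its documented behavior for STRUCT children); `SelectedFields`, `topLevelOnly` and `nestedPaths` are hypothetical helpers, not Scio code.

```scala
// Hypothetical helpers contrasting two selected-fields strategies
// for dotted field paths such as "user.address.city".
object SelectedFields {
  // Behavior described in the issue: only top-level names are sent,
  // so selecting "user.id" still reads the whole "user" record.
  def topLevelOnly(paths: Seq[String]): Seq[String] =
    paths.map(_.takeWhile(_ != '.')).distinct

  // Alternative: keep fully qualified nested paths, dropping any path
  // already covered by a selected ancestor (e.g. "user" covers "user.id").
  def nestedPaths(paths: Seq[String]): Seq[String] = {
    val unique = paths.distinct
    unique.filterNot(p => unique.exists(a => a != p && p.startsWith(a + ".")))
  }
}
```

For `Seq("user.id", "user.name", "ts")`, `topLevelOnly` yields `Seq("user", "ts")` while `nestedPaths` keeps all three paths, so only the two requested children of `user` would be read.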
We are only leveraging Zoltar for model loading. Since we are not using its other features, maybe we can live without it. That said, let's see if we can remove...
The current ScioIO Scaladoc does not include the read/write methods (https://spotify.github.io/scio/api/com/spotify/scio/io/ScioIO.html). Besides the fact that they do not have any documentation, it would be nice to at least include them in the Scaladocs as they...