scio
A Scala API for Apache Beam and Google Cloud Dataflow.
Currently scio-avro uses `com.spotify.scio.avro.types.AvroType[T]` that implements "case class GenericRecord" codec using `scala.reflect.macros._` in `com.spotify.scio.avro.types.TypeProvider`. This was implemented in 2017-2019. - Move AvroIO to a newer implementation based on [magnolify-avro](https://github.com/spotify/magnolify/blob/main/docs/avro.md) using...
When using `largeHashIntersectByKey`, we've experienced pipelines hanging when running with the default Dataflow runner. It seems that enabling [runner v2](https://cloud.google.com/dataflow/docs/runner-v2) unblocks the situation, but we should check why sparkey blocks...
## About this PR 📦 Updates [org.tensorflow:tensorflow-core-api](https://www.tensorflow.org) from `0.4.2` to `0.5.0` ## Usage ✅ **Please merge!** I'll automatically update this PR to resolve conflicts as long as you don't change...
Feature request, motivated by BaseAsyncDoFn and KV lookups to avoid duplicates (instead of using Redis, BigTable, Hazelcast IMDG, etc.): https://github.com/albertols/scio-db/blob/develop/src/main/scala/com/db/myproject/async/http/state/StateBaseAsyncDoFn.java 1. By using State and Timers we could prevent **processElement**...
ParquetAvroFileOperations always overrides the "projection" option to equal the full reflected schema, so you can't supply a projection for a SpecificRecord class: https://github.com/spotify/scio/blob/110f79593c67c58a2c2465bf2fb340ff4711003f/scio-smb/src/main/java/org/apache/beam/sdk/extensions/smb/ParquetAvroFileOperations.java#L175-L176
Missing hbase magnolify in the example
Capture in `waitUntilFinish` and add a shutdown hook
key getter does not work when a CaseMapper is required to translate field names: ``` [info] Cause: java.lang.IllegalStateException: Leaf key field user_name does not exist in record class class org.apache.beam.sdk.extensions.smb.ParquetEndToEndTest$Event...