scio
A Scala API for Apache Beam and Google Cloud Dataflow.
Added to Beam per internal user requests in apache/beam#26044; we should have a nicer variant in Scio.
https://beam.apache.org/releases/javadoc/2.47.0/org/apache/beam/sdk/extensions/python/PythonExternalTransform.html
When using the Scio DSL, I would like to have access to the ResourceHint API to tell Beam / Dataflow what RAM and GPU accelerators I would like to use,...
In theory, this will reduce the cost of shuffle/streaming data processed. Needs a dictionary of common symbols to be provided.
Add some kind of API where users can get a convenient dead-letter queue, e.g. combined w/ the `safeMap` or `safeFlatMap` methods
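To make the dead-letter idea concrete, here is a minimal sketch on plain Scala collections rather than on `SCollection`. `DeadLetterSketch`, `DeadLetter`, and this `safeMap` signature are hypothetical names chosen for illustration; they are not Scio's API, and a real version would need to route the failures to a side output.

```scala
import scala.util.Try

// Hypothetical helper, NOT part of Scio: illustrates the dead-letter pattern
// on an in-memory Seq instead of an SCollection.
object DeadLetterSketch {
  // A failed input paired with the exception that it caused.
  final case class DeadLetter[A](input: A, error: Throwable)

  // Applies `f` to every element. Elements for which `f` throws a non-fatal
  // exception are collected in the second result instead of failing the run.
  def safeMap[A, B](in: Seq[A])(f: A => B): (Seq[B], Seq[DeadLetter[A]]) = {
    val results = in.map(a => Try(f(a)).toEither.left.map(e => DeadLetter(a, e)))
    val good = results.collect { case Right(b) => b }
    val bad = results.collect { case Left(d) => d }
    (good, bad)
  }
}
```

For example, `DeadLetterSketch.safeMap(Seq("1", "x", "3"))(_.toInt)` would return the parsed integers in the first collection and the unparseable `"x"`, together with its exception, in the second, which could then be written to a dead-letter sink.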
from the deprecated ApproximateUnique
The current default is `SimpleJsonpMapper`, which is practically useless; perhaps the value used in the example would be more appropriate? ```scala () => { // Use jackson for user json serialization...
Would need to be carefully tested to ensure it wouldn't kill DB instances