scio
A Scala API for Apache Beam and Google Cloud Dataflow.
Hello. I believe there is a bug in [SchemaMaterializer](https://github.com/spotify/scio/blob/main/scio-core/src/main/scala/com/spotify/scio/schemas/SchemaMaterializer.scala#L157) if the type of the key or value isn't a primitive. `getCollectionElementType` returns the field type only for lists; for maps, `getMapKeyType` and `getMapValueType`...
With no current BOM support in sbt, the recommended way to ship a set of dependencies (see sbt/sbt#4531) is via a plugin. This wouldn't have quite the same semantics as...
Right now we have one util method to update the number of nodes in Bigtable, and the IT test is ignored; soon we will have two. We need a better way to...
We have a feature request that metric mutation, e.g. `counter.inc()`, should fail with a meaningful message when called outside of a pipeline (a lambda). This is probably a Beam issue....
Need to look at the implications.
```scala
def timestampBy(f: T => Instant, allowedTimestampSkew: Duration = Duration.ZERO): SCollection[T] =
  this.applyTransform(
    WithTimestamps
      .of(Functions.serializableFn(f))
      .withAllowedTimestampSkew(allowedTimestampSkew)
  )
```
https://issues.apache.org/jira/browse/BEAM-644
Right now we have `SCollection#sample`, which uses a Beam transform and is implemented as a global combine operation. It would be nice to have a sample-into-side-output variation that...
Instead of encoding `method@{Source.scala:123}` in transform names.
Which might solve permission issues when compiling code that uses BigQuery macros. https://cloud.google.com/bigquery/docs/share-access-views
When a BigQuery input method (e.g. `toTable`) is used, it would be nice to have the possibility to set the template compatibility to typeRead to be able to use the...
When the total size of output data remains small, we can make their reads faster if they're in Memcached. Thoughts are that we could implement Scio's `AsyncDoLookupFn` to use the...