scio

A Scala API for Apache Beam and Google Cloud Dataflow.

Results 213 scio issues

A Spanner read sample can be made with the following (works fine):
```
sc.spannerQuery(config, "select id1, firstname, lastname from users limit 10", false, false)
  .map(r => (r.getString(0), r.getString(1), r.getString(2)))
  .map[Row](Row.tupled)
  .map(_.toString)
  .saveAsTextFile(output,...
```

documentation

Integration test logs indicate that BigQuery is not closed properly:
```log
2022-12-08 16:39:43.502-0500 error [ManagedChannelOrphanWrapper] *~*~*~ Channel {0} was not shutdown properly!!! ~*~*~* Make sure to call shutdown()/shutdownNow() and wait...
```

testing

Add support for reading data from Azure Cosmos DB with the Core (SQL) API. Refs: #4678, apache/beam#23604, apache/beam#23610

enhancement

Right now Scio overrides `core-site.xml` in a few modules (`scio-parquet`, `scio-smb`). This gets picked up by library consumers, and if they wish, they can specify their own `core-site.xml` file in...

parquet

Since Parquet predicates are not applied in JobTest, it would be nice if we had an assertion helper (a la [CoderAssertions](https://github.com/spotify/scio/blob/main/scio-test/src/main/scala/com/spotify/scio/testing/CoderAssertions.scala) or [BigtableMatchers](https://github.com/spotify/scio/blob/main/scio-test/src/main/scala/com/spotify/scio/testing/BigtableMatchers.scala)) for Parquet `FilterPredicate`s that could be used...

enhancement
testing
parquet

This seems to be the convention in Beam: [AvroIO](https://github.com/apache/beam/blob/7a93e217f5034768ea912b1c07a2e523c2f525ff/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#L750), [TextIO](https://github.com/apache/beam/blob/7a93e217f5034768ea912b1c07a2e523c2f525ff/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L447)

enhancement
parquet

Currently SMB reads implement `BoundedSource`. Context on the SplittableDoFn alternative: https://beam.apache.org/blog/splittable-do-fn/

This might have to wait until Dataflow Runner V2 has gained more widespread adoption, since SDF-based transforms scale better on...

enhancement
blocked
SMB

[BucketMetadataUtil#getSourceMetadata](https://github.com/spotify/scio/blob/acea43ad5255a7b8c11edc34e02f65eb3521c08a/scio-smb/src/main/java/org/apache/beam/sdk/extensions/smb/BucketMetadataUtil.java#L88) gets called when the PTransform is still being expanded (before `Pipeline::run` is invoked). Because of this, we can't use `PTransformOverrides` for any SMB transforms in a `JobTest` context, since...

enhancement
testing
SMB

By default, calling `new Configuration()` loads from any available `core-site.xml` and `core-default.xml`. Scio contains a minimalistic `core-site.xml` implementation, but `core-default.xml` as loaded from Hadoop is quite long: https://hadoop.apache.org/docs/r2.10.2/hadoop-project-dist/hadoop-common/core-default.xml

Since the...

enhancement
parquet
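
The default-loading behavior described in the `new Configuration()` issue above can be seen in a minimal sketch. It assumes `hadoop-common` is on the classpath and uses Hadoop's `Configuration(boolean loadDefaults)` constructor. The issue text is truncated, so this is not necessarily the change it proposes; the `fs.defaultFS` value is purely illustrative.

```scala
import org.apache.hadoop.conf.Configuration

object ConfigurationDefaultsSketch {
  def main(args: Array[String]): Unit = {
    // Default constructor: loads core-default.xml and any core-site.xml
    // found on the classpath.
    val withDefaults = new Configuration()

    // loadDefaults = false: skips the default resources, so only entries
    // that are set or added explicitly are present.
    val withoutDefaults = new Configuration(false)
    withoutDefaults.set("fs.defaultFS", "file:///") // illustrative value only

    println(s"with defaults: ${withDefaults.size()} entries")
    println(s"without defaults: ${withoutDefaults.size()} entries")
  }
}
```

Comparing the two entry counts gives a rough sense of how much of a job's configuration comes from `core-default.xml` rather than from settings the job actually needs.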