scio icon indicating copy to clipboard operation
scio copied to clipboard

A Scala API for Apache Beam and Google Cloud Dataflow.

Results 213 scio issues
Sort by recently updated
recently updated
newest added

## About this PR 📦 Updates [com.softwaremill.magnolia1_2:magnolia](https://github.com/softwaremill/magnolia) from `1.1.8` to `1.1.9` ## Usage ✅ **Please merge!** I'll automatically update this PR to resolve conflicts as long as you don't change...

WIP of Parquet projection/predicate support in JobTest. Alternately, we could provide custom assertions similar to `CoderAssertions`, i.e.: ```scala val record: AvroType = ??? record withPredicate(FilterApi.and(...)) withProjection(...schema...) should eq(...) ``` but...

A pipeline on Scio 0.14.3 migrated from applying Beam's SortValues transform directly: ```scala data .map { element => KV.of(..., KV.of(..., ...)) } .applyTransform(GroupByKey.create()) .setCoder(...) .applyTransform( SortValues.create( BufferedExternalSorter .options() .withMemoryMB(2047) .withExternalSorterType(ExternalSorter.Options.SorterType.NATIVE)...

enhancement

Adds * `saveAsZstdDictionary` to train a Zstd dictionary on some arbitrary `SCollection[T]`. Estimates the average size of elements `T`, collects `n` elements based on a target training set size, then...

Given the following schema: ```json { "type": "record", "name": "TestRecord", "fields": [ { "name": "nullableDecimal", "type": [ "null", { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 } ] }...

bug

It would be nice if there were an easy API for users to test Parquet FilterPredicates against SCollections and assert on expected input/output. Would probably require materializing the data to...

enhancement
parquet

When using `saveAsSparkey`, if any shard is > ~2gb then you will get a coder exception and something like ``` Error message from worker: org.apache.beam.sdk.util.UserCodeException: java.lang.OutOfMemoryError: Required array length 2147483639...

parquet-avro supports Schema projections that exclude required fields. However, if a required field is excluded, the Avro record will fail Coder roundtrip during the next PTransform. As a workaround in...

enhancement
parquet

Add an option, either pipeline option or API method param, to instruct Scio to fall back to a non-SMB join if it can't perform an SMB join (due to incompatible...

enhancement
SMB

Use correct avro writer constructor that would load the generated model with conversions. Setup CI to test scio-avro with latest version on main Fix #5331