Support projections in ParquetAvroFileOperations/ParquetAvroSortedBucketIO
ParquetAvroFileOperations always overrides the "projection" option to equal the full reflected schema, so you can't supply a projection for a SpecificRecord class:
https://github.com/spotify/scio/blob/110f79593c67c58a2c2465bf2fb340ff4711003f/scio-smb/src/main/java/org/apache/beam/sdk/extensions/smb/ParquetAvroFileOperations.java#L175-L176
#5083 provides a workaround for this via the Configuration parameter:
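import com.spotify.scio.parquet.ParquetConfiguration
import org.apache.avro.Schema
import org.apache.beam.sdk.extensions.smb.ParquetAvroSortedBucketIO
import org.apache.parquet.avro.AvroReadSupport

val projection: Schema = ...
// Set the requested projection on the Hadoop Configuration rather than on the read itself
val configuration = ParquetConfiguration.empty()
AvroReadSupport.setRequestedProjection(configuration, projection)
val read = ParquetAvroSortedBucketIO
  .read(tupleTag, classOf[TestRecord])
  .from(...)
  .withConfiguration(configuration)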
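For anyone trying the workaround, one possible way to obtain the `projection` schema is to derive it from the full record schema by keeping a subset of top-level fields. This is only a sketch: `ProjectionSketch` and `project` are hypothetical names, not part of scio or Beam, and it assumes Avro 1.9+ and that a flat, top-level projection is sufficient.

```scala
import org.apache.avro.Schema
// Scala 2.13 converters; on 2.12 use scala.collection.JavaConverters._ instead
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of scio: builds a record schema that holds
// only the named top-level fields of `full`.
object ProjectionSketch {
  def project(full: Schema, keep: Set[String]): Schema = {
    val fields = full.getFields.asScala
      .filter(f => keep.contains(f.name()))
      // An Avro Field instance cannot be reused in a second record, so copy each one.
      .map(f => new Schema.Field(f.name(), f.schema(), f.doc(), f.defaultVal()))
      .asJava
    // Keep the original record name and namespace so the projection lines up with the full schema.
    Schema.createRecord(full.getName, full.getDoc, full.getNamespace, full.isError, fields)
  }
}
```

With a generated SpecificRecord class this might be called as `ProjectionSketch.project(TestRecord.getClassSchema, Set("int_field"))`, where `int_field` is a placeholder field name.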
In 0.14 we can add projection as a Builder method to ParquetAvroSortedBucketIO.