Neville Li

Results: 45 comments by Neville Li

Found this https://stackoverflow.com/questions/46983318/writing-via-textio-write-with-sessions-windowing-raises-groupbykey-consumption-e I verified that `.saveAsTextFile()` causes the same error. Adding `.withGlobalWindow()` before `saveAs*` fixes the issue. ~Are you sure this only affects 0.7.0? The bug is a year...
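A minimal sketch of the workaround described above, assuming a Scio job that applies session windows and then writes text files. The object name, the `input`/`output` argument names, and the 10-minute gap are hypothetical and not from the original thread; it needs Scio on the classpath.

```scala
import com.spotify.scio._
import org.joda.time.Duration

// Hypothetical job: names and the session gap are illustrative only.
object SessionWindowWrite {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    sc.textFile(args("input"))
      // Session windowing, as in the linked Stack Overflow question.
      .withSessionWindows(Duration.standardMinutes(10))
      // Re-window into the global window before the file sink.
      .withGlobalWindow()
      .saveAsTextFile(args("output"))

    // Assumption: Scio 0.8 or later. Older releases such as 0.7.x
    // finish the pipeline with sc.close() instead.
    sc.run()
  }
}
```

Any per-session aggregation would go between the two windowing calls, so that only the final write runs in the global window.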

Hey how are you reading PubSub with `GenericRecord`? Looks like only `SpecificRecord` is implemented. https://github.com/spotify/scio/blob/master/scio-core/src/main/scala/com/spotify/scio/io/PubsubIO.scala#L85 Also I think it'd depend on the underlying Beam `PubsubIO` behavior, which I don't think...

So Algebird `Semigroup` & `Monoid` have implicit conversions from their Cats equivalents, which means we can change the `sum/fold` context bound to Cats without breaking anything. That leaves `Aggregator`, though. We can...

Seems to be a design choice in `InternalParquetRecordReader` so hard to solve unless we fork that logic. https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java#L190

I'm wondering whether we can register types that are not supposed to go through ser/de, e.g. `ScioContext`, `SCollection`, `SideOutputContext`, etc. with a `DoNotEncode` coder that throws compile time or runtime...

Related to https://github.com/spotify/scio/issues/3201

> The allowable limit for the total size of the `BoundedSource` objects generated by your custom source's `splitIntoBundles()` operation is 20 MB. You can work around this limitation by modifying...

@psobot thoughts? I like the idea. Naming of `SparkeyIO` etc. can be tweaked a bit but overall this could be useful?

So what should we do next? It is a bit hacky but also solves a practical need. Also it's in `scio-extra` so I think we can live with it as...