Neville Li

Results 51 issues of Neville Li

It'd be nice to verify or improve sparkey side input for streaming jobs, to make it easier to handle the following cases:
- [ ] trigger reloads when a new...

enhancement
streaming

Mostly from Parquet. :sob: https://issues.apache.org/jira/browse/PARQUET-1126 There are reports that Hadoop pulls in conflicting logger bindings among other things.

dependencies

POC here: https://github.com/spotify/scio/tree/neville/state It's possible, but with the current approach a single `DoFn` can only have a finite, hard-coded number of states. Alternatively we might have to...

enhancement

It's been brought up internally, basically [this](https://github.com/spotify/scio/blob/master/scio-avro/src/main/scala/com/spotify/scio/avro/AvroIO.scala#L89) but in the form of a `FileOperations`

enhancement

Here’s an incomplete list of tasks. We can break them down further or create new issues to track as we go.
- [x] Make sure Flink runner runs all Beam...

enhancement
help wanted 🗣
P2

```scala
val (sc, args) = ContextAndArgs(cmdlineArgs)
val tracingOutput = args("tracingOutput")
val tracingInfo: SideOutput[String] = SideOutput()
val data = sc.parallelize(List("abcd", "efgh", "ijkl"))
val (_, sideOutputs) = data
  .withSideOutputs(tracingInfo)
  .map { case...
```

enhancement

This is a broad discussion of job statistics estimation and join strategy inspired by Scalding's [estimation](https://github.com/twitter/scalding/tree/develop/scalding-core/src/main/scala/com/twitter/scalding/estimation) API. cc @anish749 @ClaireMcGinty We want to estimate basic stats, e.g. size, # of...

question ❓

https://github.com/spotify/scio/blob/master/build.sbt#L110 Scio master (since `0.8.0-beta1`) has switched to the latest magnolia (`0.12.0`) for Scala 2.13 while pinning my fork `0.10.1-jto` for Scala 2.11 which is no longer supported. Mixing different magnolia...

For performance reasons and when sampling a large number of records to file, etc.

enhancement