
Confusing serialization error in `SCollectionWithSideOutput#map`

Open nevillelyh opened this issue 4 years ago • 3 comments

    val (sc, args) = ContextAndArgs(cmdlineArgs)
    val tracingOutput = args("tracingOutput")
    val tracingInfo: SideOutput[String] = SideOutput()
    val data = sc.parallelize(List("abcd", "efgh", "ijkl"))
    val (_, sideOutputs) = data
      .withSideOutputs(tracingInfo)
      .map {
        case (s, sideOutputContext) =>
          sideOutputContext.output(tracingInfo, s)
          s // serialization error without this 
      }

`SideOutputContext#output` (https://spotify.github.io/scio/api/com/spotify/scio/values/SideOutputContext.html#outputS:com.spotify.scio.values.SideOutputContext[T]) returns the context itself so that calls are chainable. But when the user forgets to emit anything in `SCollectionWithSideOutput#map`, the `output` call is the last expression in the function, so the context gets picked up as the output element by mistake and causes a serialization error.
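To illustrate the mechanism outside of Scio, here is a minimal sketch. `ToyContext` is a hypothetical stand-in for a chainable context, not Scio's `SideOutputContext`. It only shows how the last expression of the function decides the result type.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a chainable side-output context; not Scio's class.
final class ToyContext {
  val emitted: ListBuffer[String] = ListBuffer.empty
  // Returns the context itself so that calls can be chained.
  def output(value: String): ToyContext = { emitted += value; this }
}

object ChainableReturnSketch {
  val ctx = new ToyContext

  // The output call is the last expression, so the function's result
  // is the context rather than the element.
  val forgot: String => ToyContext = s => ctx.output(s)

  // Returning s explicitly keeps the result a String.
  val fixed: String => String = s => { ctx.output(s); s }
}
```

In the real pipeline the first shape would make the context the element type of the resulting collection, which is where the serialization error comes from.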

nevillelyh avatar May 06 '20 17:05 nevillelyh

I'm wondering whether we can register types that are not supposed to go through ser/de, e.g. `ScioContext`, `SCollection`, `SideOutputContext`, etc., with a `DoNotEncode` coder that throws a compile-time or runtime error?
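A rough sketch of the runtime variant of that idea, using a toy `Enc` type class rather than Scio's actual `Coder` machinery. `Enc`, `doNotEncode`, `FakeSideOutputContext`, and `emit` are all hypothetical names made up for illustration.

```scala
// Hypothetical stand-ins; none of these are Scio's real classes.
final class FakeSideOutputContext

// Toy encoder type class standing in for a coder.
trait Enc[T] { def encode(value: T): String }

object Enc {
  implicit val stringEnc: Enc[String] = new Enc[String] {
    def encode(value: String): String = value
  }

  // "Do not encode" instance: resolves like any other encoder,
  // but fails with a descriptive message as soon as it is used.
  def doNotEncode[T](typeName: String): Enc[T] = new Enc[T] {
    def encode(value: T): String =
      throw new UnsupportedOperationException(
        s"$typeName must not be used as a pipeline element; " +
          "did the function forget to return a value?")
  }

  // Register the forbidden type with the throwing encoder.
  implicit val contextEnc: Enc[FakeSideOutputContext] =
    doNotEncode("FakeSideOutputContext")
}

object DoNotEncodeSketch {
  // Encodes a value with whatever encoder is in implicit scope.
  def emit[T](value: T)(implicit enc: Enc[T]): String = enc.encode(value)
}
```

With this, `DoNotEncodeSketch.emit("abcd")` works as usual, while `DoNotEncodeSketch.emit(new FakeSideOutputContext)` fails with a message that points at the likely mistake instead of a generic serialization error. A compile-time variant would need a different mechanism and is not shown here.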

nevillelyh avatar May 07 '20 14:05 nevillelyh

It should be possible to make that a compile-time issue. Worth a try, I guess, even though it's a pretty minor issue imo.

jto avatar May 07 '20 15:05 jto

Yeah bang for buck here is 😬

I recently pushed a PR #2987 that might help mitigate this use case? I actually find them very useful as I was doing this a lot.

regadas avatar May 19 '20 19:05 regadas