scio
Confusing serialization error in `SCollectionWithSideOutput#map`
val (sc, args) = ContextAndArgs(cmdlineArgs)
val tracingOutput = args("tracingOutput")
val tracingInfo: SideOutput[String] = SideOutput()
val data = sc.parallelize(List("abcd", "efgh", "ijkl"))
val (_, sideOutputs) = data
  .withSideOutputs(tracingInfo)
  .map {
    case (s, sideOutputContext) =>
      sideOutputContext.output(tracingInfo, s)
      s // serialization error without this
  }
https://spotify.github.io/scio/api/com/spotify/scio/values/SideOutputContext.html#outputS:com.spotify.scio.values.SideOutputContext[T]
`SideOutputContext#output` returns itself so that it is chainable, but when the user forgets to emit anything in `SCollectionWithSideOutput#map`, the context gets picked up as the output by mistake and causes a serialization error.
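To make the failure mode concrete, here is a minimal sketch in plain Scala with no scio dependency. `ChainableContext` is a hypothetical stand-in for `SideOutputContext`, not the real class; it only mimics the "returns itself" behaviour described above. The point is that the last expression of the function body decides the element type of the result.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for SideOutputContext: output() returns `this` so calls can be chained.
final class ChainableContext {
  private val buffer = ListBuffer.empty[String]
  def output(s: String): ChainableContext = { buffer += s; this }
  def emitted: List[String] = buffer.toList
}

object LastExpressionSketch {
  val ctx = new ChainableContext

  // The body ends with ctx.output(s), so the element type is inferred as ChainableContext.
  // In a real pipeline this is the value that would need a coder.
  val forgot: List[ChainableContext] = List("abcd", "efgh").map { s => ctx.output(s) }

  // Ending the body with `s` makes the element type String, as intended.
  val fixed: List[String] = List("abcd", "efgh").map { s => ctx.output(s); s }
}
```

Both versions compile, which is why the mistake only surfaces later as a serialization error rather than as a type error at the call site.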
I'm wondering whether we can register types that are not supposed to go through ser/de, e.g. `ScioContext`, `SCollection`, `SideOutputContext`, etc., with a `DoNotEncode` coder that throws a compile-time or runtime error?
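A rough sketch of the runtime flavour of this idea, assuming a simplified type class. `Encoder`, `doNotEncode`, `FakeSideOutputContext`, and `emit` are all hypothetical names invented for illustration; they are not scio's `Coder` API, and the real wiring would likely differ.

```scala
// Hypothetical stand-in for a coder type class; not scio's real Coder.
trait Encoder[T] { def encode(value: T): Array[Byte] }

object Encoder {
  implicit val stringEncoder: Encoder[String] = new Encoder[String] {
    def encode(value: String): Array[Byte] = value.getBytes("UTF-8")
  }

  // Guard instance: resolves like any other encoder but fails loudly when used.
  def doNotEncode[T](typeName: String): Encoder[T] = new Encoder[T] {
    def encode(value: T): Array[Byte] =
      throw new UnsupportedOperationException(
        s"$typeName must not be encoded; did you forget to return a value?")
  }
}

// Hypothetical stand-in for a type that should never be pipeline output.
final class FakeSideOutputContext {
  def output(s: String): FakeSideOutputContext = this
}

object FakeSideOutputContext {
  // Registering the guard in the companion puts it in implicit scope for this type.
  implicit val encoder: Encoder[FakeSideOutputContext] =
    Encoder.doNotEncode("FakeSideOutputContext")
}

object DoNotEncodeSketch {
  // Stand-in for the point where an element is serialized.
  def emit[T](value: T)(implicit enc: Encoder[T]): Array[Byte] = enc.encode(value)
}
```

With this in place, `DoNotEncodeSketch.emit("abcd")` encodes normally, while `DoNotEncodeSketch.emit(new FakeSideOutputContext)` throws with a message that points at the likely mistake. A compile-time variant might instead make the instance deliberately ambiguous and attach a message with `scala.annotation.implicitAmbiguous`, though whether that fits scio's coder derivation is an open question here.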
Should be possible to make that a compile-time issue. Worth a try, I guess, even though it's a pretty minor issue imo.
Yeah bang for buck here is 😬
I recently pushed PR #2987, which might help mitigate this use case. I actually find them very useful as I was doing this a lot.