Neville Li

Results 45 comments of Neville Li

More specifically, one can write nested case classes as Parquet files with `parquet-types` and might want to read them as TF `Example`, which doesn't support nesting. Alternatively we can ask...

@raunaqmorarka CLA approved & everything passed except one, possibly due to a network issue?

@regadas WDYT? @mdvorsky if you think it's a trivial change, mind submitting a PR and we can discuss there?

WIP here: Not sure if we really need this given `.protobuf.avro` is not a standard format plus we worked around it internally. Will leave this on hold for now.

This is sort of by design. `sc.ParquetAvroFile(path)` doesn't initiate the IO right away; `.map(f)` does, together with `f` as a projection function, since the projected Avro records might be incomplete...

Agree that a new `STransform` might be too complex. The goal of this is to allow power users to dynamically produce a job graph for certain tricky transforms like join based...

Beam SQL's `BeamSqlTable` interface actually has a notion of table statistics, so maybe we can leverage that. Then again, graph optimizations are probably easier in SQL than in Scala code.

After discussion IRL. This should be handled in Beam SQL. Too much complexity adding it to the Scala layer. Will leave this open as a reminder.

Should be possible since we have a custom read `DoFn` instead of the generic line-delimited `TextIO`. I suspect you'll have to add some implicit arguments to propagate the...

This is a streaming job, I assume? By timeout, do you mean overwriting the same file URIs and having `DistCache` instances re-download them? This might lead to a race condition and...