Pablo comments

Results: 90 comments of Pablo

implement basic watermarking

TODO @pabloem - what is the produced watermark and why do stages differentiate

implement basic watermarking

once we start parallelizing the execution of bundles, our watermarking will need to change (we need to wait for all parallel tasks of a given stage before advancing its watermark...
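A minimal sketch of the idea in this comment, assuming each parallel bundle reports a watermark when it finishes: the stage watermark stays put until every bundle in the round is done, then advances to the minimum reported value. `StageWatermark`, `start_bundle`, and `finish_bundle` are hypothetical names for illustration and are not part of the runner.

```python
from dataclasses import dataclass, field

MIN_TS = float("-inf")  # stand-in for "no watermark yet"


@dataclass
class StageWatermark:
    """Hypothetical tracker that holds a stage's watermark until all
    parallel bundles of the current round have finished."""
    watermark: float = MIN_TS
    # bundle_id -> None while running, or the watermark the bundle reported
    pending: dict = field(default_factory=dict)

    def start_bundle(self, bundle_id):
        self.pending[bundle_id] = None

    def finish_bundle(self, bundle_id, bundle_watermark):
        self.pending[bundle_id] = bundle_watermark
        return self._maybe_advance()

    def _maybe_advance(self):
        # Any bundle still running blocks the stage watermark.
        if any(w is None for w in self.pending.values()):
            return self.watermark
        if self.pending:
            # The slowest bundle bounds the stage; never move backwards.
            self.watermark = max(self.watermark, min(self.pending.values()))
            self.pending.clear()
        return self.watermark
```

With two bundles in flight, finishing the first leaves the stage watermark unchanged. It only advances once the second one reports.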

A performance test for the Ray Beam portable runner

That would be great! We can track performance using the https://github.com/benchmark-action/github-action-benchmark action. I don't think it needs to be very big. I think if it processes 1 GB running locally with...
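A rough sketch of what such a test could emit, assuming the action's custom "smaller is better" input, which (as I understand it) is a JSON array of entries with `name`, `unit`, and `value`. `time_pipeline`, `to_benchmark_json`, and `placeholder_pipeline` are hypothetical helpers, and the workload is a stand-in rather than a real Beam pipeline.

```python
import json
import time


def time_pipeline(run, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of `run`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best


def to_benchmark_json(results):
    """Serialize {benchmark name: seconds} as a list of name/unit/value entries.

    Assumption: this matches the custom smaller-is-better format the
    benchmark action can read; check the action's README before relying on it.
    """
    entries = [
        {"name": name, "unit": "seconds", "value": seconds}
        for name, seconds in results.items()
    ]
    return json.dumps(entries, indent=2)


def placeholder_pipeline():
    # Stand-in workload; a real test would run a pipeline on the Ray runner.
    sum(i * i for i in range(100_000))
```

A CI job could write `to_benchmark_json({"placeholder": time_pipeline(placeholder_pipeline)})` to a file and point the action at that file.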

A performance test for the Ray Beam portable runner

We have a few microbenchmarks in Beam that you could use as inspiration, but I don't think they're big enough to test our runner and optimize over time:
- https://github.com/apache/beam/blob/master/sdks/python/apache_beam/tools/fn_api_runner_microbenchmark.py...

Prototype expansion of SQL transforms for single-node execution

Ray Java resources:
- https://docs.ray.io/en/latest/ray-core/configure.html#java-applications
- https://docs.ray.io/en/latest/ray-core/cross-language.html#cross-language
- https://docs.ray.io/en/latest/ray-core/package-ref.html

fyi @iasoon @valiantljk this issue is more complex than the other stuff you've tried, but it should help move one of...

Prototype expansion of SQL transforms for single-node execution

Yes, we would have to add support for expanding Java PTransforms. I think we can limit the scope of this quite a bit while still delivering SQL execution.

Parallelization: 'Reshuffle' data shared between stages

@wilsonwang371 this is the task where we parallelize the processing of data : )

Design a path for work items to report progress

@valiantljk this is something that you could consider taking a stab at : ) You'd have to add some smart code so that ongoing `ray_execute_bundle` tasks can report progress -...
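One possible shape for this, sketched with plain threads instead of Ray so it runs anywhere: each in-flight bundle periodically pushes a `(done, total)` pair to a shared tracker that the scheduler can poll. `ProgressTracker` and `execute_bundle` are hypothetical names. In the runner, the tracker would more likely be a Ray actor that `ray_execute_bundle` tasks call into, but that is an assumption and not the existing design.

```python
import threading


class ProgressTracker:
    """Hypothetical shared sink for per-bundle progress reports."""

    def __init__(self):
        self._lock = threading.Lock()
        self._progress = {}  # bundle_id -> (elements_done, elements_total)

    def report(self, bundle_id, done, total):
        with self._lock:
            self._progress[bundle_id] = (done, total)

    def snapshot(self):
        # Return a copy so callers cannot mutate internal state.
        with self._lock:
            return dict(self._progress)


def execute_bundle(bundle_id, elements, tracker, report_every=100):
    """Stand-in for a bundle task that reports progress as it goes."""
    total = len(elements)
    for i, _element in enumerate(elements, start=1):
        # Real per-element processing would happen here.
        if i % report_every == 0 or i == total:
            tracker.report(bundle_id, i, total)


if __name__ == "__main__":
    tracker = ProgressTracker()
    workers = [
        threading.Thread(target=execute_bundle, args=("b1", list(range(250)), tracker)),
        threading.Thread(target=execute_bundle, args=("b2", list(range(50)), tracker)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(tracker.snapshot())
```

Reporting every `report_every` elements keeps the overhead low. A time-based interval would work as well.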

Design a path for work items to report progress

lmk if that helps

Add watermark-based scheduling to the Ray Runner

Useful literature: https://s.apache.org/beam-fn-api