ray_beam_runner
Prototype expansion of SQL transforms for single-node execution
One of the main targets for the Ray Beam Runner is to support SQL (and streaming SQL).
Beam's SQL support is implemented in Java. There are two parts for the execution of SQL transforms in Beam:
- Expansion: Beam implements expansion of multi-language transforms through an ExpansionService interface (sample of the gRPC implementation; this seems way too complicated, to be honest).
My idea:
- Implement a class "RayJavaExpansionService" that receives the expansion request, which can be relatively simple. It must contain:
- Schema of the Input PCollection (what are schemas)
- Identifier of the transform to apply (these identifiers are provided by SchemaTransformProvider implementations; see a few examples)
- Note: I will implement a Sql one: SqlSchemaTransformProvider with id "beam:schematransform:org.apache.beam:sql:v1" this week.
- Parameters for the transform (in this case, just the SQL statement)
The RayJavaExpansionService should then return the schema of the resulting PCollection, as well as the expanded graph of operations in protobuf format (the proto format).
- Java dependencies:
- "org.apache.beam:beam-sdks-java-core"
- "org.apache.beam:beam-sdks-java-extensions-sql"
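The request and response described above can be sketched as plain data classes. This is only an illustration of the shape of the data, not an existing Beam or Ray API: `FieldSpec`, `ExpansionRequest`, `ExpansionResponse`, `build_sql_request` and the `"query"` parameter key are all hypothetical names chosen for the example. Only the transform id comes from the proposal itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Transform id taken from the proposal; everything else below is hypothetical.
SQL_TRANSFORM_ID = "beam:schematransform:org.apache.beam:sql:v1"


@dataclass(frozen=True)
class FieldSpec:
    """One column of a PCollection schema (hypothetical, simplified)."""
    name: str
    type_name: str  # e.g. "INT64" or "STRING"


@dataclass(frozen=True)
class ExpansionRequest:
    """What the expansion service would receive."""
    input_schema: Tuple[FieldSpec, ...]  # schema of the input PCollection
    transform_id: str                    # identifier of the transform to apply
    parameters: Dict[str, str]           # for SQL, just the statement


@dataclass(frozen=True)
class ExpansionResponse:
    """What the expansion service would return."""
    output_schema: Tuple[FieldSpec, ...]  # schema of the resulting PCollection
    expanded_graph: bytes                 # expanded operations, serialized proto


def build_sql_request(input_schema: Iterable[FieldSpec], query: str) -> ExpansionRequest:
    """Build a request for the SQL transform; the "query" key is an assumption."""
    return ExpansionRequest(
        input_schema=tuple(input_schema),
        transform_id=SQL_TRANSFORM_ID,
        parameters={"query": query},
    )


request = build_sql_request(
    [FieldSpec("id", "INT64"), FieldSpec("name", "STRING")],
    "SELECT id FROM PCOLLECTION",
)
```

Keeping the request this small is what would let the service avoid most of the complexity of the full gRPC expansion protocol.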
The expansion is not enough to execute SQL, but it's the first step. The next step is to recognize Java stages and execute them in a Java process rather than a Python process (basically, a Java implementation of this code, where we return some kind of JavaWorkerHandler).
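One possible shape for that dispatch, as a rough sketch only: each stage carries a tag for the SDK language it needs, and the runner picks a worker handler from a small registry. The `Stage` class, its `sdk_language` field, the two handler classes and `handler_for` are hypothetical stand-ins. How the runner would actually detect a Java stage from the pipeline proto is left open here.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Stage:
    """A fused stage of the pipeline (hypothetical, simplified)."""
    name: str
    sdk_language: str  # assumed tag, e.g. "python" or "java"


class PythonWorkerHandler:
    """Placeholder for the existing in-process Python execution path."""
    language = "python"


class JavaWorkerHandler:
    """Placeholder for a handler that would run the stage in a Java process."""
    language = "java"


# Registry mapping the language tag to a factory for the matching handler.
HANDLER_FACTORIES: Dict[str, Callable[[], object]] = {
    "python": PythonWorkerHandler,
    "java": JavaWorkerHandler,
}


def handler_for(stage: Stage) -> object:
    """Return a worker handler for the stage, or fail loudly if unsupported."""
    try:
        factory = HANDLER_FACTORIES[stage.sdk_language]
    except KeyError:
        raise ValueError(
            f"No worker handler for SDK language {stage.sdk_language!r} "
            f"(stage {stage.name!r})"
        ) from None
    return factory()


sql_handler = handler_for(Stage(name="sql_transform", sdk_language="java"))
map_handler = handler_for(Stage(name="python_map", sdk_language="python"))
```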
Ray Java resources:
- https://docs.ray.io/en/latest/ray-core/configure.html#java-applications
- https://docs.ray.io/en/latest/ray-core/cross-language.html#cross-language
- https://docs.ray.io/en/latest/ray-core/package-ref.html
fyi @iasoon @valiantljk this issue is more complex than the other stuff you've tried, but it should help move one of our big features forward. are either of you interested? : )
I don't fully understand this issue. Since you mentioned that these SQL transforms are done in Java, does this mean that we are adding Java support for our Beam runner?
yes, we would have to add support for expanding java PTransforms. I think we can limit the scope of this quite a bit while still delivering SQL execution.
this sounds cool, if we are also targeting Java. I may ask my colleague to take a look, if he is interested in joining us.
@Evan2022TT