flytekit icon indicating copy to clipboard operation
flytekit copied to clipboard

Add remote client that directly utilizes the `flyteidl-rust` Python bindings

Open austin362667 opened this issue 1 year ago • 2 comments

Tracking issue

As a result of the initiative, issues, discussions and serial trials and errors in flyrs branch 1 2 this PR aims to create another FlyteRemote inremote/remote_rs.py which wrapping flyteidl-rust[^1] as its core. Python package, flyteidl-rust, was built directly from Rust. With the helps of works in flyte/flyteidl/src/lib.rs, this PR eliminates the protobuf objects serialization overhead[^2] while calling gRPC services. Unlike the original approach in flyrs, this method optimizes performance and reduces the maintenance cost.

  • Use case ( can work ):
    from flytekit.remote.remote_rs import FlyteRemote
    
    def test_fetch_execute_sync_string_task(register):
        remote = FlyteRemote(Config.auto(config_file=CONFIG), PROJECT, DOMAIN)
        flyte_task = remote.fetch_task(name="basics.echo_string_task.test_echo_string_task", version=VERSION)
        # `wait` to sync execution
        execution = remote.execute(flyte_task, inputs={"s": "hey"}, wait=True)
        assert execution.is_done
        assert execution.outputs["o0"] == "hey"
    

[^1]: Temporarily registered in PyPI test index as test package that will not affect the real PyPI registry in production. [^2]: In previous PRs, it serialize the python types into bytes string and pass into PyO3-exposed Rust functions.

Why are the changes needed?

With flyteidl-rust Python bindings, Flytekit can get rid of not only grpc but also protobuf Python dependencies, ultimately removing the model layer between protobuf and remote entities. This approach leverages Rust structures directly.

For Rust side, developers still can wrap flyte/flytidl as their remote gRPC clients or use its API directly.

For Python, developers can build another remote client by wrapping flyteidl-rust as its core.

What changes were proposed in this pull request?

To this end, we have to replace all types import from model files to use flyteidl-rust package instead. Including,

  1. Modify remote entities to inherit from flyteidl-rust classes instead of model files.

  2. Refactor type transoformers in core/type_engine.py and get_serializable() in translator.py to adapt with flyteidl-rust classes.

    • get_serializable() in tools/translator.py
      • [ ] ReferenceEntity
      • [X] PythonTask
      • [ ] WorkflowBase
      • [X] Node
      • [ ] LaunchPlan
      • [ ] BranchNode
      • [ ] FlyteTask
    • SimpleTransformer in core/type_engine.py
      • [X] int
      • [X] float
      • [X] bool
      • [X] str
      • [ ] datetime
      • [ ] timedelta
      • [ ] date
      • [ ] none
    • Other types' Transformer in core/type_engine.py

How was this patch tested?

[!WARNING] For now, we havn't migrate the test cases to use flyteidl-rust and other type transformation still works in progress, so the CI must fail, don't run.

FlyteRemote only works in Task register / fetch / execute strictly with primitive python types, Workflow and LaunchPlan still works in progress.

Setup process

  1. Install flyteidl-rust pip install -i https://test.pypi.org/simple/ flyteidl-rust Screenshot 2024-06-27 at 7 11 54 PM

  2. Run the tasks which only using string, integer, float as input or output. For example,

    • pyflyte register ./echo_string_task.py and run on web console, or
    • pyflyte run --remote ./echo_string_task.py test_echo_string_task --s "hey"

Screenshots

  • pyflyte register echo_int and echo_float task. Screenshot 2024-06-27 at 7 16 38 PM
  • re-run echo_string task from console. Screenshot 2024-06-27 at 7 15 25 PM
  • pyflyte run --remote
      from flytekit import task
    
      @task()
      def test_square_task(n: float) -> float:
          return n**2
    
    Screenshot 2024-06-27 at 7 24 05 PM Screenshot 2024-06-27 at 7 24 40 PM

Check all the applicable boxes

  • [ ] I updated the documentation accordingly.
  • [ ] All new and existing tests passed.
  • [X] All commits are signed-off.

Related PRs

Docs link

Cross References

  1. https://github.com/pydantic/pydantic-core
  2. https://github.com/pola-rs/pyo3-polars/tree/main/pyo3-polars

austin362667 avatar Jun 27 '24 11:06 austin362667

Thanks, Ketan, for reminding me not to compromise on the earlier serialization approach, and thanks to Kevin and Troy for sharing helpful experiences as they are developing the rust-fs project. I also appreciate others’ insightful discussions.

austin362667 avatar Jun 27 '24 11:06 austin362667

There are several implementation details need to be discussed (in regards to the developer experience):

  1. Enum comparison as integer or as string or as enum itself via different magic function.

    Since we've add __repr__ magic function for rust enum type to show as int string, we can use int(str(literal_type.type)) to compare literal types. Or we can just leverage Rust Prost's as_str_name() function by exposing it to python binding through PyO3. Or we can add compare operator mentioned in https://pyo3.rs/v0.22.0/class/protocols .

  2. The positional arguments of tuple variants in complex enum.

    Some Rust code generated enum variants comes in tuple form, like

    pub struct Literal {
          pub value: ::core::option::Option<literal::Value>, 
    }
    
    /// Nested message and enum types in `Literal`.
    pub mod literal {
        #[::pyo3_macro::with_pyclass]
        #[derive(::pyo3_macro::WithNew)]
        #[allow(clippy::derive_partial_eq_without_eq)]
        #[derive(Clone, PartialEq, ::prost::Oneof)]
        pub enum Value {
            /// A simple value.
            #[prost(message, tag = "1")]
            Scalar(::prost::alloc::boxed::Box<super::Scalar>),
            /// A collection of literals to allow nesting.
            #[prost(message, tag = "2")]
            Collection(super::LiteralCollection),
            /// A map of strings to literals.
            #[prost(message, tag = "3")]
            Map(super::LiteralMap),
        }
    }
    

    So the usage in Python will be like,

    lit = flyteidl.core.Literal(value=flyteidl.literal.Value.Scalar(flyteidl.core.Scalar(flyteidl.scalar.Value.Primitive(flyteidl.core.Primitive(flyteidl.primitive.Value.StringValue("x"))))), hash="", metadata={})
    

    , which has no named arguments. It's not that trivia to use:

     lit.value[0].value[0].value[0]
    

    image

austin362667 avatar Jun 27 '24 11:06 austin362667