flytekit icon indicating copy to clipboard operation
flytekit copied to clipboard

Fixes to dataclass transformer union handling

Open salotz opened this issue 7 months ago • 6 comments

Why are the changes needed?

Motivating use case

I was attempting to use the Python SDK in Python code to run tasks/workflows. When dealing with a case like this:

@dataclass
class FlyteBlobComplex:

    blob: FlyteFile
    param: float

@task()
def smoke_blob_complex_inputs(val: FlyteBlobComplex) -> bool:

    val.blob.download()
    check_file(Path(val.blob.path))
    
    return True

I ran into problems with supplying the proper inputs to FlyteRemote.execute for the FlyteFile attributes.

After some troubleshooting I found a set of bugs surrounding how types and unions are handled in the DataclassTransformer class to be the culprit.

At a high level I should be able to supply inputs like this:

flyte_remote.execute(
    smoke_blob_complex_inputs,
    inputs={
        "val" : {
            "blob" : {"path" : "gs://mybucket/path/to/obj"},
            "param" : 1.0,
        },
    },
)

However this yielded this error:

flytekit.core.type_engine.TypeTransformerFailedError: The original fields are missing the following keys from the dataclass fields: ['metadata']

What changes were proposed in this pull request?

In this PR I fixed two issues related to handling of union types in the DataclassTransformer class that were manifesting in the behavior above.

First, when types of dataclass fields were being retrieved with dataclasses.fields method (on the FlyteFile object, which I suppose is also a dataclass but probably a special case (?)) the f.type was somehow being cast to a string such that you would get 'Optional[dict[str,str]] for the metadata field. When this was checked for if it was an optional type this would fail as the checker doesn't understand strings, only real type objects. (NOTE that this is a type mismatch that could probably be figured out statically).

So replacing dataclasses.fields with typing.get_type_hints properly resolves the types as type objects solves this issue.

Second, further down in the code when dealing with non-Optional Unions values are only checked if they are optional types and if type(value) == expected_type where expected type might be Union[str, Pathlike] like in the case of the path attribute of the FlyteFile.

To address this I added some code to check if the value is any of the types in the union.

These fixes solve my immediate issue but I still get the feeling that there is some bigger picture issue that I'm not seeing leading to these problems. I would appreciate some perspective from the team.

How was this patch tested?

See the use case above.

Check all the applicable boxes

  • [ ] I updated the documentation accordingly.
  • [ ] All new and existing tests passed.
  • [ ] All commits are signed-off.

Summary by Bito

This pull request enhances the DataclassTransformer class by improving the handling of union types. It replaces dataclasses.fields with typing.get_type_hints for better type resolution and refines type checking logic, addressing input errors in Flyte tasks with dataclass inputs.

Unit tests added: False

Estimated effort to review (1-5, lower is better): 2 - The changes are straightforward and primarily focus on type handling, making the review process relatively simple.

salotz avatar May 14 '25 14:05 salotz