Bugfix: Pydantic deserialization for FlyteFile and FlyteDirectory

Open gverkes opened this issue 3 months ago • 3 comments

Fixes #6669

Tracking issue

Closes flyteorg/flyte#6669

Why are the changes needed?

When deserializing Pydantic models containing FlyteFile or FlyteDirectory fields using model_validate(), the deserialized objects were missing private attributes (_remote_source, _downloader, etc.). This caused an AttributeError when attempting to re-serialize these objects with model_dump(), breaking the serialize → deserialize → serialize cycle.

This is a critical bug: any Pydantic BaseModel with FlyteFile or FlyteDirectory fields fails on the ordinary model_validate() followed by model_dump() round trip.
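The failure mode can be illustrated without flytekit or Pydantic. The sketch below is a simplified stand-in, not the library's code: `FakeFile`, `dump`, and `naive_validate` are hypothetical names, and it assumes the deserialization path builds the object without running the initializer that sets the private attributes.

```python
from typing import Optional


class FakeFile:
    """Hypothetical stand-in for FlyteFile; not flytekit code."""

    def __init__(self, path: str, remote_source: Optional[str] = None):
        self.path = path
        # Private state that only the initializer sets.
        self._remote_source = remote_source

    def dump(self) -> dict:
        # Serialization reads the private attribute.
        return {"path": self._remote_source or self.path}


def naive_validate(data: dict) -> FakeFile:
    # Assumption: the object is created without calling __init__,
    # so _remote_source is never assigned.
    obj = FakeFile.__new__(FakeFile)
    obj.path = data["path"]
    return obj


original = FakeFile("/tmp/data.csv")
restored = naive_validate(original.dump())  # serialize, then deserialize

try:
    restored.dump()  # re-serialize
    error = None
except AttributeError as exc:
    error = exc
```

Here the second `dump()` raises `AttributeError` because `_remote_source` was never set on the restored object, which mirrors the broken cycle described above.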

What changes were proposed in this pull request?

Added Pydantic model validators to both FlyteFile and FlyteDirectory classes that ensure private attributes are properly initialized during deserialization:

  1. FlyteFile (flytekit/types/file/file.py):

    • Enhanced deserialize_flyte_file validator to check if private attributes exist
    • If missing, reconstructs the FlyteFile using dict_to_flyte_file() transformer
    • If attributes already exist (e.g., when passing already-constructed FlyteFile), returns as-is
  2. FlyteDirectory (flytekit/types/directory/types.py):

    • Applied same fix to deserialize_flyte_dir validator
    • Uses dict_to_flyte_directory() to properly reconstruct the object
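The check-then-reconstruct logic described in the list above can be sketched in plain Python. This is an illustration under stated assumptions, not the actual validator: `FakeFile`, `dict_to_fake_file`, `ensure_initialized`, and the `PRIVATE_ATTRS` tuple are hypothetical, and the real code delegates to the `dict_to_flyte_file()` / `dict_to_flyte_directory()` transformers mentioned in the PR.

```python
# Hypothetical list of private attributes the validator looks for.
PRIVATE_ATTRS = ("_remote_source", "_downloader")


class FakeFile:
    """Hypothetical stand-in for FlyteFile; not flytekit code."""

    def __init__(self, path, remote_source=None):
        self.path = path
        self._remote_source = remote_source
        self._downloader = lambda: None  # placeholder downloader


def dict_to_fake_file(data: dict) -> FakeFile:
    """Hypothetical analogue of dict_to_flyte_file(): full construction."""
    return FakeFile(path=data["path"])


def ensure_initialized(obj: FakeFile) -> FakeFile:
    # Already-constructed objects pass through unchanged.
    if all(hasattr(obj, name) for name in PRIVATE_ATTRS):
        return obj
    # Otherwise rebuild through the full constructor path.
    return dict_to_fake_file({"path": obj.path})


# An object created without __init__, as in the buggy deserialization path.
partial = FakeFile.__new__(FakeFile)
partial.path = "/tmp/data.csv"

repaired = ensure_initialized(partial)
complete = FakeFile("/tmp/other.csv")
```

Returning a fully constructed object as-is keeps the validator cheap for the common case where a caller passes an existing instance, and only the partially built objects pay for reconstruction.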

How was this patch tested?

Added two new unit tests in test_pydantic_basemodel_transformer.py:

  1. test_flytefile_pydantic_model_dump_validate_cycle - Verifies FlyteFile can be serialized, deserialized, and re-serialized without errors
  2. test_flytedirectory_pydantic_model_dump_validate_cycle - Same for FlyteDirectory
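The shape of such a cycle test might look like the following. It is a minimal sketch using a dataclass stand-in rather than the PR's actual tests: `FakeModel` and `round_trip` are hypothetical, and `asdict` plus the constructor stand in for `model_dump()` and `model_validate()`.

```python
from dataclasses import asdict, dataclass


@dataclass
class FakeModel:
    """Hypothetical stand-in for a BaseModel with a FlyteFile field."""

    path: str


def round_trip(model: FakeModel) -> dict:
    first = asdict(model)          # serialize
    restored = FakeModel(**first)  # deserialize
    # Re-serialize: in the PR's scenario this is the step that
    # raised AttributeError before the fix.
    second = asdict(restored)
    assert first == second, "round trip changed the serialized form"
    return second


result = round_trip(FakeModel(path="/tmp/data.csv"))
```

Comparing the first and second serialized forms checks more than the absence of an exception: it also catches a reconstruction that silently drops or alters fields.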

Check all the applicable boxes

  • [ ] I updated the documentation accordingly.
  • [x] All new and existing tests passed.
  • [x] All commits are signed-off.

Summary by Bito

  • This pull request fixes a critical bug in the deserialization of Pydantic models containing FlyteFile and FlyteDirectory fields, which could lead to an AttributeError during re-serialization.
  • Model validators are added to ensure that private attributes are properly initialized during deserialization.
  • New unit tests have been added to verify the functionality of these changes, ensuring that the serialize-deserialize-serialize cycle works without errors.
  • Overall, this pull request fixes the deserialization of FlyteFile and FlyteDirectory fields in Pydantic models and adds unit tests covering the fix.

gverkes • Oct 13 '25 08:10