Bugfix: Pydantic deserialization for FlyteFile and FlyteDirectory
Fixes #6669
Tracking issue
Closes flyteorg/flyte#6669
Why are the changes needed?
When deserializing Pydantic models containing FlyteFile or FlyteDirectory fields using model_validate(), the deserialized objects were
missing private attributes (_remote_source, _downloader, etc.). This caused an AttributeError when attempting to re-serialize these
objects with model_dump(), breaking the serialize → deserialize → serialize cycle.
This is a critical bug: it prevents users from using FlyteFile/FlyteDirectory fields within Pydantic BaseModel classes in the usual way.
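The failure mode can be illustrated without flytekit. The sketch below is a simplified, standard-library stand-in and not the actual FlyteFile code: `RemoteFile`, `to_dict`, `from_dict_naive`, and the `s3://bucket/data.csv` path are all hypothetical names chosen for illustration. It assumes only what is described above, namely that the deserialized object lacks a private attribute that serialization later reads.

```python
class RemoteFile:
    """Hypothetical stand-in for a type like FlyteFile; not flytekit code."""

    def __init__(self, path: str):
        self.path = path
        # Private state that only __init__ sets up.
        self._remote_source = None

    def to_dict(self) -> dict:
        # Serialization reads the private attribute, so it must exist.
        return {"path": self._remote_source or self.path}

    @classmethod
    def from_dict_naive(cls, data: dict) -> "RemoteFile":
        # Mimics a deserializer that fills in public fields without
        # running __init__, so _remote_source is never created.
        obj = cls.__new__(cls)
        obj.path = data["path"]
        return obj


original = RemoteFile("s3://bucket/data.csv")
payload = original.to_dict()                     # serialize: works
restored = RemoteFile.from_dict_naive(payload)   # deserialize: works

try:
    restored.to_dict()                           # re-serialize: fails
    error = None
except AttributeError as exc:
    error = exc

print(type(error).__name__, error)
```

The first two steps succeed, and only the second serialization raises, which matches the broken serialize → deserialize → serialize cycle described above.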
What changes were proposed in this pull request?
Added Pydantic model validators to both FlyteFile and FlyteDirectory classes that ensure private attributes are properly initialized during
deserialization:
- FlyteFile (flytekit/types/file/file.py):
  - Enhanced deserialize_flyte_file validator to check if private attributes exist
  - If missing, reconstructs the FlyteFile using dict_to_flyte_file() transformer
  - If attributes already exist (e.g., when passing already-constructed FlyteFile), returns as-is
- FlyteDirectory (flytekit/types/directory/types.py):
  - Applied same fix to deserialize_flyte_dir validator
  - Uses dict_to_flyte_directory() to properly reconstruct the object
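The check-then-reconstruct logic described in the list above can be sketched roughly as follows. This is a standard-library illustration under stated assumptions, not the code in this PR: `RemoteFile`, `dict_to_remote_file`, and `ensure_initialized` are hypothetical analogues of FlyteFile, dict_to_flyte_file(), and the deserialize_flyte_file validator, and the real signatures may differ.

```python
class RemoteFile:
    """Hypothetical stand-in for a type like FlyteFile; not flytekit code."""

    def __init__(self, path: str):
        self.path = path
        # Private attributes that serialization relies on.
        self._remote_source = None
        self._downloader = lambda: None  # placeholder downloader

    def to_dict(self) -> dict:
        return {"path": self._remote_source or self.path}


def dict_to_remote_file(data: dict) -> RemoteFile:
    """Hypothetical analogue of dict_to_flyte_file(): full construction."""
    return RemoteFile(data["path"])


def ensure_initialized(obj: RemoteFile) -> RemoteFile:
    """Hypothetical analogue of the validator run after deserialization."""
    # Already fully constructed: return the same object unchanged.
    if hasattr(obj, "_remote_source") and hasattr(obj, "_downloader"):
        return obj
    # Private attributes missing: rebuild through the normal constructor path.
    return dict_to_remote_file({"path": obj.path})


# An object whose __init__ never ran, as after a naive deserialization.
bare = RemoteFile.__new__(RemoteFile)
bare.path = "s3://bucket/data.csv"

fixed = ensure_initialized(bare)
ready = RemoteFile("/tmp/data.csv")
```

Here `fixed.to_dict()` no longer raises because the object was rebuilt with its private attributes, while `ensure_initialized(ready)` returns `ready` itself, mirroring the "returns as-is" branch.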
How was this patch tested?
Added two new unit tests in test_pydantic_basemodel_transformer.py:
- test_flytefile_pydantic_model_dump_validate_cycle - Verifies FlyteFile can be serialized, deserialized, and re-serialized without errors
- test_flytedirectory_pydantic_model_dump_validate_cycle - Same for FlyteDirectory
Check all the applicable boxes
- [ ] I updated the documentation accordingly.
- [x] All new and existing tests passed.
- [x] All commits are signed-off.
Summary by Bito
- This pull request fixes a critical bug in the deserialization of Pydantic models containing FlyteFile and FlyteDirectory fields, which could lead to an AttributeError during re-serialization.
- Model validators are added to ensure that private attributes are properly initialized during deserialization.
- New unit tests have been added to verify the functionality of these changes, ensuring that the serialize-deserialize-serialize cycle works without errors.
- Overall, this pull request fixes deserialization issues in Pydantic models containing FlyteFile and FlyteDirectory fields and adds unit tests.