[BUG] Deserializing Pydantic models with FlyteFile fields creates FlyteFile instances that can't be serialized again.
Flyte & Flytekit version
This can be reproduced with flytekit==1.16.5.
Describe the bug
FlyteFile can be used as a field type in Pydantic models. When you serialize and then deserialize a Pydantic model that has FlyteFile fields, the resulting FlyteFile instances are "broken": serializing them again (either via Pydantic or via Flyte) raises an exception.
This is a minimal example to reproduce the issue:
from pydantic import BaseModel
from flytekit import FlyteFile
class MyModel(BaseModel):
    file: FlyteFile
model = MyModel(file=FlyteFile.from_source("gs://does/not/matter.txt"))
deserialized_model = MyModel.model_validate(model.model_dump())
deserialized_model.model_dump() # <- this always causes the exception
This is the exception:
╭─────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────╮
│ /home/coder/platform/.venv/lib/python3.12/site-packages/flytekit/types/file/file.py:196 in serialize_flyte_file │
│ │
│ ❱ 196 │ │ lv = FlyteFilePathTransformer().to_literal( │
│ │
│ /home/coder/platform/.venv/lib/python3.12/site-packages/flytekit/core/type_engine.py:318 in to_literal │
│ │
│ ❱ 318 │ │ result = synced(ctx, python_val, python_type, expected) │
│ │
│ /home/coder/platform/.venv/lib/python3.12/site-packages/flytekit/utils/asyn.py:113 in wrapped │
│ │
│ ❱ 113 │ │ │ return self.run_sync(coro_func, *args, **kwargs) │
│ │
│ /home/coder/platform/.venv/lib/python3.12/site-packages/flytekit/utils/asyn.py:106 in run_sync │
│ │
│ ❱ 106 │ │ return self._runner_map[name].run(coro) │
│ │
│ /home/coder/platform/.venv/lib/python3.12/site-packages/flytekit/utils/asyn.py:85 in run │
│ │
│ ❱ 85 │ │ res = fut.result(None) │
│ │
│ /usr/lib/python3.12/concurrent/futures/_base.py:456 in result │
│ │
│ ❱ 456 │ │ │ │ │ return self.__get_result() │
│ │
│ /usr/lib/python3.12/concurrent/futures/_base.py:401 in __get_result │
│ │
│ ❱ 401 │ │ │ │ raise self._exception │
│ │
│ /home/coder/platform/.venv/lib/python3.12/site-packages/flytekit/types/file/file.py:586 in async_to_literal │
│ │
│ ❱ 586 │ │ │ if python_val._remote_source is not None: │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'FlyteFile' object has no attribute '_remote_source'
The above exception was the direct cause of the following exception:
╭─────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────╮
│ in <module>:11 │
│ │
│ ❱ 11 deserialized_model.model_dump() # <- this always causes the exception │
│ │
│ /home/coder/platform/.venv/lib/python3.12/site-packages/pydantic/main.py:464 in model_dump │
│ │
│ ❱ 464 │ │ return self.__pydantic_serializer__.to_python( │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
PydanticSerializationError: Error calling function `serialize_flyte_file`: AttributeError: 'FlyteFile' object has no attribute '_remote_source'
Expected behavior
The expected behavior is that a FlyteFile that has been serialized and then deserialized works, for all practical purposes, like the original. In particular, I should be able to serialize it, deserialize it, and serialize it again.
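The expectation above can be written as a small round-trip check. This is only a sketch: `is_round_trip_stable` and the `Example` dataclass are hypothetical names introduced for illustration, and the demo uses a plain stdlib dataclass so it runs without flytekit or Pydantic installed.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable


def is_round_trip_stable(
    obj: Any,
    dump: Callable[[Any], Any],
    load: Callable[[Any], Any],
) -> bool:
    """Return True if dump -> load -> dump yields the same serialized form."""
    first = dump(obj)
    restored = load(first)
    # This second dump is the step that raises in the report above.
    second = dump(restored)
    return first == second


@dataclass
class Example:
    # Hypothetical stand-in for a model with a single path-like field.
    path: str


stable = is_round_trip_stable(
    Example("gs://does/not/matter.txt"),
    dump=asdict,
    load=lambda data: Example(**data),
)
```

For the model in this report, `dump` would be `lambda m: m.model_dump()` and `load` would be `MyModel.model_validate`. With flytekit==1.16.5 the second dump raises before the comparison is reached.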
Are you sure this issue hasn't been raised already?
- [x] Yes
Have you read the Code of Conduct?
- [x] Yes
This is a known issue: when a FlyteFile is used as a field in a Pydantic model, serializing and then deserializing the model with Pydantic (e.g., using model_dump/model_validate) results in a FlyteFile instance that lacks internal attributes like _remote_source. This causes serialization to fail with an AttributeError on subsequent dumps.
The root cause is that Pydantic's deserialization does not reconstruct FlyteFile with all required internal state: FlyteFile expects to be initialized with certain arguments and internal setup that are not preserved by a plain dict round-trip. This is confirmed by test cases and user reports in the Flyte repository, and there is no official fix yet.
The recommended workaround is to avoid using FlyteFile as a direct field in Pydantic models for serialization/deserialization, or to use dataclasses instead, which are handled correctly by Flyte's type engine and serialization logic. See the discussion and test cases in test_pydantic_basemodel_in_dataclass.py and related issues.
Would you like a step-by-step explanation of why this happens and possible workarounds?
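The traceback shows that the deserialized instance exists but has no `_remote_source` attribute. One way this kind of state can arise in Python is when an object is rebuilt from its public fields without running `__init__`. The sketch below illustrates that mechanism with a hypothetical `Handle` class. It is not flytekit's code, and whether Pydantic takes exactly this path for FlyteFile is an assumption here.

```python
class Handle:
    """Hypothetical stand-in for a class whose private state is set in __init__."""

    def __init__(self, path: str) -> None:
        self.path = path
        # Private attribute that only exists if __init__ actually ran.
        self._remote_source = None

    def dump(self) -> dict:
        # Same access pattern as the failing line in the traceback.
        if self._remote_source is not None:
            return {"path": self._remote_source}
        return {"path": self.path}


def rebuild_without_init(data: dict) -> Handle:
    # Creates the instance from public fields only and skips __init__.
    obj = Handle.__new__(Handle)
    obj.__dict__.update(data)
    return obj


def rebuild_with_init(data: dict) -> Handle:
    # Goes through the constructor, so private state is set up again.
    return Handle(**data)


original = Handle("gs://does/not/matter.txt")
dumped = original.dump()

error_message = None
broken = rebuild_without_init(dumped)
try:
    broken.dump()
except AttributeError as exc:
    error_message = str(exc)

healthy = rebuild_with_init(dumped)
second_dump = healthy.dump()
```

In this sketch only the instance rebuilt through the constructor can be dumped a second time, which matches the symptom reported above.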
We have completely overhauled the Pydantic system for v2.0, so we think it's better to take that implementation and backport it if needed, or else to use v2.0 directly. The backend will be available soon, but the SDK is available today.