Support Optional[DataModel] columns / fields / outputs
More context / discussion is here
from typing import Optional

from dotenv import load_dotenv

from datachain import read_storage, C, File

load_dotenv(".env.test")


def process_events(file: File) -> Optional[File]:
    return file


(
    read_storage("s3://bucket/v1")
    .settings(cache=True)
    .limit(1)
    .map(events_file=process_events)
    .show()
)
This fails in the UDF prep logic.
It's more than just UDFs (see details below). A proper solution is to add an extra column for each Optional object. That is complicated, but I'm not sure we can do it any other way.
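To make the "extra column per Optional object" idea concrete, here is a minimal sketch. It is not datachain code: `FileStub`, `flatten_optional`, `restore_optional` and the `__is_set` column suffix are all hypothetical names, and a stdlib dataclass stands in for a DataModel. The point is only that a presence flag lets us tell "the object is None" apart from "the object exists but all of its fields are None".

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class FileStub:
    # Hypothetical stand-in for a DataModel such as File.
    source: str
    path: str


def flatten_optional(name: str, model: type, value: Optional[Any]) -> dict[str, Any]:
    """Flatten an optional object into per-field columns plus a presence flag."""
    row: dict[str, Any] = {f"{name}__is_set": value is not None}
    for f in fields(model):
        # When the object is missing, every field column is stored as None.
        row[f"{name}__{f.name}"] = getattr(value, f.name) if value is not None else None
    return row


def restore_optional(name: str, model: type, row: dict[str, Any]) -> Optional[Any]:
    """Rebuild the object from its columns, or return None if the flag is unset."""
    if not row[f"{name}__is_set"]:
        return None
    return model(**{f.name: row[f"{name}__{f.name}"] for f in fields(model)})


# Round trip for both a present and a missing value.
present = FileStub(source="s3://bucket", path="v1/events.json")
assert restore_optional("events_file", FileStub, flatten_optional("events_file", FileStub, present)) == present
assert restore_optional("events_file", FileStub, flatten_optional("events_file", FileStub, None)) is None
```

The cost is one extra column per Optional object (and per nested Optional object), which is where the complexity mentioned above would come from.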
We probably also don't properly support Optional[DataModel] fields inside DataModels. For example, this means we break on ClaudeModel from the anthropic package.
class Usage(BaseModel):
    cache_creation_input_tokens: Optional[int] = None
    """The number of input tokens used to create the cache entry."""

    cache_read_input_tokens: Optional[int] = None
    """The number of input tokens read from the cache."""

    input_tokens: int
    """The number of input tokens which were used."""

    output_tokens: int
    """The number of output tokens which were used."""

    server_tool_use: Optional[ServerToolUsage] = None
    """The number of server tool requests."""

    service_tier: Optional[Literal["standard", "priority", "batch"]] = None
    """If the request used the priority, standard, or batch tier."""
that's how it is defined ^^
File ~/Projects/datachain-examples/.venv/lib/python3.12/site-packages/datachain/lib/dc/datachain.py:657, in DataChain.map(self, func, params, output, **signal_map)
652 if (prefetch := self._settings.prefetch) is not None:
653 udf_obj.prefetch = prefetch
655 return self._evolve(
...
79 if _is_json_inside_union(orig, args):
80 return JSON
---> 82 raise TypeError(f"Cannot recognize type {typ}")
TypeError: Cannot recognize type <class 'anthropic.types.server_tool_usage.ServerToolUsage'>