
Support Optional[DataModel] columns / fields / outputs

Open shcheklein opened this issue 8 months ago • 1 comments

More context / discussion is here

from typing import Optional
from dotenv import load_dotenv
from datachain import read_storage, C, File


load_dotenv(".env.test")

def process_events(file: File) -> Optional[File]:
    return file


(
    read_storage("s3://bucket/v1")
    .settings(cache=True)
    .limit(1)
    .map(events_file=process_events)
    .show()
)

This fails in the UDF preparation logic.

It's more than just UDFs (see details below). A proper solution is to add an extra column for each Optional object. That is complicated, but I'm not sure we can do it any other way.

shcheklein avatar Apr 23 '25 01:04 shcheklein

We probably also don't properly support Optional[DataModel] fields inside DataModels. For example, it means we break on ClaudeModel from the anthropic package.

class Usage(BaseModel):
    cache_creation_input_tokens: Optional[int] = None
    """The number of input tokens used to create the cache entry."""

    cache_read_input_tokens: Optional[int] = None
    """The number of input tokens read from the cache."""

    input_tokens: int
    """The number of input tokens which were used."""

    output_tokens: int
    """The number of output tokens which were used."""

    server_tool_use: Optional[ServerToolUsage] = None
    """The number of server tool requests."""

    service_tier: Optional[Literal["standard", "priority", "batch"]] = None
    """If the request used the priority, standard, or batch tier."""

that's how it is defined ^^

File ~/Projects/datachain-examples/.venv/lib/python3.12/site-packages/datachain/lib/dc/datachain.py:657, in DataChain.map(self, func, params, output, **signal_map)
    652 if (prefetch := self._settings.prefetch) is not None:
    653     udf_obj.prefetch = prefetch
    655 return self._evolve(
...
     79     if _is_json_inside_union(orig, args):
     80         return JSON
---> 82 raise TypeError(f"Cannot recognize type {typ}")

TypeError: Cannot recognize type <class 'anthropic.types.server_tool_usage.ServerToolUsage'>

shcheklein avatar Jun 15 '25 18:06 shcheklein