Support extremely flexible list datatype declarations

Open jaychia opened this issue 5 months ago • 2 comments

Is your feature request related to a problem?

Writing datatypes in Daft today happens in a few places:

  1. UDF return types (return_dtype=...)
  2. .apply return types (.apply(..., return_dtype=...))
  3. Casting (.cast(...))
  4. Type hinting (.read_parquet(schema=...))

(I might have missed a few places)

However, writing exact Daft types is quite verbose:

import daft
from daft import DataType

@daft.udf(return_dtype=DataType.float64())
def f():
   ...

To fix this, we allowed some mapping of Python types to Daft types:

import daft

@daft.udf(return_dtype=float)
def f():
   ...

This also works for struct types, using Python dicts:

import daft
from daft import DataType

@daft.udf(return_dtype={"foo": DataType.float64()})
def f():
   ...

However, lists don't work! Building a deeply nested type such as a list-of-list-of-struct-of-list is therefore still very verbose:

import daft
from daft import DataType

@daft.udf(return_dtype=DataType.list(DataType.list(DataType.struct({"foo": DataType.list(float)}))))
def f():
   ...

Describe the solution you'd like

Here is a proposal. It looks quite Pythonic, though I'm not aware of any other library that does this, which is a bit concerning.

@daft.udf(return_dtype=[[{"foo": [float, ...]}, ...], ...])

In Python, ... evaluates to the Ellipsis singleton, so the above is completely valid Python. It expresses the same type as the verbose DataType variant while remaining relatively readable. Look at this, it's beautiful!

@daft.udf(return_dtype={
    "bboxes": [[float, 4], ...],    # fixed size syntax
    "objects": [str, ...],
    "image": [[[DataType.int8(), ...], ...], 3],  # mix and match for specific datatypes
    "metadata": dict[str, int],  # map type
})
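To make the list rules concrete, here is a minimal sketch of how such a shorthand could be interpreted. It is not Daft's implementation: `describe` is a hypothetical helper that only renders the spec as a string, and the rule that a list spec has exactly two elements (`[element, ...]` for variable length, `[element, n]` for fixed size) is an assumption read off the examples above. Map types are left out of this sketch.

```python
def describe(spec):
    """Render a shorthand type spec as a string (hypothetical helper, not Daft API)."""
    # A dict literal maps field names to field specs: treat it as a struct.
    if isinstance(spec, dict):
        fields = ", ".join(f"{name}: {describe(sub)}" for name, sub in spec.items())
        return f"struct<{fields}>"
    # A list spec is assumed to be [element, ...] or [element, size].
    if isinstance(spec, list):
        if len(spec) != 2:
            raise TypeError(f"expected [element, ...] or [element, size], got {spec!r}")
        element, marker = spec
        if marker is Ellipsis:
            return f"list<{describe(element)}>"
        if isinstance(marker, int) and not isinstance(marker, bool) and marker > 0:
            return f"fixed_size_list<{describe(element)}, {marker}>"
        raise TypeError(f"second item must be ... or a positive int, got {marker!r}")
    # Leaf: a plain Python type such as float; anything else falls back to repr().
    return spec.__name__ if isinstance(spec, type) else repr(spec)


print(describe([[{"foo": [float, ...]}, ...], ...]))
# list<list<struct<foo: list<float>>>>
print(describe([[float, 4], ...]))
# list<fixed_size_list<float, 4>>
```

A real implementation would presumably return Daft datatypes at each step instead of strings; the recursion would have the same shape.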

Describe alternatives you've considered

No response

Additional Context

No response

Would you like to implement a fix?

No

jaychia, Jul 23 '25 04:07

"metadata": dict[str, int], # map type

@jaychia how would you expect one to distinguish between a struct and a map type?

universalmind303, Jul 23 '25 17:07

I think struct and map types are fairly different

dict[str, int]  # map type
{"foo": int, "bar": str}  # struct type

Maps can have any number of entries, but every key and every value must match the declared key and value types. Structs have a fixed set of named fields, each with its own type.
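At the Python level the two spellings are also easy to tell apart, since `dict[str, int]` is a parameterized generic alias while `{"foo": int, "bar": str}` is an ordinary dict instance. A rough sketch of that check, using only the standard library (the `kind` helper is hypothetical and requires Python 3.9+ for `dict[...]`):

```python
from typing import get_args, get_origin


def kind(spec):
    """Classify a spec as 'map', 'struct', or 'other' (hypothetical helper)."""
    # dict[K, V] is a generic alias whose origin is the dict class.
    if get_origin(spec) is dict:
        return "map"
    # A dict literal is a real dict instance with field names as keys.
    if isinstance(spec, dict):
        return "struct"
    return "other"


print(kind(dict[str, int]))             # map
print(kind({"foo": int, "bar": str}))   # struct
print(get_args(dict[str, int]))         # (<class 'str'>, <class 'int'>)
```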

jaychia avatar Jul 24 '25 18:07 jaychia