Support extremely flexible list datatype declarations
Is your feature request related to a problem?
Writing datatypes in Daft today happens in a few places:
- UDF return types (return_dtype=...)
- .apply return types (.apply(..., return_dtype=...))
- Casting (.cast(...))
- Type hinting (.read_parquet(schema=...))
(I might have missed a few places)
However, writing exact Daft types is quite verbose:
import daft
from daft import DataType
@daft.udf(return_dtype=DataType.float64())
def f():
    ...
To fix this, we allowed some mapping of Python types to Daft types:
import daft
@daft.udf(return_dtype=float)
def f():
    ...
This also works for struct types, using Python dicts:
import daft
from daft import DataType
@daft.udf(return_dtype={"foo": DataType.float64()})
def f():
    ...
However, lists don't work! Thus building a highly complex type such as a list-of-list-of-struct-of-list is highly verbose:
import daft
from daft import DataType
@daft.udf(return_dtype=DataType.list(DataType.list(DataType.struct({"foo": DataType.list(float)}))))
def f():
    ...
Describe the solution you'd like
Here is a proposal. It looks quite Pythonic, but I'm not aware of any other library that does this, which is a bit concerning.
@daft.udf(return_dtype=[[{"foo": [float, ...]}, ...], ...])
In Python, the literal ... evaluates to the Ellipsis singleton, so the above is completely valid Python. It expresses the same type as the highly verbose DataType variant while remaining relatively readable. Look at this, it's beautiful!
@daft.udf(return_dtype={
    "bboxes": [[float, 4], ...],  # fixed size syntax
    "objects": [str, ...],
    "image": [[[DataType.int8(), ...], ...], 3],  # mix and match for specific datatypes
    "metadata": dict[str, int],  # map type
})
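To make the proposed syntax concrete, here is a minimal sketch of how such a spec could be walked recursively. It is not Daft's implementation. `describe_dtype` and the `_PRIMITIVES` table are hypothetical names, and the helper returns a readable string rather than a real `DataType` so that it runs without Daft installed. It assumes a two-element list means "element type, then length", where `...` is variable-length and a positive integer is fixed-size. Explicit dtype objects such as `DataType.int8()` are left out of the sketch.

```python
from typing import Any, get_args, get_origin

# Hypothetical mapping from Python builtins to type names (illustration only).
_PRIMITIVES = {float: "Float64", int: "Int64", str: "Utf8", bool: "Boolean"}


def describe_dtype(spec: Any) -> str:
    """Render a proposed-syntax type spec as a readable string."""
    # dict[K, V] is a generic alias, so check it before plain dict instances.
    if get_origin(spec) is dict:
        key_type, value_type = get_args(spec)
        return f"Map[{describe_dtype(key_type)}: {describe_dtype(value_type)}]"

    # A dict instance with named fields is treated as a struct.
    if isinstance(spec, dict):
        fields = ", ".join(
            f"{name}: {describe_dtype(field)}" for name, field in spec.items()
        )
        return f"Struct[{fields}]"

    # A two-element list is [element_type, length_marker].
    if isinstance(spec, list):
        if len(spec) != 2:
            raise TypeError(f"expected [element_type, ... or size], got {spec!r}")
        inner, length = spec
        if length is Ellipsis:
            return f"List[{describe_dtype(inner)}]"
        if isinstance(length, int) and not isinstance(length, bool) and length > 0:
            return f"FixedSizeList[{describe_dtype(inner)}; {length}]"
        raise TypeError(f"invalid list length marker: {length!r}")

    # Plain Python builtins map to a primitive type name.
    if isinstance(spec, type) and spec in _PRIMITIVES:
        return _PRIMITIVES[spec]

    raise TypeError(f"unsupported type spec: {spec!r}")


if __name__ == "__main__":
    print(describe_dtype([[{"foo": [float, ...]}, ...], ...]))
    print(describe_dtype({"bboxes": [[float, 4], ...], "metadata": dict[str, int]}))
```

A real implementation would return Daft datatypes instead of strings and would also need to accept existing `DataType` instances as leaves.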
Describe alternatives you've considered
No response
Additional Context
No response
Would you like to implement a fix?
No
"metadata": dict[str, int], # map type
@jaychia how would you expect one to distinguish between a struct and a map type?
I think struct and map types are fairly different
dict[str, int] # map type
{"foo": int, "bar": str} # struct type
Maps can have any number of entries, but every key and every value must match the declared key and value types. Structs have a fixed set of named fields, each with its own type.
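The two spellings are also easy to tell apart at runtime, because `dict[str, int]` is a generic alias while `{"foo": int, "bar": str}` is an ordinary dict instance. The sketch below shows one way to make that distinction using only the standard library. `classify_dict_spec` is a hypothetical helper, not part of Daft.

```python
from typing import Any, get_args, get_origin


def classify_dict_spec(spec: Any) -> tuple:
    """Classify a dict-like type spec as a map or a struct.

    Returns ("map", (key_type, value_type)) for dict[K, V] and
    ("struct", (field names...)) for a dict instance keyed by strings.
    """
    # dict[str, int] is a generic alias whose origin is the dict class.
    if get_origin(spec) is dict:
        return ("map", get_args(spec))

    # {"foo": int, "bar": str} is a dict instance: a fixed set of named fields.
    if isinstance(spec, dict):
        if not all(isinstance(name, str) for name in spec):
            raise TypeError("struct field names must be strings")
        return ("struct", tuple(spec))

    raise TypeError(f"not a map or struct spec: {spec!r}")


if __name__ == "__main__":
    print(classify_dict_spec(dict[str, int]))
    print(classify_dict_spec({"foo": int, "bar": str}))
```

Under this reading, a bare `dict` with no type arguments is rejected, since it specifies neither a key/value type pair nor a set of fields.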